skypilot-users

Is it possible to run a quantized model on k8s without a GPU?

I am new to this and wondering if it's possible to run quantized models on a Kubernetes cluster without GPU support.


Krish C

Asked on Apr 01, 2024

Yes, you can run quantized models on a CPU-only Kubernetes cluster. SkyPilot includes an example that launches a quantized llama2 model on a CPU cluster. The model to run is selected through the MODEL_NAME environment variable, which you pass with --env at launch time. For example, the following command launches the 70b variant of llama2:

sky launch ollama.yaml -c ollama --env MODEL_NAME=llama2:70b
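
For reference, here is a rough sketch of what a CPU-only ollama.yaml could look like. This is not the exact file from the SkyPilot example: the CPU and memory sizes, the port, and the default model tag are assumptions chosen for illustration, and the setup step assumes the container can run the Ollama install script.

```yaml
# Hypothetical sketch of a CPU-only task file; values are illustrative.
envs:
  MODEL_NAME: llama2            # default tag; overridden by --env MODEL_NAME=...
  OLLAMA_HOST: 0.0.0.0:8888     # assumed port for the Ollama server

resources:
  cpus: 4+                      # no accelerators requested, so this runs on CPU
  memory: 8+                    # assumed size; large tags need much more
  ports: 8888                   # expose the assumed server port

setup: |
  # Install Ollama with its official install script.
  curl -fsSL https://ollama.com/install.sh | sh

run: |
  # Start the server, give it a moment to come up, then fetch the model.
  ollama serve &
  sleep 5
  ollama pull "$MODEL_NAME"
  wait
```

Because the resources section requests no accelerators, SkyPilot schedules the task on CPU nodes. Larger tags such as llama2:70b need substantially more memory than this sketch requests, so raise the memory value to match the model you pick.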

Apr 02, 2024