I am new to this and wondering if it's possible to run quantized models on a Kubernetes cluster without GPU support.
Krish C
Asked on Apr 01, 2024
Yes, you can run quantized models on a CPU-only Kubernetes cluster with SkyPilot. SkyPilot provides an example task, ollama.yaml, that launches a quantized llama2 model on a CPU cluster. You select the model with the MODEL_NAME environment variable. For example, the following command launches the 70B variant of llama2 on a cluster named ollama:
sky launch ollama.yaml -c ollama --env MODEL_NAME=llama2:70b
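If you want to try a different model tag first, the same command can be parameterized. The snippet below is only a sketch. It assumes `llama2:7b` as a smaller tag that may be a lighter fit for CPU-only nodes, and it uses a hypothetical `RUN` switch (not part of SkyPilot) so that the command is printed by default and executed only when you opt in. `sky status` and `sky down` are standard SkyPilot commands for checking and tearing down the cluster.

```shell
#!/bin/sh
set -eu

# Assumed values for illustration; adjust to your setup.
MODEL_NAME="llama2:7b"   # assumption: a smaller tag than llama2:70b
CLUSTER="ollama"         # cluster name, as in the command above

# Build the same launch command with the chosen model.
launch_cmd="sky launch ollama.yaml -c ${CLUSTER} --env MODEL_NAME=${MODEL_NAME}"

# Print the command so it can be reviewed before running.
echo "${launch_cmd}"

# RUN is a hypothetical opt-in switch for this sketch, not a SkyPilot setting.
if [ "${RUN:-0}" = "1" ] && command -v sky >/dev/null 2>&1; then
  ${launch_cmd}
  sky status             # list clusters and their current state
  # sky down "${CLUSTER}"  # uncomment to tear the cluster down when finished
fi
```

Run it as is to see the command it would execute, or set RUN=1 to launch for real.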