skypilot-users
Is deploying a 20-node Kubernetes cluster a reasonable approach for using 20 desktop machines, each with one 4090, for SkyPilot tasks?
Christian Osendorfer is considering making 20 desktop machines, each with a single 4090, usable from SkyPilot by joining them into one Kubernetes cluster with 20 nodes. He wants to know whether this is a reasonable approach and whether users could then schedule distributed training tasks that use 4 GPUs.
Christian Osendorfer
Asked on Apr 09, 2024
- Yes, deploying a 20-node Kubernetes cluster is a reasonable way to use the 20 desktop machines, each with one 4090, for SkyPilot tasks.
- Once a working kubeconfig points at the cluster, users can schedule distributed training tasks that use 4 GPUs through SkyPilot's Kubernetes integration. Because each node has a single GPU, such a task spans 4 nodes.
- An example YAML configuration for launching a distributed training job on 4 machines, using 1x 4090 on each node, is provided in the discussion thread.
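
The YAML from the thread is not reproduced in this summary. As a rough illustration only, a SkyPilot task for this setup might look like the sketch below. Several things here are assumptions rather than details from the discussion: the accelerator name `RTX4090` (the name SkyPilot sees depends on how the Kubernetes nodes are labeled), the `torchrun` launcher, the port `8008`, and the script name `train.py`, which is hypothetical.

```yaml
# Minimal sketch, not the configuration from the thread.
resources:
  cloud: kubernetes          # use the cluster from the active kubeconfig
  accelerators: RTX4090:1    # assumed label; one GPU per node

num_nodes: 4                 # 4 nodes x 1 GPU = 4 GPUs in total

setup: |
  pip install torch

run: |
  # SkyPilot exports these variables on every node of a multi-node task.
  MASTER_ADDR=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  torchrun \
    --nnodes=$SKYPILOT_NUM_NODES \
    --nproc_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
    --node_rank=$SKYPILOT_NODE_RANK \
    --master_addr=$MASTER_ADDR \
    --master_port=8008 \
    train.py  # hypothetical training script
```

Saved as, say, `train.yaml`, a file like this would be launched with `sky launch train.yaml`. Running `sky check` first shows whether SkyPilot can reach the Kubernetes cluster at all.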
Apr 10, 2024