skypilot-users
Why is my job not running in the cluster even though it's submitted?
I have an echo 'Starting Job' but it never runs. I can find provision.log, sync.log, and setup.log, but nothing is coming from the job. How can I troubleshoot this issue?
Georgehwp
Asked on Feb 16, 2024
- Check if there are enough resources available on the cluster for the job to run.
- Use `sky queue <cluster_name>` to view the list of jobs and their current status.
- If the job is stuck in the queue due to resource unavailability, it will not run until resources become available.
- To remove all jobs from the queue, use `sky cancel <cluster_name> --all`.
- If upgrading the SkyPilot version doesn't resolve the issue, consider restarting the cluster with `sky stop` followed by `sky start`.
- If the problem persists, starting from scratch with a new cluster may be a viable solution.
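The checks above can be sketched as a small shell helper. This is only an illustration, not an official SkyPilot tool: the function names `run` and `troubleshoot_step` and the `DRY_RUN` variable are hypothetical, and passing the cluster name to `sky stop` and `sky start` is an assumption (the answer gives those two commands without arguments). By default the helper only prints the command it would run, so nothing is cancelled or stopped by accident.

```shell
# Hypothetical helper wrapping the commands from the answer above.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: troubleshoot_step <cluster_name> <queue|clear|restart>
troubleshoot_step() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: troubleshoot_step <cluster_name> <queue|clear|restart>" >&2
    return 2
  fi
  case "$2" in
    queue)
      # List the jobs on the cluster and their current status.
      run sky queue "$1"
      ;;
    clear)
      # Remove every job from the cluster's queue.
      run sky cancel "$1" --all
      ;;
    restart)
      # Restart the cluster (assumes both commands accept the cluster name).
      run sky stop "$1"
      run sky start "$1"
      ;;
    *)
      echo "unknown step: $2" >&2
      return 2
      ;;
  esac
}
```

Running `troubleshoot_step my-cluster queue` previews the command; `DRY_RUN=0 troubleshoot_step my-cluster queue` actually runs it. Working through `queue`, then `clear`, then `restart` mirrors the order of escalation in the answer, so stop at the first step that gets the job running.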
Feb 16, 2024 (edited)