skypilot-users
How to debug a skypilot job stuck in 'pending' state?
What steps can be taken to debug a skypilot job that is stuck in the 'pending' state?
Chaskin Saroff
Asked on Dec 09, 2023
To debug a SkyPilot job that is stuck in the 'pending' state, you can follow these steps:
- Check the job queue to see the status of the job (here the cluster is named jupyter; substitute your own cluster name):
  sky queue jupyter
- Follow the job's logs and look for error messages (the directory name is the timestamp of this particular run; use the one for your job):
  tail -f ~/sky_logs/sky-2023-12-08-15-56-32-623003/*
- On the remote cluster, list the Ray jobs to see the job's current status:
  ray job list
- Share the task YAML file with the SkyPilot team to help them reproduce the issue.
- If the job was cloned from an existing cluster, the cause may be stale metadata on the remote cluster. In this case, the SkyPilot team can investigate and resolve the issue.
When asking for help, include any relevant information and error messages you have collected.
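The local steps above can be bundled into a small script, sketched here under a few assumptions: the cluster name defaults to jupyter as in this thread, logs live under ~/sky_logs, and the newest run directory is the one of interest. The function names latest_log_dir and collect_debug_info are hypothetical helpers, not part of SkyPilot. The script uses tail -n rather than tail -f so that it terminates, and it only prints a reminder for the ray job list step, since that command has to be run on the remote cluster.

```shell
#!/usr/bin/env bash
# Sketch: collect local debugging info for a pending SkyPilot job.
# CLUSTER and LOG_ROOT are assumptions; override them for your setup.
CLUSTER="${CLUSTER:-jupyter}"
LOG_ROOT="${LOG_ROOT:-$HOME/sky_logs}"

# Hypothetical helper: print the most recently modified run directory
# under the given log root (prints nothing if there is none).
latest_log_dir() {
  # ls -td lists the directories newest first; head keeps the first.
  ls -td "$1"/*/ 2>/dev/null | head -n 1
}

# Hypothetical helper: run the local checks from the steps above.
collect_debug_info() {
  if command -v sky >/dev/null 2>&1; then
    sky queue "$CLUSTER"
  else
    echo "sky not found on PATH; skipping 'sky queue'" >&2
  fi

  dir="$(latest_log_dir "$LOG_ROOT")"
  if [ -n "$dir" ]; then
    echo "== last lines of log files under $dir =="
    # Show the tail of every regular file in the run directory.
    find "$dir" -type f -exec tail -n 20 {} +
  else
    echo "no run directories found under $LOG_ROOT" >&2
  fi

  echo "Next: run 'ray job list' on the remote cluster."
}
```

After sourcing the file, run collect_debug_info (optionally with CLUSTER=mycluster set first) and attach its output, together with the task YAML, when asking for help.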
Dec 19, 2023