skypilot-users

How to debug a SkyPilot job stuck in the 'pending' state?

What steps can I take to debug a SkyPilot job that is stuck in the 'pending' state?

Chaskin Saroff

Asked on Dec 09, 2023

To debug a SkyPilot job that is stuck in the 'pending' state, work through these steps:

  1. Check the job queue with sky queue <cluster-name> (in this thread, sky queue jupyter) to confirm the job's status.
  2. Follow the job's logs with tail -f ~/sky_logs/sky-2023-12-08-15-56-32-623003/* and look for error messages. The timestamped directory is the one from this thread; use the directory for your own run.
  3. Run ray job list on the remote cluster to see the job's current status there.
  4. Share the task YAML file with the SkyPilot team so they can reproduce the issue.
  5. If the job was cloned from an existing cluster, stale metadata on the remote cluster may be the cause. In that case, the SkyPilot team can investigate and resolve the issue.
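
The first three steps can be collected into a small helper script. This is only a sketch: the function names pending_debug_commands and run_pending_debug and the DRY_RUN variable are made up for illustration, it uses tail -n 50 instead of tail -f so the script does not block, and it assumes that ssh <cluster-name> reaches the cluster and that ray is on the PATH there. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: gather diagnostics for a job stuck in 'pending'.

# Print the diagnostic commands, one per line.
# $1 = cluster name, $2 = local log directory of the run.
pending_debug_commands() {
  printf '%s\n' \
    "sky queue $1" \
    "tail -n 50 $2/*" \
    "ssh $1 ray job list"
}

# Echo each command; execute it only when DRY_RUN=0.
# A failing command is reported and the loop continues.
run_pending_debug() {
  pending_debug_commands "$1" "$2" | while IFS= read -r cmd; do
    echo "\$ $cmd"
    if [ "${DRY_RUN:-1}" = "0" ]; then
      # </dev/null keeps ssh from consuming the remaining commands on stdin.
      sh -c "$cmd" </dev/null || echo "(exited with status $?)"
    fi
  done
}
```

For the cluster in this thread, something like DRY_RUN=0 run_pending_debug jupyter ~/sky_logs/sky-2023-12-08-15-56-32-623003 would run all three commands and print their output in order, which is convenient to paste into a help request.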

When asking for help, include any relevant error messages and other details about the job.

Dec 19, 2023 (edited)