skypilot-users

SkyPilot stuck in infinite loop when reusing cluster on AWS


Hamza Tahir

Asked on Oct 26, 2023

SkyPilot getting stuck in an infinite loop when reusing a cluster on AWS is most likely caused by a missing ec2:DescribeInstances permission. The error is raised when the Ray cluster on the remote VM attempts an operation it is not authorized to perform.

To fix this, check the permissions of the IAM user you are using and make sure they include ec2:DescribeInstances. You can also try the latest skypilot-nightly version, which includes a new provisioner that may resolve the issue.

If the IAM user or profile on your client differs from the one the VM uses, the VM also needs access to the correct AWS credentials. One option is to upload your local static AWS credentials to the VM and let the service on the VM use them. If you rely on an AWS profile, you may need to set the AWS_PROFILE environment variable on the VM through the SkyPilot configuration. See the SkyPilot documentation for details.
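To confirm whether a given set of credentials really has ec2:DescribeInstances, a dry-run call is one way to check without listing anything. The snippet below is only a sketch, not something taken from SkyPilot: the function names, the default region, and the optional profile argument are made up for illustration, and it assumes boto3 is installed wherever you run it.

```python
from typing import Optional


def classify_dry_run_code(error_code: str) -> Optional[bool]:
    """Turn the error code of an EC2 DryRun call into a permission verdict."""
    if error_code == "DryRunOperation":
        return True   # the real call would have been allowed
    if error_code == "UnauthorizedOperation":
        return False  # the caller lacks the permission
    return None       # anything else (invalid credentials, throttling, ...)


def can_describe_instances(region: str = "us-east-1",
                           profile: Optional[str] = None) -> Optional[bool]:
    """Check ec2:DescribeInstances for the active (or named) AWS profile.

    `region` and `profile` are illustrative defaults, not SkyPilot settings.
    """
    # Imported here so classify_dry_run_code works without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    session = boto3.Session(profile_name=profile, region_name=region)
    ec2 = session.client("ec2")
    try:
        # With DryRun=True, EC2 only evaluates permissions and reports
        # the outcome as an error code instead of returning instances.
        ec2.describe_instances(DryRun=True)
    except ClientError as err:
        return classify_dry_run_code(err.response["Error"]["Code"])
    return True


if __name__ == "__main__":
    print("ec2:DescribeInstances allowed:", can_describe_instances())
```

Running it once on your client and once on the cluster VM should show whether the two sides resolve to different credentials. If no credentials are found at all, boto3 raises its own exception rather than returning a verdict, which is itself a useful hint.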

Oct 27, 2023