I'm running into this error after having the controller be up for ~21 hours. I can ssh into the machine and I also see all sorts of errors in the logs from gcp, though nothing obvious. Does anyone know what this could be? Also, is there any way to restart the controller when it's in a bad state?
Alex Kouzemtchenko
Asked on Oct 27, 2023
To get the spot controller out of abnormal state, you can try sky start -f sky-spot-controller-<hash>
.