I am testing SkyPilot to run my ML training on spot instances. When a pre-emption occurs, training restarts automatically, but the final model has a very different evaluation score, much worse than the one from a training run without pre-emptions. Any suggestions on what could be the issue?
Bilal Yousaf
Asked on Feb 16, 2024