skypilot-users
What options or optimizations are available for reducing I/O cost when running training jobs with PyTorch and SkyPilot?
Attiq ur-Rehman
Asked on Oct 20, 2023
There are several options and optimizations you can try to reduce I/O cost when running training jobs with PyTorch and SkyPilot:
- Consider using S3 for storage, which is generally cheaper than EFS. The exact cost depends on your data size and workload's access patterns.
- Use COPY mode in Sky Storage when using S3 with SkyPilot. This will maximize performance and reduce cost by pre-fetching the files to the VM's local disk.
- To avoid egress costs when moving data across regions, place your data in the region where you typically launch your jobs. Alternatively, you can use Cloudflare R2, which does not charge any egress fee and is cheaper than S3.
- EBS can also be considered as an option, especially if you are sticking to one region and zone.
- Unfortunately, I'm not familiar with Ray Dataset, so I cannot comment on its cost.
Overall, S3 is the most well-tested and recommended option for reducing IO cost.
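As a rough sketch of the COPY mode suggestion above, here is what the storage section of a task definition might look like, built as a Python dict and printed as JSON (which YAML parsers generally accept, so the output mirrors a task YAML). The `file_mounts`, `source`, and `mode: COPY` fields follow SkyPilot's task format; the bucket name, region, mount path, and `train.py` command are hypothetical placeholders, not details from this thread.

```python
import json


def make_task_config(bucket_uri: str, mount_path: str = "/data") -> dict:
    """Build a SkyPilot-style task config that pre-fetches a bucket to local disk."""
    return {
        # Assumption: pin the job to the bucket's region to avoid cross-region egress.
        "resources": {"cloud": "aws", "region": "us-east-1"},
        "file_mounts": {
            # COPY downloads the bucket contents onto the VM's disk before the job
            # starts, so training reads hit local storage instead of S3.
            mount_path: {"source": bucket_uri, "mode": "COPY"},
        },
        # Hypothetical training entry point that reads from the local copy.
        "run": f"python train.py --data-dir {mount_path}",
    }


# Hypothetical bucket name, for illustration only.
config = make_task_config("s3://my-training-data")
print(json.dumps(config, indent=2))
```

Note that COPY needs enough local disk to hold the whole dataset, so for very large datasets a mounted bucket may still be the more practical choice.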
Oct 20, 2023 (edited)