skypilot-users

How to improve loading speed of torch.load from a mounted S3 bucket using SkyPilot on multiple nodes?

Jason Krone reports that torch.load(checkpoint_path) hangs when loading from an S3 bucket mounted via SkyPilot on multiple nodes. Zongheng Yang suggests trying rclone instead of the native MOUNT mode, which may improve read/write speeds.

Jason Krone

Asked on Mar 22, 2024

  • Consider using rclone as an alternative to SkyPilot's native MOUNT mode for better read/write speeds.
  • This may improve the loading speed of torch.load from a mounted S3 bucket on multiple nodes.
  • No benchmark is available yet, but other users have reported possible improvements with rclone.

Example:

rclone mount remote:path /path/to/mountpoint --vfs-cache-mode writes
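
Since no benchmark is available, one way to compare the native mount with an rclone mount is to time a plain sequential read of the checkpoint file on each node before involving torch.load at all. The following is a minimal sketch, not something from the thread: the function name read_throughput_mb_s and the 8 MiB chunk size are assumptions chosen for illustration, and it uses only the Python standard library.

```python
import sys
import time


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024):
    """Read `path` sequentially and return (bytes_read, megabytes_per_second).

    Hypothetical helper for illustration; the chunk size is an arbitrary choice.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    # Pass one or more file paths, e.g. the same checkpoint under each mount point.
    for p in sys.argv[1:]:
        n, rate = read_throughput_mb_s(p)
        print(f"{p}: {n} bytes at {rate:.1f} MB/s")
```

Running the script against the same checkpoint under each mount point gives a rough per-node comparison. A repeated read of the same file may be served from a local cache rather than from S3, so the first read on a fresh mount is likely the more representative number.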
Edited on Mar 24, 2024