Jason Krone is facing an issue where torch.load(checkpoint_path) hangs when loading a checkpoint from an S3 bucket mounted via SkyPilot on multiple nodes. Zongheng Yang suggests using rclone instead of SkyPilot's native MOUNT mode, which may improve read/write speeds.
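The thread does not identify why the load hangs. One workaround to try, which is not from the thread, is to read the checkpoint file once, sequentially, into memory and let torch.load deserialize from that in-memory buffer. The idea is that the mount then sees only large sequential reads instead of the seeks torch.load may issue on its zip-based format. This is a minimal sketch under that assumption; `load_checkpoint_buffered` and its `chunk_size` parameter are hypothetical names. It costs host memory equal to the checkpoint size and is not guaranteed to fix the hang.

```python
import io
import shutil
from typing import Any, BinaryIO, Callable, Optional


def load_checkpoint_buffered(
    checkpoint_path: str,
    loader: Optional[Callable[[BinaryIO], Any]] = None,
    chunk_size: int = 8 * 1024 * 1024,
) -> Any:
    """Hypothetical helper: copy the file into memory, then deserialize it.

    The file on the mounted bucket is read front to back in large chunks;
    any seeking the loader does afterwards happens on the in-memory buffer.
    """
    buf = io.BytesIO()
    with open(checkpoint_path, "rb") as f:
        # Sequential copy in chunk_size pieces from the mount into memory.
        shutil.copyfileobj(f, buf, chunk_size)
    buf.seek(0)  # rewind so the loader starts at the beginning
    if loader is None:
        import torch  # imported lazily so the helper can be used without torch

        loader = torch.load  # torch.load accepts a seekable file-like object
    return loader(buf)
```

Usage: `state = load_checkpoint_buffered(checkpoint_path)`. To control device placement, pass a loader such as `functools.partial(torch.load, map_location="cpu")`.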
Jason Krone
Asked on Mar 22, 2024
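Example rclone mount command (with --vfs-cache-mode writes, files opened for writing are buffered to local disk before upload, while files opened read-only are still read directly from the remote):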
rclone mount remote:path /path/to/mountpoint --vfs-cache-mode writes
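If the mount is started from a Python setup script, the command above can be built as an argument list and launched as a background process, since `rclone mount` stays in the foreground until the mount is stopped. This is a sketch; `rclone_mount_cmd` and `start_rclone_mount` are hypothetical helper names, and `remote:path` and `/path/to/mountpoint` are the placeholders from the example (the remote must already be set up with `rclone config`). The cache-mode check uses rclone's documented values: off, minimal, writes, full.

```python
import subprocess
from typing import List

# Values accepted by rclone's --vfs-cache-mode flag.
VALID_CACHE_MODES = {"off", "minimal", "writes", "full"}


def rclone_mount_cmd(remote: str, mountpoint: str, cache_mode: str = "writes") -> List[str]:
    """Build the argv for `rclone mount <remote> <mountpoint> --vfs-cache-mode <mode>`."""
    if cache_mode not in VALID_CACHE_MODES:
        raise ValueError(f"unsupported --vfs-cache-mode: {cache_mode!r}")
    return ["rclone", "mount", remote, mountpoint, "--vfs-cache-mode", cache_mode]


def start_rclone_mount(remote: str, mountpoint: str, cache_mode: str = "writes") -> subprocess.Popen:
    """Launch the mount without blocking; terminate the process to stop it."""
    return subprocess.Popen(rclone_mount_cmd(remote, mountpoint, cache_mode))
```

Passing an argument list rather than a shell string avoids quoting problems when the remote or mount path contains spaces.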