I am using an fp16 7B model which is ~13GB and syncing it from the local filesystem takes forever (understandably). Mount from S3? Something else?
Conner Swann
Asked on Nov 06, 2023
If you need to share a fixed model across multiple machines, you can store the checkpoint in a cloud storage service such as S3, R2, GCS, or IBM COS, and use COPY mode to fetch the checkpoint from storage onto each machine. If upload speed is a concern, you can use the `crt` (AWS Common Runtime) transfer client when syncing from the local file system to S3. To do that, set the preferred transfer client to `crt` in the AWS config, or run `aws configure set default.s3.preferred_transfer_client crt`.
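As a rough sketch of the upload side, the steps above might look like the following. The checkpoint directory and bucket URI are hypothetical placeholders, not values from this thread, and the script assumes AWS CLI v2 with credentials already configured. It skips the upload if the CLI or the directory is missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: substitute your own checkpoint directory and bucket.
MODEL_DIR="${MODEL_DIR:-./checkpoints/my-7b-fp16}"
BUCKET_URI="${BUCKET_URI:-s3://my-model-bucket/my-7b-fp16}"

if command -v aws >/dev/null 2>&1 && [ -d "$MODEL_DIR" ]; then
  # Ask the AWS CLI to prefer the CRT-based transfer client for S3 transfers.
  aws configure set default.s3.preferred_transfer_client crt

  # One-time upload of the checkpoint; later runs only copy files that changed.
  aws s3 sync "$MODEL_DIR" "$BUCKET_URI"
else
  echo "skipped: aws CLI or ${MODEL_DIR} not found"
fi
```

Once the checkpoint is in the bucket, each machine can pull it with COPY mode as described above, so the local-to-cloud upload only has to happen once.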