skypilot-users

Is the networking on Azure NC A100 instances sufficient for good performance in training multi-node jobs?

Alex Kouzemtchenko

Asked on Nov 15, 2023

According to one user who trained on Azure's non-InfiniBand networking, a 25GB model scaled sublinearly: adding nodes gave less than a proportional speedup. Scaling is expected to get worse with more nodes and larger models. That said, if you already have quota and credits to spend, it may still be worth trying.
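If you do try it, one way to see how far from linear you are is to compare measured multi-node throughput against the single-node throughput times the node count. Here is a minimal sketch of that arithmetic; the function name `scaling_efficiency` and the throughput numbers are hypothetical placeholders, not measurements from Azure or anything reported in this thread.

```python
def scaling_efficiency(single_node_throughput, multi_node_throughput, num_nodes):
    """Return the fraction of ideal linear speedup achieved (1.0 = perfectly linear)."""
    if num_nodes < 1 or single_node_throughput <= 0:
        raise ValueError("need num_nodes >= 1 and a positive single-node throughput")
    # Ideal linear scaling: N nodes process N times the single-node throughput.
    ideal_throughput = single_node_throughput * num_nodes
    return multi_node_throughput / ideal_throughput


# Placeholder numbers for illustration only (e.g. samples/sec from your own runs).
efficiency = scaling_efficiency(
    single_node_throughput=1000.0,
    multi_node_throughput=1500.0,
    num_nodes=2,
)
print(f"{efficiency:.0%} of linear scaling")
```

A value well below 1.0 that keeps dropping as you add nodes would suggest the interconnect is the bottleneck, though data loading and other factors can produce the same pattern.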

Dec 19, 2023