skypilot-users
Why use torch.distributed.launch instead of torchrun?
Georgehwp asks why torch.distributed.launch would be used over torchrun in the context of mmdetection's distributed training scripts. Zongheng Yang confirms they are not aware of any specific reason and asks for feedback if torchrun works. Georgehwp notes that torchrun is easier to use and better documented, and plans to test whether the two launchers are interchangeable.
Georgehwp
Asked on Feb 22, 2024
- torch.distributed.launch and torchrun are both used for distributed training in PyTorch.
- torch.distributed.launch is commonly used for launching distributed training jobs.
- torchrun is mentioned as an alternative that Georgehwp found easier to use and better documented.
- Georgehwp suggests that torchrun and python3 -m torch.distributed.launch may be interchangeable in this case, but further testing is needed (see the sketch after this list).
- Georgehwp has not tested the setup for multi-node yet.
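The sketch below (not from the thread) shows a minimal training entry point that should run under either launcher on a single node, which is the sense in which the two are often interchangeable: torchrun exports LOCAL_RANK (along with RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT) as environment variables, and python -m torch.distributed.launch does the same when invoked with --use_env (older releases pass --local_rank as a command-line argument instead). The file name train.py and the all-reduce payload are illustrative only.

```python
# Minimal sketch of a launcher-agnostic entry point, assuming a single-node
# setup with NCCL-capable GPUs. Both torchrun and
# `python -m torch.distributed.launch --use_env` populate LOCAL_RANK, RANK,
# WORLD_SIZE, MASTER_ADDR, and MASTER_PORT, so env-based initialization works
# unchanged under either launcher.
import os

import torch
import torch.distributed as dist


def main():
    # Provided by the launcher; one process per GPU.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Default init method reads the rendezvous info from the environment.
    dist.init_process_group(backend="nccl")

    # Tiny sanity check: sum a ones-tensor across all processes.
    tensor = torch.ones(1, device=local_rank)
    dist.all_reduce(tensor)
    print(f"rank {dist.get_rank()}/{dist.get_world_size()}: {tensor.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With a script written this way, the two invocations below would be expected to behave the same on one node (2 GPUs assumed for illustration):

```
torchrun --nproc_per_node=2 train.py
python3 -m torch.distributed.launch --use_env --nproc_per_node=2 train.py
```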