-
How do I use MPI for distributed training?
-
You can set the launcher to mpi and use mpirun to launch the code.
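For example, a minimal single-node run with 2 processes might look like this (a sketch; the config path is a placeholder):

mpirun -np 2 python tools/train.py configs/your_config.py --launcher mpi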
-
@ZwwWayne Hi, I was also wondering how to use MPI for multi-machine training. Could you give an example here?
-
An example of using mpirun for distributed training with 2 GPUs on 2 nodes (1 GPU per node):

mpirun \
    --allow-run-as-root \
    --npernode 1 --np 2 \
    python tools/train.py ${CONFIG_FILE} --launcher mpi

Note: you should at least set the MASTER_ADDR environment variable, which is required by PyTorch. Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/dist_utils.py#L66
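Concretely, with Open MPI you can list the two nodes with -H and export the required variables to every rank with -x. A sketch, assuming two hosts named node1 and node2 (the hostnames and port are placeholders):

export MASTER_ADDR=node1   # address of the rank-0 node; required by PyTorch
export MASTER_PORT=29500   # any free port
mpirun --allow-run-as-root \
    -H node1,node2 --npernode 1 --np 2 \
    -x MASTER_ADDR -x MASTER_PORT \
    python tools/train.py ${CONFIG_FILE} --launcher mpi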
-
@yingfhu Great, thanks for the help.