Training not successful #51

Open
xiyangyang99 opened this issue Oct 16, 2023 · 2 comments

Comments

@xiyangyang99

Training fails on 8× RTX 3090.
This is my training command:
CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m torch.distributed.launch --master_port=12000 --nnodes 1 --nproc_per_node 4 train.py --config /home/quchunguang/003-large-model/SAM-Adapter-PyTorch/configs/cod-sam-vit-h.yaml --tag exp1

This is the training log:
/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
(the same UserWarning is printed once by each of the 4 worker processes)

It just stays like this indefinitely, and no further training output ever appears.

How can I fix this?
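
Incidentally, the FutureWarning at the top of the log only concerns the launcher: torch.distributed.launch still passes --local_rank as a command-line argument, while torchrun exports it as the LOCAL_RANK environment variable instead. A minimal sketch of reading the local rank either way (assuming train.py parses it with argparse, which I have not checked) would be:

import argparse
import os

# torchrun sets LOCAL_RANK in the environment; torch.distributed.launch
# passes --local_rank on the command line. This default covers both cases.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int,
                    default=int(os.environ.get("LOCAL_RANK", 0)))
args = parser.parse_args()
local_rank = args.local_rank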

@tianrun-chen
Owner

Greetings! As the current application will use over 30 GB of GPU memory even with batch size = 1, we suggest considering graphics cards with greater memory capacity.
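
For reference, each RTX 3090 has 24 GB of memory. A small sketch (not part of this repository) to print what each visible card actually reports:

import torch

# Print the total memory of every visible GPU; an RTX 3090 reports about 24 GiB,
# which is below the ~30 GB needed here at batch size 1.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")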

@xiyangyang99
Author

> Greetings! As the current application will use over 30 GB of GPU memory even with batch size = 1, we suggest considering graphics cards with greater memory capacity.

Thank you for your reply. I am using 8 × NVIDIA RTX 3090 GPUs and the machine has 188 GB of system memory. There is no log output during the training process, and the GPUs show no activity either.
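
If it helps to narrow this down, here is a minimal distributed sanity check (a sketch; the file name minimal_ddp_check.py is invented) that only initializes NCCL and performs a single all_reduce. If this also hangs with no output, the problem is in the distributed setup or NCCL configuration rather than in SAM-Adapter itself.

# minimal_ddp_check.py -- run with: torchrun --nproc_per_node 4 minimal_ddp_check.py
import os
import torch
import torch.distributed as dist

def main():
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    x = torch.ones(1, device=f"cuda:{local_rank}")
    dist.all_reduce(x)  # sums the tensor across all 4 processes -> 4.0
    print(f"rank {dist.get_rank()}: all_reduce ok, value = {x.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()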
