RuntimeError "md traj iter.000002/01.model_devi/task.000.000000 frame 1 with f devi nan does not belong to either accurate, candidiate and failed, it should not happen" occurs during dpgen run #1460

Open
12jscvb opened this issue Jan 24, 2024 · 6 comments

Comments


12jscvb commented Jan 24, 2024

Summary

At the start of the 02.fp stage in iter.000002 of dpgen, the following error appears:
INFO:dpgen:-------------------------iter.000002 task 05--------------------------
INFO:dpgen:-------------------------iter.000002 task 06--------------------------
Traceback (most recent call last):
  File "/home/combustion/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/main.py", line 255, in main
    args.func(args)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 5411, in gen_run
    run_iter(args.PARAM, args.MACHINE)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 4760, in run_iter
    make_fp(ii, jdata, mdata)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 3757, in make_fp
    fp_tasks = _make_fp_vasp_configs(iter_index, jdata)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 3341, in _make_fp_vasp_configs
    fp_tasks = _make_fp_vasp_inner(
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 2566, in _make_fp_vasp_inner
    ) = _select_by_model_devi_standard(
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 2330, in _select_by_model_devi_standard
    raise RuntimeError(
RuntimeError: md traj iter.000002/01.model_devi/task.000.000000 frame 1 with f devi nan does not belong to either accurate, candidiate and failed, it should not happen
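
For context, here is a minimal sketch of why a NaN force deviation can end up in none of the three categories named in the message. It is an illustration only, not dpgen's actual selection code: the function `classify` and the trust bounds 0.05 and 0.15 are hypothetical. The point is that every ordered comparison against NaN evaluates to False, so a frame with `f devi nan` falls through all three branches.

```python
def classify(f_devi, trust_lo, trust_hi):
    """Toy three-way split by force deviation (hypothetical, not dpgen's code)."""
    if f_devi < trust_lo:
        return "accurate"
    if trust_lo <= f_devi < trust_hi:
        return "candidate"
    if f_devi >= trust_hi:
        return "failed"
    # Reached only when every comparison above is False, which happens for NaN.
    return "unclassified"


print(classify(0.03, 0.05, 0.15))
print(classify(float("nan"), 0.05, 0.15))
```

So the error is likely a symptom: the NaN is already present in model_devi.out before the 02.fp stage reads it.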

In addition, I added an initial data set before starting the loop, and then found that NaN appeared in the lcurve.out file generated in 00.train and in the model_devi.out file generated in 01.model_devi. I don't know how to handle this situation. Thank you for your help. Best wishes.
lcurve.txt

model_devi.txt

DP-GEN Version

dpgen v0.12.0, DeePMD-kit v2.2.7

Platform, Python Version, etc

The OS is Ubuntu 22.04. Before running this loop, I changed the system kernel to use the NVIDIA driver.

Details

I also found a separate problem. After installing dpgen, checking its version prints the following warning:
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.1.0) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "

Member

njzjz commented Jan 24, 2024

Please check whether your training data contains NaN.
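
One way to run this check is sketched below. It assumes the training data is stored as NumPy `.npy` files (for example `set.*/energy.npy` and `set.*/force.npy`); the function name `find_nonfinite_npy` and the directory layout are assumptions for illustration, not part of dpgen or DeePMD-kit.

```python
from pathlib import Path

import numpy as np


def find_nonfinite_npy(root):
    """Return {file path: count of NaN/Inf entries} for .npy files under root."""
    bad = {}
    for path in sorted(Path(root).rglob("*.npy")):
        arr = np.load(path)
        if not np.issubdtype(arr.dtype, np.floating):
            continue  # integer arrays cannot hold NaN
        n_bad = int(np.count_nonzero(~np.isfinite(arr)))
        if n_bad:
            bad[str(path)] = n_bad
    return bad
```

Calling `find_nonfinite_npy("path/to/init_data")` (a placeholder path) returns an empty dict when every floating-point array is finite. Note that it also counts Inf entries, which can poison training in the same way as NaN.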

Author

12jscvb commented Mar 2, 2024

> Please check whether your training data contains NaN.

Sorry for my late reply. I have checked the training data and found no NaN. Could you give me some more suggestions?

Collaborator

robinzyb commented Mar 4, 2024

I encountered this problem before. DeePMD-kit v2.2.7 sometimes writes NaN in the model deviation output. Could you try a newer DeePMD-kit version, such as v2.2.9? See deepmodeling/deepmd-kit#3242.
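
To see which frames are affected, a small scan of model_devi.out can help. This is a sketch under assumptions: it treats the file as whitespace-separated numbers with a `#` header line and the MD step in the first column, and the helper name `nan_frames` is made up for illustration.

```python
import numpy as np


def nan_frames(model_devi_path):
    """Return the step values (first column) of rows that contain NaN."""
    # np.loadtxt skips lines starting with '#', so the header is ignored.
    data = np.loadtxt(model_devi_path, ndmin=2)
    mask = np.isnan(data[:, 1:]).any(axis=1)
    return data[mask, 0].astype(int).tolist()
```

If only isolated frames are NaN while training looked healthy, that is consistent with the version-specific behaviour described above; if every frame is NaN, the model itself is more likely the problem.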

Author

12jscvb commented Mar 4, 2024

> I encountered this problem before. v2.2.7 sometimes writes nan in model deviation. Could you try deepmd with newest version like v2.2.9? deepmodeling/deepmd-kit#3242

Thanks for your advice. I occasionally ran into this situation with DeePMD-kit v2.2.8 as well, so I will try v2.2.9. What causes this problem? Thanks.

Member

njzjz commented Mar 4, 2024

> I encountered this problem before. v2.2.7 sometimes writes nan in model deviation. Could you try deepmd with newest version like v2.2.9? deepmodeling/deepmd-kit#3242

NaN appeared in lcurve.out, so this is a totally different issue. I noticed that the energy loss unexpectedly increased before the NaN appeared. The data most likely contains outliers.
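
To locate where training diverged, one can look at the rows of lcurve.out just before the first NaN. The sketch below assumes lcurve.out is a whitespace-separated numeric table with a `#` header line and the training step in the first column; `first_nan_step` and its `context` parameter are hypothetical names.

```python
import numpy as np


def first_nan_step(lcurve_path, context=5):
    """Return (step of the first NaN row, up to `context` rows before it).

    Returns (None, None) when the file contains no NaN.
    """
    data = np.loadtxt(lcurve_path, ndmin=2)  # '#' header lines are skipped
    rows = np.flatnonzero(np.isnan(data).any(axis=1))
    if rows.size == 0:
        return None, None
    i = int(rows[0])
    return int(data[i, 0]), data[max(0, i - context):i]
```

Inspecting the returned rows shows whether the loss climbed steadily before the NaN, which would point to outlier frames in the data rather than to the model deviation code.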

Author

12jscvb commented Mar 4, 2024

> > I encountered this problem before. v2.2.7 sometimes writes nan in model deviation. Could you try deepmd with newest version like v2.2.9? deepmodeling/deepmd-kit#3242
>
> NaN appeared in lcurve.out, so it's a totally different issue. I noticed that before NaN, the energy loss unexpectedly increased. The data should contain outliers.

Thank you. I noticed that the data set contains script files. Apart from this case, I have also seen training of the potential run normally in the 00.train stage, yet a large number of NaN values appear in the model_devi.out file in the 01.model_devi stage. I also found that the folder given by the 'remote_root' parameter in the machine JSON file was empty, even though the folder settings are correct. What could be the reason for this? Could you give me some advice? Thanks.
