RuntimeError "md traj iter.000002/01.model_devi/task.000.000000 frame 1 with f devi nan does not belong to either accurate, candidiate and failed, it should not happen" occurs during dpgen run
#1460
Open
12jscvb opened this issue
Jan 24, 2024
· 6 comments
At the start of the fp stage (make_fp) in iter.000002 of dpgen, the following error message appears:
INFO:dpgen:-------------------------iter.000002 task 05--------------------------
INFO:dpgen:-------------------------iter.000002 task 06--------------------------
Traceback (most recent call last):
  File "/home/combustion/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/main.py", line 255, in main
    args.func(args)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 5411, in gen_run
    run_iter(args.PARAM, args.MACHINE)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 4760, in run_iter
    make_fp(ii, jdata, mdata)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 3757, in make_fp
    fp_tasks = _make_fp_vasp_configs(iter_index, jdata)
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 3341, in _make_fp_vasp_configs
    fp_tasks = _make_fp_vasp_inner(
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 2566, in _make_fp_vasp_inner
    ) = _select_by_model_devi_standard(
  File "/home/combustion/.local/lib/python3.10/site-packages/dpgen/generator/run.py", line 2330, in _select_by_model_devi_standard
    raise RuntimeError(
RuntimeError: md traj iter.000002/01.model_devi/task.000.000000 frame 1 with f devi nan does not belong to either accurate, candidiate and failed, it should not happen
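For readers wondering why a NaN deviation triggers this message, here is a minimal sketch of a three-way threshold selection. It is not the dpgen source: the function name `classify_frame` and the threshold values are hypothetical, and only the force deviation is considered. The point it illustrates is that every comparison with NaN is false, so a NaN frame matches none of the categories.

```python
import math


def classify_frame(f_devi, trust_lo, trust_hi):
    """Toy three-way selection on the max force deviation of one frame.

    Hypothetical helper for illustration; not taken from dpgen.
    """
    if f_devi < trust_lo:
        return "accurate"
    if f_devi >= trust_hi:
        return "failed"
    if trust_lo <= f_devi < trust_hi:
        return "candidate"
    # Every comparison with NaN is False, so a NaN deviation ends up here.
    raise RuntimeError(f"f devi {f_devi} fits no category")


print(classify_frame(0.03, 0.05, 0.15))
print(classify_frame(0.10, 0.05, 0.15))
try:
    classify_frame(math.nan, 0.05, 0.15)
except RuntimeError as err:
    print("NaN frame:", err)
```

So the error is likely a symptom: the thing to track down is where the NaN in the model deviation comes from.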
In addition, I added the initial data set before the start of the loop, and then found that NaN appeared in the lcurve.out file generated in 00.train and in the model_devi.out file generated in 01.model_devi. I don't know what to do about this situation. Thank you for your help. Best wishes.
lcurve.txt
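To locate where NaN first shows up in a whitespace-separated output file such as lcurve.out or model_devi.out, something like the following could be used. It is a sketch under assumptions: `find_nan_rows` is a hypothetical helper, lines starting with `#` are treated as header comments, and the sample rows are made up for illustration rather than copied from the attached files.

```python
import math


def find_nan_rows(lines):
    """Return (line_number, first_column) for every data row containing NaN."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and header comments
        fields = text.split()
        try:
            values = [float(x) for x in fields]
        except ValueError:
            continue  # not a purely numeric row
        if any(math.isnan(v) for v in values):
            hits.append((line_no, fields[0]))
    return hits


# Illustrative sample only; real files have their own header and columns.
sample = [
    "# step max_devi_v min_devi_v avg_devi_v max_devi_f min_devi_f avg_devi_f",
    "0 1.0e-02 2.0e-03 5.0e-03 4.0e-02 1.0e-02 2.0e-02",
    "10 nan nan nan nan nan nan",
]
print(find_nan_rows(sample))  # [(3, '10')]
```

An open file object can be passed directly, for example `find_nan_rows(open("model_devi.out"))`, since the function only iterates over lines.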
The OS is Ubuntu 22.04. Before running this loop, I changed the system kernel to use the NVIDIA driver.
Details
I also found another problem: after installing dpgen, the following warning appears when I check its version:
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.1.0) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Please check whether your training data contains NaN.
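One way to run this check is to scan each array of the data set for non-finite values. The sketch below uses only the standard library and works on any iterable of numbers; the helper name `first_non_finite` and the sample values are hypothetical.

```python
import math


def first_non_finite(arrays):
    """Map each array name to the index of its first NaN/inf value, if any."""
    report = {}
    for name, values in arrays.items():
        for idx, v in enumerate(values):
            if not math.isfinite(v):
                report[name] = idx
                break
    return report


# Made-up values standing in for flattened energy and force arrays.
data = {
    "energy": [-101.2, -101.5, -100.9],
    "force": [0.1, float("nan"), -0.3],
}
print(first_non_finite(data))  # {'force': 1}
```

With NumPy installed, flattened arrays such as `numpy.load("set.000/force.npy").ravel()` could be passed in. That path assumes the data are stored in the deepmd npy layout and should be adapted to the actual data set.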
Sorry for my late reply. I have checked the training data and there is no NaN. Could you give me some more suggestions?
I encountered this problem before. DeePMD-kit v2.2.7 sometimes writes NaN in the model deviation. Could you try a newer version of deepmd, such as v2.2.9? See deepmodeling/deepmd-kit#3242
Thanks for your advice. I occasionally come across this situation when using deepmd v2.2.8 as well, so I will try DeePMD-kit v2.2.9. What is the reason for this problem? Thanks.
NaN appeared in lcurve.out, so it's a totally different issue. I noticed that the energy loss increased unexpectedly before the NaN appeared. The data probably contain outliers.
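To find the training step at which the loss starts to blow up before turning into NaN, a simple scan over one error column of lcurve.out may help. This is only a sketch: `first_jump` is a hypothetical helper, the factor of 10 is an arbitrary threshold, and the (step, value) pairs are made up. Which column to track (for example an energy RMSE column) depends on the header of the actual file.

```python
import math


def first_jump(rows, factor=10.0):
    """Return the step at which the tracked error first exceeds
    `factor` times the smallest value seen so far, or None."""
    best = None
    for step, value in rows:
        if math.isnan(value):
            break  # NaN reached without a detected jump
        if best is not None and value > factor * best:
            return step
        best = value if best is None else min(best, value)
    return None


# Made-up (step, error) pairs: the error jumps at step 300, then becomes NaN.
rows = [(0, 2.0e-1), (100, 5.0e-2), (200, 4.0e-2), (300, 9.0e-1), (400, float("nan"))]
print(first_jump(rows))  # 300
```

The frames used for training around the reported step are then reasonable candidates to inspect for outliers.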
Thank you. I noticed that the data set contains script files. Apart from this case, I also ran into a situation where training the potential was normal in the 00.train stage, but a large number of NaN values appeared in the model_devi.out file in the 01.model_devi stage, and the folder given by the 'remote_root' parameter in the machine JSON file was empty (the folder settings are correct). What is the reason for this? Could you give some advice? Thanks.
model_devi.txt
DP-GEN Version
dpgen v0.12.0, DeePMD-kit v2.2.7