About running dpgen autotest on WSL Ubuntu. Error occurred #1549

Open
NKJunhongLi opened this issue May 14, 2024 · 0 comments
Summary

I successfully trained a model using deepmd-kit. Now I want to run dpgen autotest to calculate physical properties.

I followed the dpgen documentation and prepared relaxation.json and machine_local.json.

I ran
dpgen autotest make relaxation_T.json
and it completed successfully.

Then I ran
dpgen autotest run relaxation_T.json machine_local.json
and it failed with an error.

It seems that dpgen is trying to submit jobs, even though I am running it in my local shell, where I would not expect any job submission.
The error message mentions an "unexpected submission state".

My JSON files are attached here:
machine_local.json
relaxation_T.json

I would like to know whether there are any mistakes in the JSON files and how I can fix this.
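Before looking for deeper problems, it may help to confirm that machine_local.json parses as valid JSON and to see which execution settings it actually declares. The following is a minimal sketch, not part of dpgen: the helper names find_keys and summarize_machine are hypothetical, and it assumes the file uses the dpdispatcher-style keys batch_type and context_type plus a command entry, which may not match the attached file exactly.

```python
import json


def find_keys(node, wanted, prefix=""):
    """Recursively collect (json_path, value) pairs for every key in `wanted`."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if key in wanted:
                found.append((path, value))
            found.extend(find_keys(value, wanted, path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_keys(item, wanted, f"{prefix}[{i}]"))
    return found


def summarize_machine(text):
    """Parse a machine file and list the settings that control how jobs run.

    The key names below are assumptions about the file layout.
    json.loads raises json.JSONDecodeError if the file has a syntax error.
    """
    config = json.loads(text)
    return find_keys(config, {"batch_type", "context_type", "command"})


# Example usage (file name taken from the issue):
# with open("machine_local.json") as fh:
#     for path, value in summarize_machine(fh.read()):
#         print(path, "=", value)
```

If the file parses and the listed values look as intended, the machine file is at least syntactically sound, and the failure is more likely to come from the task itself.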

DP-GEN Version

0.12.1

Platform, Python Version, etc

Platform: WSL Ubuntu 22.04
Python version: 3.10.13

Details

DeepModeling

Version: 0.12.1
Path: /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen

Dependency

numpy      1.26.4     /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/numpy
dpdata     0.2.18     /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdata
pymatgen   unknown version or path
monty      2024.4.17  /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/monty
ase        3.22.1     /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/ase
paramiko   3.4.0      /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/paramiko
custodian  2024.4.18  /home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/custodian

Reference

Please cite:
Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E,
DP-GEN: A concurrent learning platform for the generation of reliable deep learning
based potential energy models, Computer Physics Communications, 2020, 107206.

Description

/home/lijh/HfO2/4phase-200w/autotest --> Runing...
2024-05-14 15:53:04,453 - INFO : info:check_all_finished: False
2024-05-14 15:53:04,457 - INFO : job: b910e4a6be4620f8b89f5ed1af23cab264b0e786 submit; job_id is 31369
2024-05-14 15:53:35,592 - INFO : job: b910e4a6be4620f8b89f5ed1af23cab264b0e786 31369 terminated; fail_cout is 1; resubmitting job
2024-05-14 15:53:35,642 - INFO : job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 re-submit after terminated; new job_id is 31708
2024-05-14 15:53:35,851 - INFO : job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 job_id:31708 after re-submitting; the state now is <JobStatus.running: 3>
2024-05-14 15:54:05,986 - INFO : job: b910e4a6be4620f8b89f5ed1af23cab264b0e786 31708 terminated; fail_cout is 2; resubmitting job
2024-05-14 15:54:06,029 - INFO : job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 re-submit after terminated; new job_id is 32098
2024-05-14 15:54:06,238 - INFO : job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 job_id:32098 after re-submitting; the state now is <JobStatus.running: 3>
2024-05-14 15:54:36,367 - INFO : job: b910e4a6be4620f8b89f5ed1af23cab264b0e786 32098 terminated; fail_cout is 3; resubmitting job
Traceback (most recent call last):
  File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 358, in handle_unexpected_submission_state
    job.handle_unexpected_job_state()
  File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 862, in handle_unexpected_job_state
    raise RuntimeError(err_msg)
RuntimeError: job:b910e4a6be4620f8b89f5ed1af23cab264b0e786 32098 failed 3 times.
Possible remote error message: ==> /home/lijh/HfO2/4phase-200w/autotest/work/b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866/confs/T_phase/relaxation/relax_task/errlog <==

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lijh/anaconda3/envs/deepmd/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/main.py", line 255, in main
    args.func(args)
  File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/run.py", line 58, in gen_test
    run_task(args.TASK, args.PARAM, args.MACHINE)
  File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/run.py", line 34, in run_task
    run_equi(confs, inter_parameter, mdata)
  File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpgen/auto_test/common_equi.py", line 197, in run_equi
    submission.run_submission()
  File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 261, in run_submission
    self.handle_unexpected_submission_state()
  File "/home/lijh/anaconda3/envs/deepmd/lib/python3.10/site-packages/dpdispatcher/submission.py", line 362, in handle_unexpected_submission_state
    raise RuntimeError(
RuntimeError: Meet errors will handle unexpected submission state.
Debug information: remote_root==/home/lijh/HfO2/4phase-200w/autotest/work/b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866.
Debug information: submission_hash==b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866.
Please check error messages above and in remote_root. The submission information is saved in /home/lijh/.dpdispatcher/submission/b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866.json.
For furthur actions, run the following command with proper flags: dpdisp submission b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866
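
The traceback asks to check the error messages in remote_root, and it names an errlog file inside the relax_task directory whose content does not appear in the output above. A small script could collect the last lines of every errlog under the work directory so the underlying failure becomes visible. This is only a sketch: tail_errlogs is a hypothetical helper, not a dpgen or dpdispatcher function, and the default file name errlog and the 20-line limit are assumptions based on the path shown in the message.

```python
from pathlib import Path


def tail_errlogs(root, names=("errlog",), n_lines=20):
    """Return {file path: last n_lines lines} for matching files under root.

    `names` lists the file names to look for; only "errlog" is taken from
    the traceback above, anything else would be an assumption.
    """
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.name in names:
            # errors="replace" avoids crashing on non-UTF-8 bytes in a log
            text = path.read_text(errors="replace")
            results[str(path)] = text.splitlines()[-n_lines:]
    return results


# Example usage, with remote_root copied from the debug information:
# root = "/home/lijh/HfO2/4phase-200w/autotest/work/b279de7e9ede8ee9b4d5502ff7df4cc95cbe3866"
# for path, lines in tail_errlogs(root).items():
#     print(f"==> {path} <==")
#     print("\n".join(lines) if lines else "(empty)")
```

An empty errlog would suggest that the task's command exited without writing anything to stderr, in which case the other output files in the same directory are the next place to look.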
