Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for spin, DeltaSpin and LAMMPS SPIN #728

Open
wants to merge 14 commits into
base: devel
Choose a base branch
from

Conversation

yangtengleo
Copy link

@yangtengleo yangtengleo commented Sep 20, 2024

  1. add new data format "spin" and "mag_forces" in LabeledSystem.
  2. add data format conversion from DeltaSpin to LabeledSystem.
  3. add data format conversion from LAMMPS SPIN dump to LabeledSystem.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for loading and handling spin data across various file formats including LAMMPS and VASP.
    • Introduced classes and methods for reading and writing VASP data formats, specifically tailored for delta spin configurations.
  • Enhancements

    • Enhanced data extraction from LAMMPS dump files to include spin forces.
    • Improved system representation to accommodate new data types for spins and magnetic forces.
  • Bug Fixes

    • Fixed issues related to the handling of spin data in existing functions, ensuring accurate data processing and representation.

Copy link

coderabbitai bot commented Sep 20, 2024

Walkthrough

Walkthrough

The changes introduce functionality for loading and handling spin data across multiple files in the dpdata library. Key updates include modifications to the _load_set and to_system_data functions to incorporate spins, new functions for extracting spin data from LAMMPS dump files, and enhancements to VASP file handling for delta spin configurations. Additionally, new data types for spins and magnetic forces are defined in the System and LabeledSystem classes, improving the representation of spin-related information.

Changes

Files Change Summary
dpdata/deepmd/comp.py, dpdata/deepmd/raw.py Updated _load_set and to_system_data functions to include loading and handling of spin data. Added "spins" to the attributes processed in to_system_data.
dpdata/lammps/dump.py, dpdata/lammps/lmp.py Introduced functions to extract spin type and spin forces from LAMMPS dump files. Updated system_data to include spin data in the output. Added a new function to retrieve spins from atom lines.
dpdata/plugins/vasp_deltaspin.py Created classes for handling VASP data formats, specifically for delta spin configurations, including methods for reading and writing POSCAR and OUTCAR files while managing spins.
dpdata/system.py Added new data types for spins and magnetic forces in the System and LabeledSystem classes, enhancing data handling capabilities.
dpdata/vasp_deltaspin/outcar.py, dpdata/vasp_deltaspin/poscar.py Introduced functions for parsing and analyzing VASP OUTCAR and POSCAR files, including handling of spins and magnetic moments, facilitating the extraction and writing of relevant simulation data.

Possibly related PRs

Suggested reviewers

  • njzjz
  • wanghan-iapcm

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codspeed-hq bot commented Sep 20, 2024

CodSpeed Performance Report

Merging #728 will not alter performance

Comparing yangtengleo:devel (4339232) with devel (c1d6c73)

Summary

✅ 2 untouched benchmarks

Copy link

codecov bot commented Sep 20, 2024

Codecov Report

Attention: Patch coverage is 17.17452% with 299 lines in your changes missing coverage. Please review.

Project coverage is 81.56%. Comparing base (c1d6c73) to head (4339232).

Files with missing lines Patch % Lines
dpdata/vasp_deltaspin/outcar.py 5.44% 139 Missing ⚠️
dpdata/vasp_deltaspin/poscar.py 5.20% 91 Missing ⚠️
dpdata/plugins/vasp_deltaspin.py 40.81% 29 Missing ⚠️
dpdata/lammps/dump.py 20.58% 27 Missing ⚠️
dpdata/lammps/lmp.py 43.47% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #728      +/-   ##
==========================================
- Coverage   84.73%   81.56%   -3.18%     
==========================================
  Files          81       84       +3     
  Lines        7345     7702     +357     
==========================================
+ Hits         6224     6282      +58     
- Misses       1121     1420     +299     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 25

Outside diff range and nitpick comments (5)
dpdata/plugins/vasp_deltaspin.py (2)

75-77: Add a docstring to 'from_labeled_system' method

Including a docstring for the from_labeled_system method enhances code readability and maintainability by documenting its purpose, parameters, and return values.

Consider adding a docstring similar to:

    def from_labeled_system(
        self, file_name, begin=0, step=1, convergence_check=True, **kwargs
    ):
        """
        Load data from a VASP OUTCAR file into a labeled system.

        Parameters
        ----------
        file_name : str
            The path to the OUTCAR file.
        begin : int, optional
            The starting frame index, by default 0.
        step : int, optional
            The step size between frames, by default 1.
        convergence_check : bool, optional
            Whether to perform convergence checks, by default True.
        **kwargs : dict
            Additional keyword arguments.

        Returns
        -------
        dict
            The loaded system data.
        """

18-24: Increase test coverage for critical code paths

Several critical sections of the code, including file operations and data processing, are not covered by unit tests. This may lead to undiscovered bugs and reduce code reliability.

Consider adding unit tests for these code paths to improve test coverage and ensure the code behaves as expected. Do you need help in creating these tests or setting up the testing framework?

Also applies to: 40-42, 44-46, 67-68, 78-80, 98-99, 101-107

Tools
GitHub Check: codecov/patch

[warning] 18-24: dpdata/plugins/vasp_deltaspin.py#L18-L24
Added lines #L18 - L24 were not covered by tests

dpdata/vasp_deltaspin/poscar.py (1)

59-61: Correct grammatical error in exception message

In the exception message, 'seem' should be 'seems' for grammatical correctness.

Apply this diff to correct the message:

-    raise RuntimeError(
-        "seem not to be a valid POSCAR of vasp 5.x, may be a POSCAR of vasp 4.x?"
-    )
+    raise RuntimeError(
+        "seems not to be a valid POSCAR of VASP 5.x, may be a POSCAR of VASP 4.x?"
+    )
dpdata/vasp_deltaspin/outcar.py (1)

119-122: Simplify conditional assignment using a ternary operator.

You can replace the if-else block with a single-line conditional expression for brevity.

Apply this diff:

-    if len(all_virials) == 0:
-        all_virials = None
-    else:
-        all_virials = np.array(all_virials)
+    all_virials = None if len(all_virials) == 0 else np.array(all_virials)
Tools
Ruff

119-122: Use ternary operator all_virials = None if len(all_virials) == 0 else np.array(all_virials) instead of if-else-block

Replace if-else-block with all_virials = None if len(all_virials) == 0 else np.array(all_virials)

(SIM108)

dpdata/lammps/lmp.py (1)

130-141: Add unit tests for new spin-handling functionality

The newly added get_spins function and the spin processing in system_data are not covered by tests, as indicated by the static analysis. Adding unit tests will help ensure correctness and prevent future regressions.

Would you like assistance in creating these unit tests or opening a new GitHub issue to track this task?

Also applies to: 167-169

Tools
GitHub Check: codecov/patch

[warning] 134-140: dpdata/lammps/lmp.py#L134-L140
Added lines #L134 - L140 were not covered by tests

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c1d6c73 and 4339232.

Files selected for processing (8)
  • dpdata/deepmd/comp.py (3 hunks)
  • dpdata/deepmd/raw.py (1 hunks)
  • dpdata/lammps/dump.py (3 hunks)
  • dpdata/lammps/lmp.py (3 hunks)
  • dpdata/plugins/vasp_deltaspin.py (1 hunks)
  • dpdata/system.py (4 hunks)
  • dpdata/vasp_deltaspin/outcar.py (1 hunks)
  • dpdata/vasp_deltaspin/poscar.py (1 hunks)
Additional context used
GitHub Check: codecov/patch
dpdata/lammps/dump.py

[warning] 136-141: dpdata/lammps/dump.py#L136-L141
Added lines #L136 - L141 were not covered by tests


[warning] 145-157: dpdata/lammps/dump.py#L145-L157
Added lines #L145 - L157 were not covered by tests


[warning] 166-168: dpdata/lammps/dump.py#L166-L168
Added lines #L166 - L168 were not covered by tests


[warning] 258-260: dpdata/lammps/dump.py#L258-L260
Added lines #L258 - L260 were not covered by tests


[warning] 274-274: dpdata/lammps/dump.py#L274
Added line #L274 was not covered by tests


[warning] 279-279: dpdata/lammps/dump.py#L279
Added line #L279 was not covered by tests

dpdata/lammps/lmp.py

[warning] 134-140: dpdata/lammps/lmp.py#L134-L140
Added lines #L134 - L140 were not covered by tests


[warning] 169-169: dpdata/lammps/lmp.py#L169
Added line #L169 was not covered by tests


[warning] 236-236: dpdata/lammps/lmp.py#L236
Added line #L236 was not covered by tests


[warning] 248-248: dpdata/lammps/lmp.py#L248
Added line #L248 was not covered by tests


[warning] 251-252: dpdata/lammps/lmp.py#L251-L252
Added lines #L251 - L252 were not covered by tests


[warning] 264-264: dpdata/lammps/lmp.py#L264
Added line #L264 was not covered by tests

dpdata/plugins/vasp_deltaspin.py

[warning] 18-24: dpdata/plugins/vasp_deltaspin.py#L18-L24
Added lines #L18 - L24 were not covered by tests


[warning] 40-42: dpdata/plugins/vasp_deltaspin.py#L40-L42
Added lines #L40 - L42 were not covered by tests


[warning] 44-46: dpdata/plugins/vasp_deltaspin.py#L44-L46
Added lines #L44 - L46 were not covered by tests


[warning] 49-50: dpdata/plugins/vasp_deltaspin.py#L49-L50
Added lines #L49 - L50 were not covered by tests


[warning] 67-68: dpdata/plugins/vasp_deltaspin.py#L67-L68
Added lines #L67 - L68 were not covered by tests


[warning] 78-80: dpdata/plugins/vasp_deltaspin.py#L78-L80
Added lines #L78 - L80 were not covered by tests


[warning] 98-99: dpdata/plugins/vasp_deltaspin.py#L98-L99
Added lines #L98 - L99 were not covered by tests


[warning] 101-107: dpdata/plugins/vasp_deltaspin.py#L101-L107
Added lines #L101 - L107 were not covered by tests

dpdata/vasp_deltaspin/outcar.py

[warning] 10-15: dpdata/vasp_deltaspin/outcar.py#L10-L15
Added lines #L10 - L15 were not covered by tests


[warning] 17-18: dpdata/vasp_deltaspin/outcar.py#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 20-20: dpdata/vasp_deltaspin/outcar.py#L20
Added line #L20 was not covered by tests


[warning] 22-22: dpdata/vasp_deltaspin/outcar.py#L22
Added line #L22 was not covered by tests


[warning] 24-31: dpdata/vasp_deltaspin/outcar.py#L24-L31
Added lines #L24 - L31 were not covered by tests


[warning] 33-41: dpdata/vasp_deltaspin/outcar.py#L33-L41
Added lines #L33 - L41 were not covered by tests


[warning] 43-44: dpdata/vasp_deltaspin/outcar.py#L43-L44
Added lines #L43 - L44 were not covered by tests


[warning] 48-57: dpdata/vasp_deltaspin/outcar.py#L48-L57
Added lines #L48 - L57 were not covered by tests


[warning] 62-63: dpdata/vasp_deltaspin/outcar.py#L62-L63
Added lines #L62 - L63 were not covered by tests


[warning] 65-66: dpdata/vasp_deltaspin/outcar.py#L65-L66
Added lines #L65 - L66 were not covered by tests

Ruff
dpdata/lammps/lmp.py

235-235: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


250-250: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

dpdata/plugins/vasp_deltaspin.py

46-48: re.sub should pass count and flags as keyword arguments to avoid confusion due to unintuitive argument positions

(B034)

dpdata/system.py

716-719: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


718-718: Do not use bare except

(E722)

dpdata/vasp_deltaspin/outcar.py

14-14: Local variable ii_word_list is assigned to but never used

Remove assignment to unused variable ii_word_list

(F841)


39-39: Loop control variable jj not used within loop body

Rename unused jj to _jj

(B007)


62-62: Use a context manager for opening files

(SIM115)


115-115: No explicit stacklevel keyword argument found

(B028)


119-122: Use ternary operator all_virials = None if len(all_virials) == 0 else np.array(all_virials) instead of if-else-block

Replace if-else-block with all_virials = None if len(all_virials) == 0 else np.array(all_virials)

(SIM108)


176-176: Use lines[idx + in_kB_index].split()[0:2] != ["in", "kB"] instead of not lines[idx + in_kB_index].split()[0:2] == ["in", "kB"]

Replace with != operator

(SIM201)

dpdata/vasp_deltaspin/poscar.py

32-32: Loop control variable jj not used within loop body

Rename unused jj to _jj

(B007)


48-48: Local variable spins_idx is assigned to but never used

Remove assignment to unused variable spins_idx

(F841)


57-58: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

Additional comments not posted (9)
dpdata/deepmd/raw.py (1)

67-67: LGTM! Verify the usage of the "spins" attribute.

The addition of the "spins" attribute to the list of handled data elements in the to_system_data function looks good. It enhances the function's capability to process spin-related data during the conversion from deepmd/raw format to system data format.

However, please ensure that the rest of the codebase is equipped to handle the "spins" attribute correctly, as this is a new addition.

Run the following script to verify the usage of the "spins" attribute:

Verification successful

Verified: "spins" attribute is consistently used throughout the codebase.

The addition of the "spins" attribute to the list in the to_system_data function is consistent with its usage throughout the codebase. The attribute is well-integrated into various parts of the project, including:

  • Test files (e.g., test_abacus_spin.py)
  • VASP-related files (e.g., dpdata/vasp_deltaspin/outcar.py)
  • LAMMPS-related files (e.g., dpdata/lammps/lmp.py)
  • ABACUS-related files (e.g., dpdata/abacus/scf.py)

The widespread use of the "spins" attribute in data manipulation, testing, and file I/O operations indicates that the codebase is equipped to handle this attribute correctly. The change enhances the functionality of the to_system_data function by ensuring it can process spin-related data, which is consistent with the project's capabilities.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the "spins" attribute in the codebase.

# Test: Search for the usage of the "spins" attribute. Expect: Relevant code snippets handling the "spins" attribute.
rg --type python -A 5 $'spins'

Length of output: 60


Script:

#!/bin/bash
# Description: Verify the usage of the "spins" attribute in the codebase using alternative methods.

# Method 1: Use rg without file type specification
echo "Method 1: Using rg without file type specification"
rg -A 5 'spins'

# Method 2: Use fd to find Python files and xargs with rg to search them
echo "Method 2: Using fd and xargs with rg"
fd -e py | xargs rg -A 5 'spins'

# Method 3: Use ast-grep to search for the "spins" attribute in Python files
echo "Method 3: Using ast-grep"
ast-grep --lang python --pattern '$_ = $_.spins'

Length of output: 33142

dpdata/system.py (1)

98-98: LGTM!

The new optional data type "spins" is correctly defined and enhances the data handling capabilities of the System class.

dpdata/plugins/vasp_deltaspin.py (1)

101-106: Verify 'virials' scaling calculation for correctness

The code scales the virials to the unit of eV using a conversion factor and volume calculations. It's important to ensure that the calculation is correct and that potential numerical issues (e.g., division by zero, incorrect array shapes) are handled.

Please review the scaling factor and the volume calculation to confirm their correctness. Additionally, consider adding checks for zero volume or invalid cell shapes to prevent runtime errors.

Tools
GitHub Check: codecov/patch

[warning] 101-107: dpdata/plugins/vasp_deltaspin.py#L101-L107
Added lines #L101 - L107 were not covered by tests

dpdata/deepmd/comp.py (6)

42-42: Initialization of all_spins list is appropriate.

The all_spins list is correctly initialized to collect spins data across datasets.


44-44: Correctly updated call to _load_set with additional return value.

The function call now properly unpacks cells, coords, and spins, aligning with the updated _load_set function.


48-49: Proper handling and reshaping of spins data.

The code appropriately checks if spins is not None and reshapes it similarly to how coords are handled. This ensures the spins data is correctly aggregated.


52-53: Conditional addition of spins to the data dictionary is appropriate.

By checking if all_spins has any data before adding to data, we ensure that spins is only included when available, preventing potential KeyErrors or empty data entries.


68-68: Appropriate exclusion of spins from dtype processing loop.

Adding "spins" to the skip list ensures that spins data is not processed in the generic loop, avoiding unintended behavior since spins requires specific handling.


29-30: Verify handling of the additional return value from _load_set.

The _load_set function now returns spins along with cells and coords. Please ensure that all calls to _load_set are updated to handle this new return value, and that backward compatibility is maintained.

Run the following script to verify all usages of _load_set are correctly updated:

Verification successful

Changes to _load_set have been correctly implemented and used.

The verification process confirms that the _load_set function is being called with three assigned variables (cells, coords, spins) as expected. There are no instances of outdated usage with only two return values.

For improved clarity, consider updating the function definition to explicitly document the return values:

def _load_set(folder, nopbc: bool) -> Tuple[np.ndarray, np.ndarray, Optional[np.ndarray]]:
    # ... existing implementation ...

This change would make the function's interface more self-documenting and help prevent future misuse.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to `_load_set` and check the number of assigned variables.

# Search for assignments from `_load_set` with three return variables
rg --type python -P '(\w+),\s*(\w+),\s*(\w+)\s*=\s*_load_set\('

# Search for assignments from `_load_set` with two return variables (which may need updating)
rg --type python -P '(\w+),\s*(\w+)\s*=\s*_load_set\('

Length of output: 184


Script:

#!/bin/bash
# Description: Find all calls to `_load_set` and its definition, with context

# Search for the definition of _load_set
echo "Definition of _load_set:"
rg -n -C 2 "def _load_set\("

echo "\nCalls to _load_set:"
# Search for assignments from _load_set with any number of return variables
rg -n -C 2 "\w+(?:,\s*\w+)*\s*=\s*_load_set\("

# Search for any other usage of _load_set
echo "\nOther usages of _load_set:"
rg -n -C 2 "_load_set\(" | grep -v "def _load_set"

Length of output: 1425

Comment on lines +716 to +719
try:
self.data["spins"][f_idx] = np.matmul(self.data["spins"][f_idx], trans)
except:
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use contextlib.suppress(Exception) and avoid bare except.

Consider the following improvements:

  1. Replace the try-except-pass pattern with contextlib.suppress(Exception) for a more explicit and concise way to handle the specific exception.
  2. Avoid using a bare except as it can catch unintended exceptions and make debugging difficult. Specify the expected exception type instead.

Apply this diff to address the suggestions:

-        try:
-            self.data["spins"][f_idx] = np.matmul(self.data["spins"][f_idx], trans)
-        except:
-            pass
+        with contextlib.suppress(KeyError):
+            self.data["spins"][f_idx] = np.matmul(self.data["spins"][f_idx], trans)
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
self.data["spins"][f_idx] = np.matmul(self.data["spins"][f_idx], trans)
except:
pass
with contextlib.suppress(KeyError):
self.data["spins"][f_idx] = np.matmul(self.data["spins"][f_idx], trans)
Tools
Ruff

716-719: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


718-718: Do not use bare except

(E722)

Comment on lines +46 to +48
res_incar = re.sub(
r"MAGMOM[\s\S]*?\n\nM_CONST[\s\S]*?\n\n", m_str, tmp_incar, re.S
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use 'flags' as a keyword argument in 're.sub'

To improve code readability and avoid confusion with positional arguments, pass the flags parameter as a keyword argument in the re.sub function.

Apply this diff to address the concern:

         res_incar = re.sub(
             r"MAGMOM[\s\S]*?\n\nM_CONST[\s\S]*?\n\n", m_str, tmp_incar, flags=re.S
         )
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
res_incar = re.sub(
r"MAGMOM[\s\S]*?\n\nM_CONST[\s\S]*?\n\n", m_str, tmp_incar, re.S
)
res_incar = re.sub(
r"MAGMOM[\s\S]*?\n\nM_CONST[\s\S]*?\n\n", m_str, tmp_incar, flags=re.S
)
Tools
Ruff

46-48: re.sub should pass count and flags as keyword arguments to avoid confusion due to unintuitive argument positions

(B034)

**kwargs : dict
other parameters
"""
assert frame_idx < len(data["coords"])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace 'assert' with exception handling for input validation

Using assert statements for input validation is not recommended in production code because assertions can be disabled with optimization flags (-O), potentially skipping critical checks.

Replace the assert statement with explicit exception handling to ensure the validation is always performed.

-        assert frame_idx < len(data["coords"])
+        if frame_idx >= len(data["coords"]):
+            raise IndexError(f"frame_idx {frame_idx} is out of range.")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert frame_idx < len(data["coords"])
if frame_idx >= len(data["coords"]):
raise IndexError(f"frame_idx {frame_idx} is out of range.")
Tools
GitHub Check: codecov/patch

[warning] 67-68: dpdata/plugins/vasp_deltaspin.py#L67-L68
Added lines #L67 - L68 were not covered by tests

Comment on lines +44 to +46
with open(file_name[:-6] + "INCAR") as fp:
tmp_incar = fp.read()
res_incar = re.sub(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling when reading 'INCAR' file

When opening file_name[:-6] + "INCAR", there is no check to handle situations where the file might not exist or be inaccessible, which could result in an unhandled exception.

Include error handling to gracefully handle potential file access issues.

+        import os
+        incar_path = file_name[:-6] + "INCAR"
+        if not os.path.exists(incar_path):
+            raise FileNotFoundError(f"File not found: {incar_path}")
+
         with open(incar_path) as fp:
             tmp_incar = fp.read()
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with open(file_name[:-6] + "INCAR") as fp:
tmp_incar = fp.read()
res_incar = re.sub(
import os
incar_path = file_name[:-6] + "INCAR"
if not os.path.exists(incar_path):
raise FileNotFoundError(f"File not found: {incar_path}")
with open(incar_path) as fp:
tmp_incar = fp.read()
res_incar = re.sub(
Tools
GitHub Check: codecov/patch

[warning] 44-46: dpdata/plugins/vasp_deltaspin.py#L44-L46
Added lines #L44 - L46 were not covered by tests

Comment on lines +18 to +24
with open(file_name) as fp:
lines = [line.rstrip("\n") for line in fp]
with open(file_name[:-6] + "INCAR") as fp:
lines_incar = [line.rstrip("\n") for line in fp]
data = dpdata.vasp_deltaspin.poscar.to_system_data(lines, lines_incar)
data = uniq_atom_names(data)
return data
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure robust error handling when opening files

Currently, the code opens file_name and file_name[:-6] + "INCAR" without handling potential exceptions if the files do not exist or cannot be accessed. This may lead to unhandled exceptions and crash the program.

Consider adding error handling to manage file-related exceptions and provide meaningful error messages.

+        import os
+        if not os.path.exists(file_name):
+            raise FileNotFoundError(f"File not found: {file_name}")
+        if not os.path.exists(file_name[:-6] + "INCAR"):
+            raise FileNotFoundError(f"File not found: {file_name[:-6] + 'INCAR'}")
+
         with open(file_name) as fp:
             lines = [line.rstrip("\n") for line in fp]
         with open(file_name[:-6] + "INCAR") as fp:
             lines_incar = [line.rstrip("\n") for line in fp]

Committable suggestion was skipped due to low confidence.

Tools
GitHub Check: codecov/patch

[warning] 18-24: dpdata/plugins/vasp_deltaspin.py#L18-L24
Added lines #L18 - L24 were not covered by tests

Comment on lines +135 to +142
def get_spintype(keys):
key_sp = ["sp", "spx", "spy", "spz"]
key_csp = ["c_spin[1]", "c_spin[2]", "c_spin[3]", "c_spin[4]"]
lmp_sp_type = [key_sp, key_csp]
for k in range(2):
if all(i in keys for i in lmp_sp_type[k]):
return lmp_sp_type[k]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit tests for the get_spintype function to ensure correctness

The newly added get_spintype function is not covered by unit tests. Implementing tests for this function will help verify that it accurately identifies different spin types based on the provided keys, enhancing the reliability of spin data processing.

Would you like assistance in creating unit tests for this function?

Tools
GitHub Check: codecov/patch

[warning] 136-141: dpdata/lammps/dump.py#L136-L141
Added lines #L136 - L141 were not covered by tests

Comment on lines +144 to +169
def safe_get_spin_force(lines):
blk, head = _get_block(lines, "ATOMS")
keys = head.split()
sp_type = get_spintype(keys)
assert sp_type is not None, "Dump file does not contain spin!"
id_idx = keys.index("id") - 2
sp = keys.index(sp_type[0]) - 2
spx = keys.index(sp_type[1]) - 2
spy = keys.index(sp_type[2]) - 2
spz = keys.index(sp_type[3]) - 2
sp_force = []
for ii in blk:
words = ii.split()
sp_force.append(
[
float(words[id_idx]),
float(words[sp]),
float(words[spx]),
float(words[spy]),
float(words[spz]),
]
)
sp_force.sort()
sp_force = np.array(sp_force)[:, 1:]
return sp_force

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit tests for safe_get_spin_force to validate spin force extraction

The safe_get_spin_force function is crucial for extracting spin force data from the dump files, but it currently lacks test coverage. Adding unit tests will ensure it handles various spin data formats correctly and robustly detects cases where spin data is missing or malformed.

I can help create unit tests for this function if you'd like.

Tools
GitHub Check: codecov/patch

[warning] 145-157: dpdata/lammps/dump.py#L145-L157
Added lines #L145 - L157 were not covered by tests


[warning] 166-168: dpdata/lammps/dump.py#L166-L168
Added lines #L166 - L168 were not covered by tests

* safe_get_spin_force(array_lines[ii])[:, 1:4]
)
if contain_spin:
system["spins"] = np.array(system["spins"])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add tests to verify spin data array conversion

Converting system["spins"] to a NumPy array is essential for ensuring data consistency, but this line is not covered by tests. Consider adding tests to validate that spin data is correctly aggregated and converted after processing multiple blocks.

Would you like assistance in creating tests for this functionality?

Tools
GitHub Check: codecov/patch

[warning] 279-279: dpdata/lammps/dump.py#L279
Added line #L279 was not covered by tests

Comment on lines +255 to +260
contain_spin = False
blk, head = _get_block(lines, "ATOMS")
if "sp" in head:
contain_spin = True
spin_force = safe_get_spin_force(lines)
system["spins"] = [spin_force[:, :1] * spin_force[:, 1:4]]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use get_spintype for consistent spin data detection

Currently, the code checks if "sp" in head to determine the presence of spin data. Since spin data may be indicated by different keys (e.g., "sp", "spx", "c_spin[1]"), using the get_spintype function ensures consistent detection across various formats.

Apply this diff to improve spin data detection:

         contain_spin = False
         blk, head = _get_block(lines, "ATOMS")
+        keys = head.split()
+        sp_type = get_spintype(keys)
+        if sp_type is not None:
             contain_spin = True
             spin_force = safe_get_spin_force(lines)
             system["spins"] = [spin_force[:, :1] * spin_force[:, 1:4]]
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
contain_spin = False
blk, head = _get_block(lines, "ATOMS")
if "sp" in head:
contain_spin = True
spin_force = safe_get_spin_force(lines)
system["spins"] = [spin_force[:, :1] * spin_force[:, 1:4]]
contain_spin = False
blk, head = _get_block(lines, "ATOMS")
keys = head.split()
sp_type = get_spintype(keys)
if sp_type is not None:
contain_spin = True
spin_force = safe_get_spin_force(lines)
system["spins"] = [spin_force[:, :1] * spin_force[:, 1:4]]
Tools
GitHub Check: codecov/patch

[warning] 258-260: dpdata/lammps/dump.py#L258-L260
Added lines #L258 - L260 were not covered by tests

Comment on lines +273 to +277
if contain_spin:
system["spins"].append(
safe_get_spin_force(array_lines[ii])[:, :1]
* safe_get_spin_force(array_lines[ii])[:, 1:4]
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimize by storing the result of safe_get_spin_force

In the loop, safe_get_spin_force(array_lines[ii]) is called twice, leading to unnecessary computation. Storing the result in a variable enhances performance by avoiding redundant function calls.

Apply this diff to optimize the code:

             if contain_spin:
+                spin_force = safe_get_spin_force(array_lines[ii])
                 system["spins"].append(
-                    safe_get_spin_force(array_lines[ii])[:, :1]
-                    * safe_get_spin_force(array_lines[ii])[:, 1:4]
+                    spin_force[:, :1] * spin_force[:, 1:4]
                 )
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if contain_spin:
system["spins"].append(
safe_get_spin_force(array_lines[ii])[:, :1]
* safe_get_spin_force(array_lines[ii])[:, 1:4]
)
if contain_spin:
spin_force = safe_get_spin_force(array_lines[ii])
system["spins"].append(
spin_force[:, :1] * spin_force[:, 1:4]
)
Tools
GitHub Check: codecov/patch

[warning] 274-274: dpdata/lammps/dump.py#L274
Added line #L274 was not covered by tests

Copy link
Contributor

@wanghan-iapcm wanghan-iapcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pxlxingliang pxlxingliang mentioned this pull request Oct 11, 2024
njzjz pushed a commit that referenced this pull request Oct 23, 2024
Reference #728 to add the spin information for lammps.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced functions to extract and normalize spin data from LAMMPS
dump files.
	- Added support for registering spin data types in the system.
	- Created new simulation input and output files for LAMMPS.
- Added new compute and dump commands for calculating and outputting
spin properties in LAMMPS.

- **Tests**
- Implemented unit tests for verifying the handling of spin
configurations in LAMMPS files, including scenarios with zero spin
values and incomplete data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: root <pxlxingliang>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Han Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants