
Can't compile with USE_CUDA=1 and ENABLE_DEEPKS=1 simultaneously #5910

Open
xuan112358 opened this issue Feb 19, 2025 · 1 comment
Labels: DeePKS (issues related to DeePKS), GPU & DCU & HPC (GPU, DCU, and HPC related issues)

Comments

@xuan112358
Collaborator

Describe the bug

I can compile ABACUS with either USE_CUDA=1 or ENABLE_DEEPKS=1 on its own,
but not with both enabled simultaneously.
The CMake error is:

CMake Error at cmake/FindMKL.cmake:87 (add_library):
add_library cannot create ALIAS target "IntelMKL::MKL" because another
target with the same name already exists.
Call Stack (most recent call first):
/home/xuan/03_library/libtorch-2.3.1/share/cmake/Caffe2/public/mkl.cmake:1 (find_package)
/home/xuan/03_library/libtorch-2.3.1/share/cmake/Caffe2/Caffe2Config.cmake:113 (include)
/home/xuan/03_library/libtorch-2.3.1/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
CMakeLists.txt:499 (find_package)

-- Found Torch: /home/xuan/03_library/libtorch-2.3.1/lib/libtorch.so
-- Checking for one of the modules 'libxc'
-- Found Libxc: /home/xuan/03_library/libxc/libxc-5.2.3/lib/libxc.a
-- Found Libxc: version 5.2.3
-- Configuring incomplete, errors occurred!
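
For context, the clash comes from two places defining the same MKL alias target: ABACUS's own cmake/FindMKL.cmake and the find_package(MKL) call made inside libtorch's Caffe2 mkl.cmake (see the call stack above). A minimal sketch of the kind of guard that avoids a duplicate ALIAS target in a find module; MKL::MKL here is a placeholder for whatever real target the module builds, not the actual contents of FindMKL.cmake:

    # Sketch: only create the alias if an earlier find_package(MKL) has not done so already.
    if(NOT TARGET IntelMKL::MKL)
      add_library(IntelMKL::MKL ALIAS MKL::MKL)   # MKL::MKL is a placeholder for the real MKL target
    endif()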

I used the 2022 oneAPI toolkit. If I switch to the 2024 version, there seems to be a mismatch between the compiler and the CUDA toolkit, and the error becomes "Could not find librt library, needed by CUDA::cudart_static".
@dyzheng @caic99 @dzzz2001 Can you help me?
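
As a side note on the librt message: CMake's FindCUDAToolkit needs a system librt when it sets up CUDA::cudart_static, and the oneAPI 2024 compiler environment may not expose the directory containing it in its implicit search paths. A hedged sketch of how one might check for the library and hand its directory to CMake; the Ubuntu path below is an assumption, adjust it to your system:

    # Locate librt, then add its directory to CMake's library search path (assumed Ubuntu layout).
    find /usr/lib -name 'librt.so*' 2>/dev/null
    cmake -B build -DUSE_CUDA=1 -DENABLE_DEEPKS=1 \
          -DCMAKE_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu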

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).
@mohanchen added the DeePKS and GPU & DCU & HPC labels on Feb 20, 2025
@AsTonyshment
Collaborator

Actually, compiling with GCC works fine. It seems that Intel oneAPI does not support CUDA builds very well out of the box:

~/abacus-develop (develop) $ gcc -v       
......
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04) 

~/abacus-develop (develop) $ cmake -B build -DELPA_INCLUDE_DIR=~/Softwares/elpa-2024.05.001/elpa -DELPA_LIBRARIES=~/Softwares/elpa-2024.05.001/lib/libelpa_openmp.so -DCMAKE_PREFIX_PATH=~/Softwares/elpa-2024.05.001/lib -DENABLE_LIBXC=1 -DUSE_CUDA=1 -DENABLE_DEEPKS=1 -DTorch_DIR=~/Softwares/libtorch/share/cmake/Torch/ -Dlibnpy_INCLUDE_DIR=~/Softwares/libnpy/include
-- The CXX compiler identification is GNU 12.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Found git: attempting to get commit info...
-- Current commit hash: e7b5c1257
-- Last commit date: Wed Feb 19 17:34:48 2025 +0800
-- Found Cereal: /usr/include  
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2") 
-- Found ELPA: ~/Softwares/elpa-2024.05.001/lib/libelpa_openmp.so  
-- Performing Test ELPA_VERSION_SATISFIES
-- Performing Test ELPA_VERSION_SATISFIES - Success
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda-12.1/bin/nvcc
-- Found CUDAToolkit: /usr/local/cuda-12.1/include (found version "12.1.66") 
-- The CUDA compiler identification is NVIDIA 12.1.66
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-12.1/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found FFTW3: /usr/lib/x86_64-linux-gnu/libfftw3_omp.so  
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libopenblas.so  
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so;-lm;-ldl  
-- Found ScaLAPACK: /usr/lib/x86_64-linux-gnu/libscalapack-openmpi.so  
-- Could NOT find MKL (missing: MKL_DIR)
-- Found MKL_SCALAPACK: MKL_SCALAPACK-NOTFOUND
-- Found Torch: ~/Softwares/libtorch/lib/libtorch.so  
-- Checking for one of the modules 'libxc'
-- Found Libxc: ~/Softwares/libxc-7.0.0-install/lib/libxc.a  
-- Found Libxc: version 7.0.0
-- Configuring done
-- Generating done
-- Build files have been written to: ~/abacus-develop/build

~/abacus-develop (develop) $ cmake --build build -j32
......
[100%] Building CXX object CMakeFiles/abacus.dir/source/main.cpp.o
[100%] Linking CXX executable abacus
[100%] Built target abacus
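
If the oneAPI environment is still sourced in the same shell, one way to make sure CMake picks up the GNU toolchain instead of icpx is to name the compilers explicitly; a sketch (keep the other -D options from the configure command above):

    CC=gcc CXX=g++ cmake -B build -DUSE_CUDA=1 -DENABLE_DEEPKS=1 \
        -DCMAKE_CUDA_HOST_COMPILER=g++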
