- Look at the file numbers to find at what point it fails.
- With some initial conditions, the integration can diverge, or oscillations can keep amplifying. Check the contents of the output files to see whether they contain NaNs.
- You can use the debug helper tools to check for CUDA errors, check the content of arrays for NaNs, or compare to a previous run of the simulation to see in what steps of the integration loop values became different.
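The first two checks above can be combined into a small shell helper. This is only a sketch, not part of THOR's debug tools: `first_nan_file` and the `DUMP_CMD` variable are hypothetical names, and it assumes the output files are HDF5 and that the `h5dump` tool prints NaN values as text containing `nan`.

```shell
#!/usr/bin/env bash
# Print the first file (in the order given) whose dumped contents contain a NaN.
# DUMP_CMD is a hypothetical override; it defaults to h5dump and must be a
# single command name that takes the file as its only argument.
first_nan_file() {
    local dump_cmd="${DUMP_CMD:-h5dump}"
    local f
    for f in "$@"; do
        # -i ignores case, -w matches "nan" as a whole word (so "-nan" counts too)
        if "$dump_cmd" "$f" 2>/dev/null | grep -qiw 'nan'; then
            echo "$f"
            return 0
        fi
    done
    # no file contained a NaN
    return 1
}

# Example call (the file name pattern is a placeholder, use your own output names;
# ls -v sorts by file number so that file 2 comes before file 10):
#   first_nan_file $(ls -v esp_output_*.h5)
```

Passing the files in numerical order matters, because the first hit then marks the output step at which the run went wrong.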
- When the model crashes, the failure usually occurs while solving for the pressure in Equations 35 and 36 in Mendonca et al. 2016. This is done in `Density_Pressure_Eqs` in `src/headers/dyn/thor_fastmodes.h`.
- The instability occurs when Equation 35 returns a negative value for what is essentially the "entropy density"; solving for the pressure in Equation 36 will then result in a NaN.
- While there are a number of physical and numerical causes of the instability, the most common problem seems to be vertical waves reflecting off the top boundary of the model.
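To see why a negative "entropy density" leads to a NaN, it helps to write the pressure diagnostic out. The form below is an assumption: it is the standard ideal-gas equation of state expressed in terms of potential temperature, and its symbols (reference pressure, gas constant, heat capacities, and the product of density and potential temperature) may not match the notation of Equation 36 in Mendonca et al. 2016 exactly.

```latex
p = P_{\mathrm{ref}} \left( \frac{R_d \, \rho\theta}{P_{\mathrm{ref}}} \right)^{C_p / C_v}
```

The exponent $C_p / C_v$ is not an integer (it is about 1.4 for a diatomic ideal gas), so a negative $\rho\theta$ means raising a negative base to a non-integer power. That has no real value, and a floating-point `pow` call returns NaN in that case.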
- The absolute first thing to try is to run `make clean`, then attempt to recompile. The Makefile only recompiles code components that have been altered. Nine times out of ten, this works fine. Sometimes, however, the code you changed will clash in some way with compiled (and unchanged) code. The `make clean` command removes all the pre-compiled code so that you can start fresh.
- When you compile with the `-j8` flag, the compilation is parallelized. This makes it hard to tell which step or section of code is killing the make process. Try removing the flag so that compilation is done serially.
- Note: we are currently attempting to develop a knowledge base of CUDA and gcc compatibility. There are numerous versions of the CUDA toolkit and gcc/g++, so please let us know or open an issue when you encounter an incompatibility.
- The CUDA compiler `nvcc` invokes `gcc` or `g++`, but each version of `nvcc` is compatible with only a limited range of gcc/g++ versions. Here are the limitations we have encountered:
  - CUDA 9.x requires gcc/g++ version < 7
  - CUDA 8.x requires gcc/g++ version < 6
- The Makefile is set up to raise an error in the case of the above conflicts. If you have a compatible version of g++ but the default on your system is a newer version, you can specify the g++ version when you build THOR, for example:
make -j8 release ccbin='-ccbin=g++-6'
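Before building, the two limits listed above can be checked by hand with a few lines of shell. This is a rough sketch, not the check the Makefile performs: the function names are hypothetical, and only the CUDA 8.x and 9.x limits stated here are encoded.

```shell
#!/usr/bin/env bash
# Highest gcc/g++ major version allowed for a given CUDA major version.
# Only the two limits listed above are known; anything else returns 1.
max_gcc_major_for_cuda() {
    case "$1" in
        9) echo 6 ;;   # CUDA 9.x requires gcc/g++ < 7
        8) echo 5 ;;   # CUDA 8.x requires gcc/g++ < 6
        *) return 1 ;;
    esac
}

# Compare a CUDA version (e.g. 9.1) with a gcc version (e.g. 7.3.0).
# Prints "ok", "conflict", or "unknown".
check_cuda_gcc() {
    local cuda_major="${1%%.*}" gcc_major="${2%%.*}" max
    if ! max="$(max_gcc_major_for_cuda "$cuda_major")"; then
        echo "unknown"
        return 2
    fi
    if [ "$gcc_major" -le "$max" ]; then
        echo "ok"
    else
        echo "conflict"
        return 1
    fi
}

# Example call with the versions installed on this machine:
#   cuda_ver="$(nvcc --version | sed -n 's/.*release \([0-9.]*\).*/\1/p')"
#   check_cuda_gcc "$cuda_ver" "$(g++ -dumpversion)"
```

A result of `conflict` means an older g++ has to be selected with the `ccbin` argument shown above.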
- There are a few things to check if you are having conflicts: run `nvidia-smi` on the command line and verify that the versions shown next to `NVIDIA-SMI` and `Driver Version` are consistent; check the version of CUDA with `nvcc --version`; check whether you have different versions of the CUDA-toolkit installed (usually in /usr/local/).
- In our experience, having multiple versions of CUDA-toolkit or nvidia-driver installed leads to a Gordian knot of conflicts that are virtually impossible to resolve manually. Best practice is to purge your system of nvidia and CUDA related files, and reinstall a single version.
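The last of the checks above, looking for several toolkit installations, can be scripted. This is a minimal sketch with hypothetical function names; it assumes each toolkit lives in a directory whose name starts with `cuda` under the given prefix, and that a plain `cuda` entry is a symbolic link to one of them.

```shell
#!/usr/bin/env bash
# List CUDA toolkit directories under a prefix (default: /usr/local).
list_cuda_toolkits() {
    local prefix="${1:-/usr/local}" d
    for d in "$prefix"/cuda*; do
        # skip an unmatched glob and symbolic links such as /usr/local/cuda
        if [ -d "$d" ] && [ ! -L "$d" ]; then
            echo "$d"
        fi
    done
    return 0
}

# Count the toolkit directories found under the prefix.
count_cuda_toolkits() {
    list_cuda_toolkits "$1" | wc -l | tr -d ' '
}

# Example call:
#   count_cuda_toolkits /usr/local
```

A count above one suggests that several toolkits are installed side by side, which is the situation described above.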
- This indicates your HDF5 libraries aren't being correctly added to the path.
- Scroll up through the error messages, and verify that the libraries printed out in "h5libs=" are actually in the directories indicated by "h5libdir=" and "h5include=". Then "make clean" and compile again.
- If the correct folders are not auto-detected, you can hardcode the correct locations into the Makefile. For example, on a CentOS 7 Linux box, adding the following lines did the trick (your mileage may vary):
h5libdir := -L/usr/lib64
h5include := -I/usr/include
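Before hardcoding directories like these, it is worth confirming that they really contain HDF5. The sketch below uses a hypothetical function name and assumes the usual file names: a shared library matching `libhdf5*.so*` and the header `hdf5.h`. A static-only installation would need a different pattern.

```shell
#!/usr/bin/env bash
# Check that an HDF5 shared library and header exist in the given directories.
# Arguments are plain directories, without the -L / -I prefixes used in the Makefile.
check_hdf5_paths() {
    local libdir="$1" incdir="$2" status=0
    if ! ls "$libdir"/libhdf5*.so* >/dev/null 2>&1; then
        echo "no libhdf5 shared library in $libdir"
        status=1
    fi
    if [ ! -f "$incdir/hdf5.h" ]; then
        echo "no hdf5.h in $incdir"
        status=1
    fi
    return "$status"
}

# Example call for the CentOS 7 locations above:
#   check_hdf5_paths /usr/lib64 /usr/include
```

The function prints nothing and returns 0 when both files are found.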
Check that it's a bug, and not the simulation becoming unstable and ending up in an unphysical state. If it's a bug in the engine, try to find a way to reproduce it and report it in the issue tracker on GitHub. For information on how to report it, have a look at our bug reporting page.