crashed caused by calling dlopen in find_library #55801

Open
wgmitchener opened this issue Sep 18, 2024 · 3 comments

wgmitchener commented Sep 18, 2024

The find_library function in libdl.jl makes two calls to dlopen while searching for a library. This causes problems when looking for the ROCm library libamdhip64.so on Fedora 40 Linux. ROCm links with LLVM, and calling dlopen on the library twice causes LLVM to crash and report an inconsistency in its registered command-line options:

: CommandLine Error: Option 'disassemble' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

See this thread for where the problem shows up.

The problem is not specific to Julia. It can be reproduced in C by calling dlopen on the library, then dlclose, then dlopen again. See this post.

My guess is that, as called in find_library, dlopen followed by dlclose leaves some state from the library in the process's memory, and this state causes confusion when dlopen is called on the library again.
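The open, close, reopen sequence described above can be sketched in C roughly as follows. This is a minimal sketch, not the code from the linked post: `reopen_library` is a hypothetical helper name, and the path passed to it is up to the caller. With the affected ROCm library, the process would reportedly abort inside the second dlopen rather than return.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a shared library, close it, then open it again.
 * Returns 0 if both dlopen calls succeed, -1 if either one fails.
 * A library that crashes on reload never reaches the final return. */
int reopen_library(const char *path)
{
    void *first = dlopen(path, RTLD_LAZY);
    if (first == NULL) {
        fprintf(stderr, "first dlopen failed: %s\n", dlerror());
        return -1;
    }
    /* Drop our reference; the library may or may not be fully unloaded. */
    dlclose(first);

    void *second = dlopen(path, RTLD_LAZY);
    if (second == NULL) {
        fprintf(stderr, "second dlopen failed: %s\n", dlerror());
        return -1;
    }
    dlclose(second);
    return 0;
}
```

Calling `reopen_library` with the full path to libamdhip64.so on an affected system should show whether the second dlopen triggers the LLVM error. On older glibc versions the program needs to be linked with `-ldl`.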

Can find_library be re-implemented so as not to actually call dlopen and dlclose on the file once it's been found?

In general, programs will call find_library followed by dlopen, so if it's possible for find_library to change the state of the process so that a call to dlopen afterward might see leftover state and crash, it makes sense to me that find_library needs to be rewritten to not call dlopen at all.
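To illustrate what a search that never loads the library could look like, here is a minimal C sketch that only checks whether a candidate file exists and is readable. It is an assumption-laden illustration, not how libdl.jl works and not a proposed patch: `find_library_file` is a hypothetical name, the directory list is supplied by the caller, and the check neither consults the dynamic linker's own search path nor verifies that the file is a loadable shared object.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: look for "<dir>/<name>" in each candidate directory
 * without calling dlopen. On a hit, the full path is left in `out` and 0 is
 * returned. On a miss, `out` is set to the empty string and -1 is returned. */
int find_library_file(const char *name, const char *const dirs[], size_t ndirs,
                      char *out, size_t outlen)
{
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= outlen)
            continue; /* formatting error or path too long for the buffer */
        if (access(out, R_OK) == 0)
            return 0; /* file exists and is readable; nothing was loaded */
    }
    if (outlen > 0)
        out[0] = '\0';
    return -1;
}
```

The trade-off is that a file-existence check can report a library that dlopen would later reject (wrong architecture, missing dependencies), which is presumably why find_library tries to load it in the first place.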


vtjnash commented Sep 18, 2024

We should probably consider deprecating that. There is never a reason for a user to call it before dlopen, as the user should just call dlopen instead. This function is from long before we had LazyLibrary, precompile, and such, when we were experimenting with ways of making ccall work more reliably.

giordano commented

We use find_library in MPI.jl to find the libmpi library and save it as a preference (to be able to invalidate the cache in case we need to use a different libmpi): https://github.com/JuliaParallel/MPI.jl/blob/aac9688e6961bc7e3aeeba7600f5e7d0b10596a3/lib/MPIPreferences/src/MPIPreferences.jl#L194 There is no need to dlopen the library after find_library (yes, there's a call to identify_abi which internally calls dlopen, but that's unrelated and beside the point).


vtjnash commented Sep 18, 2024

That is true, and that usage is probably fine, but you might be better served there by calling dlopen+dlpath directly instead? Notably, though, you wouldn't do that during precompile (since you'd corrupt the cache file due to the unsafe modification of preferences while loading), and therefore you also wouldn't typically call dlopen directly afterwards either (as it isn't compatible with the already loaded MPI).
