CUDA.jl v5.6.0
CUDA.jl v5.6 is a relatively minor release, with the most important change being behind the scenes: GPUArrays.jl v11 has switched to KernelAbstractions.jl (#2524).
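For readers unfamiliar with that foundation, the sketch below is purely illustrative (it is not part of the release, and the kernel name scale! and all sizes are made up): it shows a KernelAbstractions.jl kernel launched through CUDA.jl's CUDABackend, the programming model GPUArrays.jl v11 now builds on.

```julia
using CUDA, KernelAbstractions

# Illustrative KernelAbstractions.jl kernel: scale x by a into y.
@kernel function scale!(y, @Const(x), a)
    i = @index(Global)
    @inbounds y[i] = a * x[i]
end

x = CUDA.rand(Float32, 1024)
y = similar(x)

# Instantiate the kernel for the CUDA backend and launch it over the array.
scale!(CUDABackend())(y, x, 2f0; ndrange = length(x))
KernelAbstractions.synchronize(CUDABackend())
```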
Features
- Update to CUDA 12.6.2 (#2512)
- CUSOLVER: support for Xgeev! (#2513), XsyevBatched (#2577), and gesv! and gels! (#2406)
- CUBLAS: added multiplication of transpose / adjoint matrices by diagonal matrices (#2518, #2538); see the sketch after this list
- Improve handle cache performance in the presence of many short-lived tasks (#2583)
- CUFFT: Pre-allocate the buffer required for complex-to-real FFTs only once (#2578)
- Improved batched pointer conversion for very large batches (#2608)
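As a rough illustration of the CUBLAS addition above (not taken from the release notes; sizes and values are arbitrary), multiplying a transpose or adjoint matrix by a Diagonal now stays on the GPU:

```julia
using CUDA, LinearAlgebra

A = CUDA.rand(Float32, 4, 4)
d = CUDA.rand(Float32, 4)

# Adjoint/transpose times Diagonal now uses a dedicated CUBLAS path
# (#2518, #2538) instead of falling back to scalar indexing.
B = A' * Diagonal(d)            # scales the columns of A'
C = Diagonal(d) * transpose(A)  # scales the rows of transpose(A)
```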
Bug fixes
- Fix findall with an empty CuArray (#2554)
- CUBLAS: Fix use of level 1 methods with strided arrays (#2528)
- CUSOLVER: Fix Xgesvdr! (#2556)
- Preserve the array buffer type with more linear algebra operations (#2534)
- Work around LinearAlgebra.jl breakage in Julia 1.11.2 concerning generic triangular l/rmul! (#2585)
- Fix ambiguity of LinearAlgebra.dot (#2569)
- Native RNG: Fixes when working with very large arrays (#2561)
- Avoid a deadlock due to union splitting in the mapreduce kernel (#2595)
- Fix pinning of resized CPU memory by automatically re-pinning (#2599); see the sketch after this list
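To illustrate the re-pinning fix, here is a hedged sketch of the pattern from #2594/#2599 (buffer sizes are arbitrary; it assumes the CUDA.pin host-registration API):

```julia
using CUDA

x = zeros(Float32, 1_000)
CUDA.pin(x)            # register the host buffer for fast async copies

resize!(x, 2_000)      # resizing may reallocate the underlying memory

# Previously this copy could error (#2594); CUDA.jl now detects the change
# and re-pins the resized buffer automatically (#2599).
d = CuArray{Float32}(undef, 2_000)
copyto!(d, x)
```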
Merged pull requests:
- [CUSOLVER] Interface gesv! and gels! (#2406) (@amontoison)
- Update wrappers for CUDA v12.6.2 (#2512) (@amontoison)
- [CUSOLVER] Interface Xgeev! (#2513) (@amontoison)
- Added multiplication of transpose / adjoint matrices by diagonal matrices (#2518) (@amontoison)
- CompatHelper: bump compat for GPUCompiler to 1, (keep existing compat) (#2521) (@github-actions[bot])
- Adapt to GPUArrays.jl transition to KernelAbstractions.jl. (#2524) (@maleadt)
- Switch CI to 1.11. (#2525) (@maleadt)
- CUTENSOR: Reduce amount of broadcasts compiled during tests. (#2527) (@maleadt)
- CUBLAS: Don't use BLAS1 wrappers for strided arrays, only vectors. (#2528) (@maleadt)
- Clarify the synchronize(ctx)/device_synchronize() docstrings (#2532) (@JamesWrigley)
- Issue #2533: Preserving the buffer type in linear algebra (#2534) (@kmp5VT)
- Clarify description of how LocalPreferences.toml is generated in the docs (#2535) (@glwagner)
- Adapt to JuliaGPU/GPUArrays.jl#567. (#2537) (@maleadt)
- Removed allocations for transpose/adjoint - diagonal multiplications (#2538) (@RedRussianBear)
- Consistent use of Nsight Compute (#2541) (@huiyuxie)
- Fix formatting in profiling docs page (#2543) (@efaulhaber)
- Fix typo in EnzymeCoreExt.jl (#2550) (@wsmoses)
- Enhance warning under a profiler (#2552) (@huiyuxie)
- Fix findall with an empty CuArray of Bool (#2554) (@amontoison)
- [CUSOLVER] Fix Xgesvdr! (#2556) (@amontoison)
- Test restore Enzyme.jl (#2557) (@wsmoses)
- Native RNG fixes for very large arrays (#2561) (@maleadt)
- [Enzyme] Mark launch_configuration as inactive (#2563) (@wsmoses)
- Update EnzymeCoreExt.jl (#2565) (@simenhu)
- Fix ambiguity of LinearAlgebra.dot (#2569) (@amontoison)
- [CUSOLVER] Add more tests for the dense SVD (#2574) (@amontoison)
- [CUSOLVER] Interface XsyevBatched (#2577) (@amontoison)
- [CUFFT] Preallocate a buffer for complex-to-real FFT (#2578) (@amontoison)
- Run the GC when failing to find a handle, but lots are active. (#2583) (@maleadt)
- Work around LinearAlgebra.jl breakage in 1.11.2. (#2585) (@maleadt)
- mapreduce: avoid deadlock by forcing the accumulator type. (#2596) (@maleadt)
- Switch to GitHub Actions-based benchmarks. (#2597) (@maleadt)
- Re-pin variable sized memory (#2599) (@jipolanco)
- Enzyme: add make_zero of cuarrays (#2600) (@wsmoses)
- Update cache.jl (#2604) (@jarbus)
- Enzyme: mark device_sync as non-differentiable [only downstream] (#2605) (@wsmoses)
- Move strided batch pointer conversion to GPU (#2608) (@THargreaves)
- Split linalg tests into multiple files (#2609) (@kshyatt)
Closed issues:
- Inference failure with sort(::CuMatrix) after loading MLDatasets (#2258)
- Kron Support for CuSparseMatrixCSC (#2370)
- Broadcasting a function returning an anonymous function with a constructor over CUDA arrays fails to compile, "not isbits" (#2514)
- CuArray view has different variable type outside x inside the cuda kernel (#2516)
- Can't build cuDNN on centos7.8 (#2517)
- Precompile errors (#2519)
- Precompile errors (#2520)
- Error returned from CUDA function in CUDA-aware MPI multi-GPU test (#2522)
- Broadcasting over random static array errors on Julia 1.11 (#2523)
- gemm_strided_batched only using strided CUDA kernel when first matrix is transposed (#2529)
- CUDA runtime libraries are loaded from a system path due to LD_LIBRARY_PATH being set (#2530)
- [Bug] UnifiedMemory buffer changes during LinearAlgebra operations (#2533)
- Improve system library warning when running under profiler (#2540)
- Local CUDA settings not propagated to Pkg.test (#2545)
- Out of Memory when working with Distributed for Small Matricies (#2548)
- findall is not working with an empty vector of bool (#2553)
- CUDA code does not return when running under VSC Debugging mode (#2558)
- dot is quite slow in multinest Arrays (#2559)
- UndefVarError: backend not defined in GPUArrays (#2564)
- view() returns CuArray instead of view for 1-D CuArrays (#2566)
- dot ambiguity (#2568)
- InvalidIRError thrown only if critical function is not previously compiled (#2573)
- circular dependency during precompilation (#2579)
- Sparse MatVec Is Nondeterministic? (#2582)
- CUDA triggers long Circular dependency list (#2586)
- Release v5.5.3 for GPUArray v11? (#2587)
- 'dot' gives different answers when viewing rather than slicing multidimensional arrays (#2589)
- Scalar indexing when performing kron on two CuVectors (#2591)
- Faster strided-batched to batched wrapper (#2592)
- Error when copying data to pinned and resized CPU array (#2594)
- mapreducedim! size-dependent fail when narrowing float element types (#2595)
- Missing Enzyme.make_zero in Enzyme extension leads to incorrect behaviour (#2598)
- 'ArgumentError: array must be non-empty' when attempting to pop idle handles from HandleCache (#2603)
- Do a release as current one doesn't support GPUArrays v11 (#2606)