Release v5.5.0 · JuliaGPU/CUDA.jl

CUDA v5.5.0

Blog post

Diff since v5.4.3

Merged pull requests:

Add support for arbitrary group sizes in gemm_grouped_batched! (#2334) (@lpawela)
Add kernel compilation requirements to docs (#2416) (@termi-official)
Enzyme: reverse mode kernels (#2422) (@wsmoses)
CUFFT: Support Float16 (#2430) (@eschnett)
Updated compute-sanitizer documentation (#2440) (@alexp616)
Add troubleshooting section for NSight Compute (#2442) (@efaulhaber)
Correct typo in documentation (#2445) (@eschnett)
Bump minimal Julia requirement to v1.10. (#2447) (@maleadt)
fix compute-sanitizer typo (#2448) (@alexp616)
Address a corner case when establishing p2p access (#2457) (@findmyway)
Implementation of spdiagm for CUSPARSE (#2458) (@walexaindre)
Update to CUDA 12.6. (#2461) (@maleadt)
CompatHelper: bump compat for GPUCompiler to 0.27 (keep existing compat) (#2462) (@github-actions[bot])
Bump CUDA driver JLL. (#2463) (@maleadt)
CUSOLVER (dense): cache workspace in fat handle (#2465) (@bjarthur)
Revert "Run full GC when under very high memory pressure." (#2469) (@maleadt)
Fix a method deprecation. (#2470) (@maleadt)
Add Enzyme sum derivatives (#2471) (@wsmoses)
Re-use pre-converted kernel arguments when launching kernels. (#2472) (@maleadt)
Bump LLVM compat (#2473) (@maleadt)
Bump subpackage compat. (#2475) (@maleadt)
Enzyme: Reversemode cudaconvert (#2476) (@wsmoses)
Ignore Enzyme.jl CI failures (#2479) (@maleadt)
Re-enable enzyme testing (#2480) (@wsmoses)
Add missing GC.@preserves. (#2487) (@maleadt)
[CUSPARSE] Implement a sparse GEMV for CuSparseMatrixCSC * CuSparseVector (#2488) (@amontoison)
[CUSPARSE] Add conversions between CuSparseVector and CuSparseMatrices (#2489) (@amontoison)
Update to LLVM 9.1. (#2491) (@maleadt)
Use at-consistent_overlay for 1.11 compatibility. (#2492) (@maleadt)
Rework NNlib CI. (#2493) (@maleadt)
CUSPARSE: Fix sparse constructor with duplicate elements. (#2495) (@maleadt)

Closed issues:

LinearAlgebra.norm(x) falls back to generic implementation for x::Transpose and x::Adjoint (#1782)
dlclose'ing the compatibility driver can fail (#1848)
Creating a sparse diagonal matrix of CuArray(u) (#1857)
Support for Julia 1.11 (#2241)
CUDA 12.4 Update 1: CUPTI does not trace kernels anymore (#2328)
Adding CUDA to a PackageCompiler sysimage causes segfault (#2428)
Error using CUDA on Julia 1.10: Number of threads per block exceeds kernel limit (#2438)
Error when I load my model (#2439)
Driver JLL improvements (#2446)
Deadlock when calling CUDA.jl in an adopted thread while blocking the main thread (#2449)
CUDA.Mem.unregister fails with CUDA.jl 5.4 (not with 5.3) (#2452)
Segmentation Fault on Loading CUDA (#2453)
Invalid instruction error when using CUDA (#2454)
Missing adapt for sparse and CUDABackend (#2459)
CUDA precompile cannot find/load "cupti64_2024.2.1.dll" during precompilation (juliaup 1.10.4, Windows 11) (#2466)
Request: Option to disable the "full GC when under very high memory pressure". (#2467)
copyto! ambiguous (#2477)
NeuralODE training failed on GPU with Enzyme (#2478)
issue with atomic - when running standard test, @atomic modify expression missing field access (#2483)
Support for creating a CuSparseMatrixCSC from a CuSparseVector (#2484)
Issue with compiling CUDA and cuTENSOR using local libraries (#2486)
Memory Access error in sparse array constructor (#2494)
Forwards-compatible driver breaks CURAND (#2496)
CUDA 12.6 Update 1 (#2497)
