CUDA v5.5.0
Merged pull requests:
- Add support for arbitrary group sizes in `gemm_grouped_batched!` (#2334) (@lpawela)
- Add kernel compilation requirements to docs (#2416) (@termi-official)
- Enzyme: reverse mode kernels (#2422) (@wsmoses)
- CUFFT: Support Float16 (#2430) (@eschnett)
- Updated compute-sanitizer documentation (#2440) (@alexp616)
- Add troubleshooting section for NSight Compute (#2442) (@efaulhaber)
- Correct typo in documentation (#2445) (@eschnett)
- Bump minimal Julia requirement to v1.10. (#2447) (@maleadt)
- fix compute-sanitizer typo (#2448) (@alexp616)
- Address a corner case when establishing p2p access (#2457) (@findmyway)
- Implementation of spdiagm for CUSPARSE (#2458) (@walexaindre)
- Update to CUDA 12.6. (#2461) (@maleadt)
- CompatHelper: bump compat for GPUCompiler to 0.27, (keep existing compat) (#2462) (@github-actions[bot])
- Bump CUDA driver JLL. (#2463) (@maleadt)
- CUSOLVER (dense): cache workspace in fat handle (#2465) (@bjarthur)
- Revert "Run full GC when under very high memory pressure." (#2469) (@maleadt)
- Fix a method deprecation. (#2470) (@maleadt)
- Add Enzyme sum derivatives (#2471) (@wsmoses)
- Re-use pre-converted kernel arguments when launching kernels. (#2472) (@maleadt)
- Bump LLVM compat (#2473) (@maleadt)
- Bump subpackage compat. (#2475) (@maleadt)
- Enzyme: Reversemode cudaconvert (#2476) (@wsmoses)
- Ignore Enzyme.jl CI failures (#2479) (@maleadt)
- Re-enable enzyme testing (#2480) (@wsmoses)
- Add missing GC.@preserves. (#2487) (@maleadt)
- [CUSPARSE] Implement a sparse GEMV for CuSparseMatrixCSC * CuSparseVector (#2488) (@amontoison)
- [CUSPARSE] Add conversions between CuSparseVector and CuSparseMatrices (#2489) (@amontoison)
- Update to LLVM 9.1. (#2491) (@maleadt)
- Use at-consistent_overlay for 1.11 compatibility. (#2492) (@maleadt)
- Rework NNlib CI. (#2493) (@maleadt)
- CUSPARSE: Fix sparse constructor with duplicate elements. (#2495) (@maleadt)
Closed issues:
- `LinearAlgebra.norm(x)` falls back to generic implementation for `x::Transpose` and `x::Adjoint` (#1782)
- dlclose'ing the compatibility driver can fail (#1848)
- Creating a sparse diagonal matrix of CuArray(u) (#1857)
- Support for Julia 1.11 (#2241)
- CUDA 12.4 Update 1: CUPTI does not trace kernels anymore (#2328)
- Adding CUDA to a PackageCompiler sysimage causes segfault (#2428)
- Error using CUDA on Julia 1.10: `Number of threads per block exceeds kernel limit` (#2438)
- Error when I load my model (#2439)
- Driver JLL improvements (#2446)
- Deadlock when calling CUDA.jl in an adopted thread while blocking the main thread (#2449)
- CUDA.Mem.unregister fails with CUDA.jl 5.4 (not with 5.3) (#2452)
- Segmentation Fault on Loading CUDA (#2453)
- `Invalid instruction` error when `using CUDA` (#2454)
- Missing `adapt` for sparse and `CUDABackend` (#2459)
- CUDA precompile cannot find/load "cupti64_2024.2.1.dll" during precompilation (juliaup 1.10.4, Windows 11) (#2466)
- Request: Option to disable the "full GC when under very high memory pressure". (#2467)
- copyto! ambiguous (#2477)
- NeuralODE training failed on GPU with Enzyme (#2478)
- issue with atomic - when running standard test, @atomic modify expression missing field access (#2483)
- Support for creating a CuSparseMatrixCSC from a CuSparseVector (#2484)
- Issue with compiling CUDA and cuTENSOR using local libraries (#2486)
- Memory Access error in sparse array constructor (#2494)
- Forwards-compatible driver breaks CURAND (#2496)
- CUDA 12.6 Update 1 (#2497)