Release v5.3.4 · JuliaGPU/CUDA.jl

CUDA v5.3.4

Diff since v5.3.3

Merged pull requests:

Add Enzyme Forward mode custom rule (#1869) (@wsmoses)
Handle cache improvements (#2352) (@maleadt)
Fix cuTensorNet compat (#2354) (@maleadt)
Optimize array allocation. (#2355) (@maleadt)
Change type restrictions in cuTENSOR operations (#2356) (@lkdvos)
Bump julia-actions/setup-julia from 1 to 2 (#2357) (@dependabot[bot])
Suggest use of 32 bit types over 64 instead of just Float32 over Float64 [skip ci] (#2358) (@Zentrik)
Make generic_trimatmul more specific (#2359) (@tgymnich)
Return the correct memory type when wrapping system memory. (#2363) (@maleadt)
Mark cublas version/handle as non-differentiable (#2368) (@wsmoses)
Enzyme: Forward mode sync (#2369) (@wsmoses)
Enzyme: support fill (#2371) (@wsmoses)
unsafe_wrap: unconditionally use the memory type provided by the user. (#2372) (@maleadt)
Remove external_gvars. (#2373) (@maleadt)
Tegra support with artifacts (#2374) (@maleadt)
Backport Enzyme extension (#2375) (@wsmoses)
Add note about --check-bounds=yes (#2378) (@Zinoex)
Test Enzyme in a separate CI job. (#2379) (@maleadt)
Fix tests for Tegra. (#2381) (@maleadt)
Update Project.toml [remove EnzymeCore unconditional dep] (#2382) (@wsmoses)

Closed issues:

Native Softmax (#175)
CUSOLVER: support eigendecomposition (#173)
backslash with gpu matrices crashes julia (#161)
at-benchmark captures GPU arrays (#156)
Support kernels returning Union{} (#62)
mul! falls back to generic implementation (#148)
\ on qr factorization objects gives a method error (#138)
Compiler failure if dependent module only contains a japi1 function (#49)
copy!(dst, src) and copyto!(dst, src) are significantly slower and allocate more memory than copyto!(dest, do, src, so[, N]) (#126)
Calling Flux.gpu on a view dumps core (#125)
Creating CuArray{Tracker.TrackedReal{Float64},1} a few times causes segfaults (#121)
Guard against exceeding maximum kernel parameter size (#32)
Detect common API misuse in error handlers (#31)
rand and friends default to Float64 (#108)
\ does not work for least squares (#104)
ERROR_ILLEGAL_ADDRESS when broadcasting modular arithmetic (#94)
CuIterator assumes batches to consist of multiple arrays (#86)
Algebra with UniformScaling Uses Generic Fallback Scalar Indexing (#85)
Document (un)supported language features for kernel programming (#13)
Missing dispatch for indexing of reshaped arrays (#556)
Track array ownership to avoid illegal memory accesses (#763)
NVPTX i128 support broken on LLVM 11 / Julia 1.6 (#793)
Support for sm_80 cp.async: asynchronous on-device copies (#850)
Profiling Julia with Nsight Systems on Windows results in blank window (#862)
sort! and partialsort! are considerably slower than CPU versions (#937)
mul! does not dispatch on Adjoint (#1363)
Cross-device copy of wrapped arrays fails (#1377)
Memory allocation becomes very slow when reserved bytes is large (#1540)
Cannot reclaim GPU Memory; CUDA.reclaim() (#1562)
Add eigen for general purpose computation of eigenvectors/eigenvalues (#1572)
device_reset! does not seem to work anymore (#1579)
device-side rand() are not random between successive kernel launches (#1633)
Add EnzymeRules support for CUDA.jl (for forward mode here) (#1811)
cusparseSetStream_v2 not defined (#1820)
Feature request: Integrating the latest CUDA library "cuLitho" into CUDA.jl (#1821)
KernelAbstractions.jl-related issues (#1838)
lock failing in multithreaded plan_fft() (#1921)
CUSolver finalizer tries to take ReentrantLock (#1923)
Testsuite could be more careful about parallel testing (#2192)
Opportunistic GC collection (#2303)
Unable to use local CUDA runtime toolkit (#2367)
Enzyme prevents testing on 1.11 (#2376)
