Releases · JuliaGPU/CUDA.jl
v5.0.0
CUDA v5.0.0
Blog post: https://info.juliahub.com/cuda-jl-5-0-changes
This is a breaking release, but the breaking changes are minimal (see the blog post for details):
- Julia 1.8 is now required, and only CUDA 11.4+ is supported
- selection of local toolkits has changed slightly (see the sketch below)
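As an illustration, here is a minimal sketch of the new local-toolkit selection, assuming the `local_toolkit` keyword to `CUDA.set_runtime_version!` described in the blog post:

```julia
# Sketch: opting into a locally installed CUDA toolkit with CUDA.jl 5.0.
# Hedged: check the `CUDA.set_runtime_version!` docstring on your version.
using CUDA

# Prefer the toolkit discovered on the local system over the
# artifact-provided one, pinned to a specific CUDA version:
CUDA.set_runtime_version!(v"12.2"; local_toolkit=true)
```

The choice is stored via Preferences (in `LocalPreferences.toml`), so it persists across sessions.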
Merged pull requests:
- Added support for more transform directions (#1903) (@RainerHeintzmann)
- Add some performance tips to the documentation (#1999) (@Zentrik)
- Re-introduce the `blocking` kwarg to `@sync` (#2060) (@maleadt) (sketched after this list)
- Adapt to GPUCompiler#master. (#2062) (@maleadt)
- Batched SVD added (gesvdjBatched and gesvdaStridedBatched) (#2063) (@nikopj)
- Use released GPUCompiler. (#2064) (@maleadt)
- Fixes for Windows. (#2065) (@maleadt)
- Switch to GPUArrays buffer management. (#2068) (@maleadt)
- Update CUDA 12 to Update 2. (#2071) (@maleadt)
- Update manifest (#2076) (@github-actions[bot])
- Test improvements (#2079) (@maleadt)
- Update manifest (#2082) (@github-actions[bot])
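A usage sketch of the re-introduced kwarg, assuming the `blocking` keyword syntax of `CUDA.@sync` (hedged; consult the docstring on your version):

```julia
using CUDA

a = CUDA.rand(Float32, 1 << 20)

# Wait for outstanding GPU work; with `blocking=true` the task spins on
# the stream instead of yielding to the Julia scheduler (lower latency,
# higher CPU usage).
b = CUDA.@sync blocking=true a .+ 1f0
```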
v4.4.1
CUDA v4.4.1
Closed issues:
- CUDA driver device support does not match toolkit (#70)
- Launching kernels should not allocate (#66)
- sync_threads() appears to not be sync'ing threads (#61)
- Exception when using CuArrays with Flux (#129)
- Kernel using MVector fails to compile or crashes at runtime due to heap allocation (#45)
- Performance regression on matrix multiplication between CUDA.jl 1.3.3 and 2.1.0/master (#538)
- Improve 'VS C++ redistributable' error message (#764)
- CUSPARSE does not support reductions (#1406)
- CUDA test failed (#1690)
- Type constructor in broadcast doesn't compile (#1761)
- accumulate(+) gives different results for CuArray compared to Array. (#1810)
- Compat driver: preload all libraries (#1859)
- Stream synchronization is slow when waiting on the event from CUDA (#1910)
- cuDNN: Store convolution algorithm choice to disk. (#1947)
- Disable 'No CUDA-capable device found' error log (#1955)
- CUDNN_STATUS_NOT_SUPPORTED using 1D CNN model (#1977)
- Memory allocations during in-place sparse matrix-vector multiplication (#1982)
- `CUSPARSE.sum_dim1` sums the absolute values of elements (#1983)
- Update to CUDA 12.2 (#1984)
- `unsafe_wrap` fails on zero element CuArrays (#1985)
- `rand` in kernel works in a deterministic way (#2008)
- Scalar indexing with `CuArray * ReshapedArray{SubArray{CuArray}}` (#2009)
- volumerhs performance regression (#2010)
- CuSparseMatrix constructors allocate too much memory? (#2015)
- Native profiler using CUPTI (#2017)
- libLLVM-15jl.so (#2018)
- "symbol multiply defined" error (#2021)
- Confusion on row major vs column major (#2023)
- Printing of CuArrays gives zeros or random numbers (#2033)
- `sortperm!` fails when output is `UInt` vector (#2046)
- Re-introduce spinning loop before nonblocking synchronization (#2057)
Merged pull requests:
- Check mathType only if not Float32 (#1943) (@RomeoV)
- 1.10 enablement (#1946) (@dkarrasch)
- Implement reverse lookup (Ptr->Tuple) for CUDNN descriptors. (#1948) (@RomeoV)
- Wrapper with tests for `gemmBatchedEx!` (#1975) (@lpawela)
- Add wrappers for `gemv_batched!` (#1981) (@lpawela)
- Update `CUSPARSE.sum_dim<n>` to allow for arbitrary function on elements (#1987) (@lpawela)
- Update manifest (#1988) (@github-actions[bot])
- Add vectorized cached loads (#1993) (@Zentrik)
- Update manifest (#1995) (@github-actions[bot])
- Fix typo in captured macro example (#1996) (@Zentrik)
- Adapt Type call broadcasting to a function (#2000) (@simonbyrne)
- [CUSPARSE] Added support for the generalized dot product dot(x, A, y) = dot(x, A * y) without allocating A * y (#2001) (@albertomercurio) (usage sketched after this list)
- Update manifest (#2002) (@github-actions[bot])
- Support for printing types. (#2003) (@maleadt)
- Fix accumulate bug (#2005) (@chrstphrbrns)
- Update manifest (#2013) (@github-actions[bot])
- Add a raw mode to code_sass. (#2019) (@maleadt)
- Update manifest (#2022) (@github-actions[bot])
- Add a native profiler. (#2024) (@maleadt) (example after this list)
- Perform synchronization on a worker thread (#2025) (@maleadt)
- Remove broken video link in docs (#2028) (@christiangnrd)
- When freeing memory, use the high-level device getter. (#2029) (@maleadt)
- Add support for @cuda fastmath (#2030) (@maleadt) (example after this list)
- Make "CUDA.jl" a link on the doc entry page (#2031) (@carstenbauer)
- Add support for CUDA 12.2. (#2034) (@maleadt)
- rand: seed kernels from the host. (#2035) (@maleadt)
- Update wrappers for CUDA 12.2. (#2039) (@maleadt)
- On CUDA 12.2, have the memory pool enforce hard memory limits. (#2040) (@maleadt)
- Delay all initialization errors until run time. (#2041) (@maleadt)
- JLL/CI/Julia changes. (#2042) (@maleadt)
- Add support for NVTX events to the integrated profiler. (#2043) (@maleadt)
- Update cuStateVec to cuQuantum 23.6. (#2044) (@maleadt)
- Add some more fastmath functions (#2047) (@Zentrik)
- Fixup wrong key lookup. (#2048) (@RomeoV)
- Update manifest (#2049) (@github-actions[bot])
- Make sortperm! resilient to type mismatches. (#2051) (@maleadt)
- Disable tests that cause GC corruption on 1.10. (#2053) (@maleadt)
- enable dependabot for GitHub actions (#2054) (@ranocha)
- Bump actions/checkout from 2 to 3 (#2055) (@dependabot[bot])
- Bump peter-evans/create-pull-request from 3 to 5 (#2056) (@dependabot[bot])
- Rework how local toolkits are selected. (#2058) (@maleadt)
- Busy-wait before doing nonblocking synchronization. (#2059) (@maleadt)
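Three short usage sketches for this release follow. First, the allocation-free generalized dot product from #2001, assuming the three-argument `LinearAlgebra.dot` method for CUSPARSE matrices:

```julia
using CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays

A = CuSparseMatrixCSR(sprand(Float32, 100, 100, 0.05))
x = CUDA.rand(Float32, 100)
y = CUDA.rand(Float32, 100)

# Computes dot(x, A * y) without materializing the intermediate A * y.
dot(x, A, y)
```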
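Next, the integrated profiler from #2017/#2024; a sketch assuming `CUDA.@profile` falls back to the built-in CUPTI-based tracer when no external profiler is attached:

```julia
using CUDA

a = CUDA.rand(Float32, 1024, 1024)

# Without an external tool (e.g. Nsight) attached, this collects and
# prints a host/device trace using the integrated profiler.
CUDA.@profile a * a
```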
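Finally, the `fastmath` launch option from #2030, assuming the keyword form accepted by `@cuda`:

```julia
using CUDA

function kernel!(y, x)
    i = threadIdx().x
    @inbounds y[i] = sin(x[i]) / sqrt(x[i])  # benefits from fast-math lowering
    return
end

x = CUDA.rand(Float32, 32) .+ 1f0
y = similar(x)

# fastmath=true relaxes IEEE semantics for faster device math.
@cuda threads=32 fastmath=true kernel!(y, x)
```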
v4.4.0
CUDA v4.4.0
Closed issues:
- Unreachable control flow leads to illegal divergent barriers (#1746)
- CUBLAS fails on new CUDA.jl v4 (#1852)
- Sort fails on Lovelace (sm8.9) GPUs (#1874)
- gesvd! crashes on Pascal and v12.0 (#1932)
- No effect for calling "nsys launch" (#1938)
- Basic math operations with nested adjoint and transpose (#1940)
- CPU and GPU implementations return results at dissimilar scales, even in double precision arithmetics (#1950)
- Failed CUDA.jl initialization breaks Flux? (#1952)
- Recent `mul!` changes break multiplication with matrices that have `StaticArray` elements (#1953)
- Test infrastructure: define test groups (#1961)
- Strange `rand` errors when sampling large matrices (#1963)
- Add aqua tests (#1964)
- Support of Orin GPU from Nvidia? (#1966)
- Crash in LLVM (#1971)
- Warning cuDNN Convolution (#1972)
- Strange behaviour when installed at system level (#1973)
Merged pull requests:
- Update benchmarks for 1.8 and 1.9 (#1933) (@maleadt)
- CUSOLVER: Explicitly pass NULL when not requesting svd outputs. (#1934) (@maleadt)
- Detect and complain about loading system libraries. (#1935) (@maleadt)
- Update manifest (#1936) (@github-actions[bot])
- Avoid stack overflow with early OOM reporting. (#1937) (@maleadt)
- [CUSPARSE] Improved support for UniformScaling and Diagonal (#1941) (@albertomercurio)
- Update manifest (#1949) (@github-actions[bot])
- Update GPUCompiler to fix unreachable control flow. (#1951) (@maleadt)
- Allow StaticArray eltype in matmat{vec,mul} (#1954) (@lcw) (sketched after this list)
- Bump CUDNN to v8.9. (#1959) (@maleadt)
- Bump CUTENSOR to v1.7. (#1960) (@maleadt)
- Add and fix some aqua tests (#1965) (@charleskawczynski)
- Fix compatibility of CUDA 11.4 to support Orin. (#1967) (@maleadt)
- Don't use Int32 indices in rand kernels. (#1969) (@maleadt)
- CI simplifications (#1970) (@maleadt)
- Use Base.pkgversion on 1.9. (#1974) (@maleadt)
- Update to LLVM.jl 6. (#1976) (@maleadt)
- fix launch config bug in bitonic sort (#1979) (@xaellison)
- Update manifest (#1980) (@github-actions[bot])
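A hedged sketch of the StaticArray-element support from #1953/#1954, assuming such eltypes route through GPUArrays' generic (non-BLAS) matmul:

```julia
using CUDA, StaticArrays, LinearAlgebra

# Matrices whose *elements* are static arrays take the generic
# matmul path rather than CUBLAS.
A = CuArray([@SMatrix(rand(Float32, 2, 2)) for _ in 1:4, _ in 1:4])
x = CuArray([@SVector(rand(Float32, 2)) for _ in 1:4])
y = A * x  # a 4-element CuArray with SVector{2,Float32} elements
```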
v4.3.2
v4.3.1
CUDA v4.3.1
Closed issues:
- Array testsuite compiles kernel with large types (#1902)
- CUDA.jl v4 installs CUDA runtime despite version=local (#1922)
- Occasional "CUSOLVERError: an internal operation failed (code 7, CUSOLVER_STATUS_INTERNAL_ERROR)" (#1924)
- Does [email protected] need [email protected]? (#1929)
v4.3.0
CUDA v4.3.0
Closed issues:
- Multidimensional `reverse` (#1126)
- Test errors on master (#1866)
- Integer overflow error with svd for large matrix (#1880)
- Erratic behaviour of `CUDA.jl` if used in the REPL of VSCode (#1892)
- QR decomposition requires scalar indexing (#1893)
- BSOD during package tests (#1898)
- Insufficient coverage of CuArrays in the documentation (#1901)
- Failed to compile with Julia v1.9 on PowerPC (#1911)
- CUDA test failed in wmma.jl (#1914)
- Fix deprecation warnings (#1920)
Merged pull requests:
- CUSOLVER: Fix workspace size passing. (#1890) (@maleadt)
- Lovelace fixes (#1894) (@maleadt)
- Update manifest (#1897) (@github-actions[bot])
- Reverse with multiple dimensions (#1899) (@RainerHeintzmann) (example after this list)
- Restrict number of test jobs based on available memory. (#1900) (@maleadt)
- Avoid unneeded macros to cut down on generated code (#1905) (@maleadt)
- Avoid unneeded macros to cut down on generated code (#1906) (@maleadt)
- Update manifest (#1907) (@github-actions[bot])
- Bump GPUCompiler. (#1908) (@maleadt)
- Don't use Float64 atomics on unsupported platforms. (#1912) (@maleadt)
- Report package versions as part of versioninfo(). (#1913) (@maleadt)
- Align variables in constant memory by 256 bit (#1915) (@Zentrik)
- Add norm functions for 3 floats (#1916) (@Zentrik)
- cuDNN: only choose conv algorithms if they match descriptor mathType (#1917) (@ToucheSir)
- Update manifest (#1918) (@github-actions[bot])
- Skip Integer WMMA tests on older devices. (#1919) (@maleadt)
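A quick illustration of multidimensional `reverse` (#1126, implemented in #1899), using the standard `Base.reverse` dims API:

```julia
using CUDA

A = CuArray(reshape(1f0:16f0, 4, 4))

reverse(A; dims=1)       # flip along one dimension, as before
reverse(A; dims=(1, 2))  # flip along several dimensions at once
```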
v4.2.0
CUDA v4.2.0
Closed issues:
- NVTX: consider using Start/End for ranges (#1485)
- Limitations of `CuIterator` (#1768)
- Testing fails on unsupported devices. (#1815)
- Local runtime discovery does not work for external libraries (CUDNN, CUTENSOR) (#1850)
- Passing tests using GitHub CI workflow errors with `libcuda not defined` (#1867)
- Cannot precompile GPU code with SnoopPrecompile (#1870)
- Incorrect kernel execution with bounds checking using Julia 1.9.0-rc2 (#1875)
- Fake CUDA library (#1879)
- Error thrown when launching Julia with Nsight systems or compute. (#1886)
- Cannot construct CuDeviceArray (#1887)
- Incorrect colVal array when using CuSparseMatrixCSR command on sparse matrix (#1888)
Merged pull requests:
- Use `adapt` symmetrically in `CuIterator` (#1769) (@mcabbott) (usage sketched after this list)
- Allow but warn when testing on not fully-supported devices. (#1818) (@maleadt)
- Support runtime discovery for non-toolkit libraries (CUTENSOR, CUDNN, CUQUANTUM) (#1858) (@mloubout)
- Add KernelAbstractions.jl unsafe_free! (#1863) (@pxl-th)
- Allow precompiling CUDA code. (#1865) (@maleadt)
- Assert CUDA.jl is functional when creating the TLS. (#1868) (@maleadt)
- Update manifest (#1871) (@github-actions[bot])
- Don't collect `AbstractQ` objects in tests (#1872) (@dkarrasch)
- Add compatibility entry for Lovelace (#1873) (@xaellison)
- remove some type-piracy from cusparse (#1876) (@vtjnash)
- Remove more unneeded ndims methods. (#1878) (@maleadt)
- Guard the initialization-time CUDA driver check in a try/catch. (#1881) (@maleadt)
- Update manifest (#1882) (@github-actions[bot])
- Update CUDA 12.1 to 12.1.1. (#1883) (@maleadt)
- Use atomics for allocation statistics. (#1884) (@maleadt)
- Fix atomic increment of alloc stats. (#1885) (@maleadt)
- Update manifest (#1889) (@github-actions[bot])
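A brief `CuIterator` usage sketch tied to #1768/#1769 (hedged; see the docstring for the authoritative contract):

```julia
using CUDA

batches = [rand(Float32, 128, 16) for _ in 1:4]

# CuIterator adapts each batch to the GPU on demand and eagerly frees
# it once the next batch is requested.
for x in CuIterator(batches)
    s = sum(x)  # x is a CuArray here
end
```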
v4.1.4
CUDA v4.1.4
Closed issues:
- Buggy precompilation of init-defined symbols can break CUDA_Driver_jll initialization (#1798)
- Calling CUDA.set_runtime_version!() with float parameter makes CUDA.jl unusable. (#1831)
- Unexpected memory allocation when using `randn!` (#1856)
- The memory copy speed seems to exceed the hardware limit (#1860)
- PCG produces different output on GPU (via Krylov.jl) (#1864)
v4.1.3
v4.1.2
CUDA v4.1.2
Closed issues:
- Flux's `gradient` differentiating `rfft` leads to non-bit error (#1835)
Merged pull requests:
- switch to using defined globals (#1832) (@simonbyrne)
- Update manifest (#1837) (@github-actions[bot])