Releases: JuliaGPU/CUDA.jl

v5.0.0

19 Sep 08:39
2fa6572

CUDA v5.0.0

Blog post: https://info.juliahub.com/cuda-jl-5-0-changes

This is a breaking release, but the breaking changes are minimal (see the blog post for details):

  • Julia 1.8 is now required, and only CUDA 11.4+ is supported
  • selection of local toolkits has changed slightly

Diff since v4.4.1

Closed issues:

  • StaticArrays.SHermitianCompact not working in kernels in Julia 1.10.0-beta2 (#2069)
  • Support for LinearAlgebra.pinv (#2070)

v4.4.1

25 Aug 20:24

CUDA v4.4.1

Diff since v4.4.0

Closed issues:

  • CUDA driver device support does not match toolkit (#70)
  • Launching kernels should not allocate (#66)
  • sync_threads() appears to not be sync'ing threads (#61)
  • Exception when using CuArrays with Flux (#129)
  • Kernel using MVector fails to compile or crashes at runtime due to heap allocation (#45)
  • Performance regression on matrix multiplication between CUDA.jl 1.3.3 and 2.1.0/master (#538)
  • Improve 'VS C++ redistributable' error message (#764)
  • CUSPARSE does not support reductions (#1406)
  • CUDA test failed (#1690)
  • Type constructor in broadcast doesn't compile (#1761)
  • accumulate(+) gives different results for CuArray compared to Array. (#1810)
  • Compat driver: preload all libraries (#1859)
  • Stream synchronization is slow when waiting on the event from CUDA (#1910)
  • cuDNN: Store convolution algorithm choice to disk. (#1947)
  • Disable 'No CUDA-capable device found' error log (#1955)
  • CUDNN_STATUS_NOT_SUPPORTED using 1D CNN model (#1977)
  • Memory allocations during in-place sparse matrix-vector multiplication (#1982)
  • CUSPARSE.sum_dim1 sums the absolute values of elements (#1983)
  • Update to CUDA 12.2 (#1984)
  • unsafe_wrap fails on zero element CuArrays (#1985)
  • rand in kernel works in a deterministic way (#2008)
  • Scalar indexing with CuArray * ReshapedArray{SubArray{CuArray}} (#2009)
  • volumerhs performance regression (#2010)
  • CuSparseMatrix constructors allocate too much memory? (#2015)
  • Native profiler using CUPTI (#2017)
  • libLLVM-15jl.so (#2018)
  • "symbol multiply defined" error (#2021)
  • Confusion on row major vs column major (#2023)
  • Printing of CuArrays gives zeros or random numbers (#2033)
  • sortperm! fails when output is UInt vector (#2046)
  • Re-introduce spinning loop before nonblocking synchronization (#2057)

v4.4.0

26 Jun 20:29
315c80e

CUDA v4.4.0

Diff since v4.3.2

Closed issues:

  • Unreachable control flow leads to illegal divergent barriers (#1746)
  • CUBLAS fails on new CUDA.jl v4 (#1852)
  • Sort fails on Lovelace (sm8.9) GPUs (#1874)
  • gesvd! crashes on Pascal and v12.0 (#1932)
  • No effect for calling "nsys launch" (#1938)
  • Basic math operations with nested adjoint and transpose (#1940)
  • CPU and GPU implementations return results at dissimilar scales, even in double precision arithmetics (#1950)
  • Failed CUDA.jl initialization breaks Flux? (#1952)
  • Recent mul! changes break multiplication with matrices that have StaticArray elements (#1953)
  • Test infrastructure: define test groups (#1961)
  • Strange rand errors when sampling large matrices (#1963)
  • Add aqua tests (#1964)
  • Support of Orin GPU from Nvidia ? (#1966)
  • Crash in LLVM (#1971)
  • Warning cuDNN Convolution (#1972)
  • Strange behaviour when installed at system level (#1973)

v4.3.2

02 Jun 05:55
acd245e

CUDA v4.3.2

Diff since v4.3.1

v4.3.1

31 May 19:40
b7420f8

CUDA v4.3.1

Diff since v4.3.0

Closed issues:

  • Array testsuite compiles kernel with large types (#1902)
  • CUDA.jl v4 installs CUDA runtime despite version=local (#1922)
  • Occasional "CUSOLVERError: an internal operation failed (code 7, CUSOLVER_STATUS_INTERNAL_ERROR)" (#1924)
  • Does [email protected] need [email protected]? (#1929)

v4.3.0

23 May 18:34
d3b1363

CUDA v4.3.0

Diff since v4.2.0

Closed issues:

  • Multidimensional reverse (#1126)
  • Test errors on master (#1866)
  • Integer overflow error with svd for large matrix (#1880)
  • Erratic behaviour of CUDA.jl if used in the REPL of VSCode. (#1892)
  • QR decomposition requires scalar indexing (#1893)
  • BSOD during package tests (#1898)
  • Insufficient coverage of CuArrays in the documentation (#1901)
  • Failed to compile with Julia v1.9 on PowerPC (#1911)
  • CUDA test failed in wmma.jl (#1914)
  • Fix deprecation warnings (#1920)

v4.2.0

02 May 13:25
af65a44

CUDA v4.2.0

Diff since v4.1.4

Closed issues:

  • NVTX: consider using Start/End for ranges (#1485)
  • Limitations of CuIterator (#1768)
  • Testing fails on unsupported devices. (#1815)
  • Local runtime discovery does not work for external libraries (CUDNN, CUTENSOR) (#1850)
  • Passing tests using Github CI workflow errors with libcuda not defined (#1867)
  • Cannot precompile GPU code with SnoopPrecompile (#1870)
  • Incorrect kernel execution with bounds checking using Julia 1.9.0-rc2 (#1875)
  • Fake CUDA library (#1879)
  • Error thrown when launching Julia with Nsight systems or compute. (#1886)
  • Cannot construct CuDeviceArray (#1887)
  • Incorrect colVal array when using CuSparseMatrixCSR command on sparse matrix (#1888)

Merged pull requests:

  • Use adapt symmetrically in CuIterator (#1769) (@mcabbott)
  • Allow but warn when testing on not fully-supported devices. (#1818) (@maleadt)
  • Support runtime discovery for non-toolkit libraries (CUTENSOR, CUDNN, CUQUANTUM) (#1858) (@mloubout)
  • Add KernelAbstractions.jl unsafe_free! (#1863) (@pxl-th)
  • Allow precompiling CUDA code. (#1865) (@maleadt)
  • Assert CUDA.jl is functional when creating the TLS. (#1868) (@maleadt)
  • Update manifest (#1871) (@github-actions[bot])
  • Don't collect AbstractQ objects in tests (#1872) (@dkarrasch)
  • Add compatibility entry for Lovelace (#1873) (@xaellison)
  • remove some type-piracy from cusparse (#1876) (@vtjnash)
  • Remove more unneeded ndims methods. (#1878) (@maleadt)
  • Guard the initialization-time CUDA driver check in a try/catch. (#1881) (@maleadt)
  • Update manifest (#1882) (@github-actions[bot])
  • Update CUDA 12.1 to 12.1.1. (#1883) (@maleadt)
  • Use atomics for allocation statistics. (#1884) (@maleadt)
  • Fix atomic increment of alloc stats. (#1885) (@maleadt)
  • Update manifest (#1889) (@github-actions[bot])

v4.1.4

13 Apr 15:31
7e86df8

CUDA v4.1.4

Diff since v4.1.3

Closed issues:

  • Buggy precompilation of init-defined symbols can break CUDA_Driver_jll initialization (#1798)
  • Calling CUDA.set_runtime_version!() with float parameter makes CUDA.jl unusable. (#1831)
  • Unexpected memory allocation when using randn! (#1856)
  • The memory copy speed seems to exceed the hardware limit (#1860)
  • PCG produces different output on GPU (via Krylov.jl) (#1864)

Merged pull requests:

  • Fix system_driver_version on platforms not supported by CUDA_Driver_jll. (#1854) (@maleadt)
  • Update manifest (#1861) (@github-actions[bot])

v4.1.3

31 Mar 16:08
4e8f45b

CUDA v4.1.3

Diff since v4.1.2

Closed issues:

  • CUDA.versioninfo() triggers download of lazy artifacts (#1844)

Merged pull requests:

  • Choose parallel tests based on CPUs, not threads. (#1842) (@maleadt)
  • Adapt to LLVM.jl 5 and GPUCompiler.jl 0.19. (#1847) (@maleadt)

v4.1.2

29 Mar 08:23
1aa3e6b

CUDA v4.1.2

Diff since v4.1.1

Closed issues:

  • Flux's gradient differentiating rfft leads to non-bit error (#1835)

Merged pull requests: