Releases · JuliaGPU/CUDA.jl

ERROR: LoadError: bin\cublas64_11.dll when installing CUDA (#1750)
System-wide CUDA in LD_LIBRARY_PATH breaks CUBLAS (#1755)
CuDeviceTexture getindex breaks when executed on the CPU (#1757)
cuDNN.version can cause Julia to crash, missing cudnn_ops_infer64_8.dll (#1777)
cuDNN compile error "ERROR: LoadError: ArgumentError: invalid version string: local" (#1783)
"Error: No CUDA Runtime library found" for ≥v4.0.0 (#1808)
sqrt broken in kernels 'Format of __nvvm__reflect function not recognized' (#1817)

Merged pull requests:

Add support for CUDA 12.0. (#1742) (@maleadt)
Add more fixes and tests for CUDA toolkit 12.0 (#1756) (@amontoison)
Update manifest (#1758) (@github-actions[bot])
Fix test/cusparse/interfaces.jl (#1762) (@amontoison)
Simplify the function sig. (#1763) (@N5N3)
Update manifest (#1770) (@github-actions[bot])
Make versioninfo() resilient against NVML EPERM. (#1771) (@maleadt)
Move CUDAKernels to CUDA.jl (#1772) (@vchuravy)
[CUSPARSE] Improve conversion and tests between sparse matrices (#1774) (@amontoison)
Use geam for + and - operations with CuMatrix{<:CublasFloat} (#1775) (@amontoison)
Update manifest (#1776) (@github-actions[bot])
Update manifest (#1781) (@github-actions[bot])
Update manifest (#1784) (@github-actions[bot])
[CUSPARSE] Update preconditioners.jl (#1785) (@amontoison)
[CUSOLVER] Avoid the conversion to CSR format for reordering routines (#1786) (@amontoison)
Bump GPUCompiler. (#1787) (@maleadt)
Remove unneeded variable. (#1788) (@maleadt)
[CUSPARSE] Update conversions.jl (#1791) (@amontoison)
Update to CUDNN 8.8.1 for CUDA 12 compatibility. (#1792) (@maleadt)
Add support for CUDA 12.1 (#1793) (@maleadt)
[CUSPARSE] Interface color reordering (#1794) (@amontoison)
[CUSPARSE] Interface gtsv2 (#1795) (@amontoison)
Update manifest (#1796) (@github-actions[bot])
Adapt to GPUCompiler 0.18 (#1799) (@maleadt)
Follow Array's behavior when initializing (#1800) (@lcw)
[CUSOLVER] Support A \ b for rectangular matrices (#1802) (@amontoison)
Use symbols instead of values when emitting code, when possible. (#1804) (@maleadt)
Refactor CI pipeline a little. (#1805) (@maleadt)
[CUSOLVER] Improve the dispatch for LAPACK routines (#1806) (@amontoison)
Diagonal for lower triangular of LU decomposition set incorrectly (#1813) (@tgymnich)
CompatHelper: add new compat entry for "KernelAbstractions" at version "0.9" (#1824) (@github-actions[bot])
Rebuild CUPTI API with support for STRUCT_SIZE (#1827) (@vchuravy)
Release CUDA 4.1 (#1828) (@vchuravy)

Contributors

lcw, vchuravy, and 4 other contributors


09 Feb 19:14

vchuravy

v4.0.1

8cd13d6


What's Changed

Warn when using old devices by @maleadt in #1752
Silence some errors to support conditional use. by @maleadt in #1754

Full Changelog: v4.0.0...v4.0.1

Contributors

maleadt


01 Feb 09:05

github-actions

v4.0.0

f85dd7b


CUDA v4.0.0

Diff since v3.13.1

Closed issues:

Missing implementation of right multiply for QR decomposition (#1738)
[CUSPARSE] Type error with mm! (#1743)

Merged pull requests:

Implement rmul for qr. (#1739) (@maleadt)
Update manifest (#1741) (@github-actions[bot])
Update CUSPARSE for CUDA v12.0 (#1744) (@amontoison)
Fix nvprof command (#1745) (@lucifer1004)
Update manifest (#1747) (@github-actions[bot])
Fix grammar (#1748) (@lucifer1004)

Contributors

maleadt, lucifer1004, and amontoison


20 Jan 15:53

github-actions

v3.13.1

459b176


CUDA v3.13.1

Diff since v3.13.0

Closed issues:

CUDA.jl cuFFT underperforming against CuPy cuFFT (#1682)
Is block-spmm supported? (#1736)

Merged pull requests:

Introduce cuFFT plan cache; switch to auto-managed memory. (#1734) (@maleadt)
Stop pirating GPUArrays' RNG methods. (#1735) (@maleadt)

Contributors

maleadt


20 Jan 14:40

github-actions

v3.12.2

6b12ece


CUDA v3.12.2

Diff since v3.12.1

Closed issues:

CUDA.jl cuFFT underperforming against CuPy cuFFT (#1682)
Error during CUDA test (#1718)
Kernel error from bad broadcast (should be regular error?) (#1720)
Freeze into StackOverflow when JULIA_DEBUG=CUDA set (#1721)
Use of linear operators in CUDA.jl (#1727)
Is block-spmm supported? (#1736)

Merged pull requests:

Allow copy(::RNG) (#1719) (@mcabbott)
Update manifest (#1722) (@github-actions[bot])
Simplify CuError rendering before library initialization. (#1723) (@maleadt)
Simplify CuError rendering before library initialization (master branch version) (#1724) (@maleadt)
Make device RNG test more robust. (#1725) (@maleadt)
Rely on LLVM.jl's typed_ccall for more intrinsics. (#1728) (@maleadt)
Backports for 3.13 (#1729) (@maleadt)
Simplify CUBLAS and CUSPARSE wrappers, reducing code generated. (#1730) (@maleadt)
Add Julia 1.9 CI. (#1731) (@maleadt)
Use released dependencies. (#1732) (@maleadt)
Remove NVTX. (#1733) (@maleadt)
Introduce cuFFT plan cache; switch to auto-managed memory. (#1734) (@maleadt)
Stop pirating GPUArrays' RNG methods. (#1735) (@maleadt)

Contributors

maleadt and mcabbott


19 Jan 17:00

github-actions

v3.13.0

1a52af1


CUDA v3.13.0

Diff since v3.12.1

Closed issues:

Error during CUDA test (#1718)
Kernel error from bad broadcast (should be regular error?) (#1720)
Freeze into StackOverflow when JULIA_DEBUG=CUDA set (#1721)
Use of linear operators in CUDA.jl (#1727)

Merged pull requests:

Allow copy(::RNG) (#1719) (@mcabbott)
Update manifest (#1722) (@github-actions[bot])
Simplify CuError rendering before library initialization. (#1723) (@maleadt)
Simplify CuError rendering before library initialization (master branch version) (#1724) (@maleadt)
Make device RNG test more robust. (#1725) (@maleadt)
Rely on LLVM.jl's typed_ccall for more intrinsics. (#1728) (@maleadt)
Backports for 3.13 (#1729) (@maleadt)
Simplify CUBLAS and CUSPARSE wrappers, reducing code generated. (#1730) (@maleadt)
Add Julia 1.9 CI. (#1731) (@maleadt)
Use released dependencies. (#1732) (@maleadt)
Remove NVTX. (#1733) (@maleadt)

Contributors

maleadt and mcabbott


06 Jan 07:34

github-actions

v3.12.1

b7bdc79


CUDA v3.12.1

Diff since v3.12.0

Closed issues:

Accumulate doesn't work on >=4 dim Arrays with dims <= ndims(A) - 3 (#1039)
CUSPARSE does not support dense-sparse matrix multiplication (#1403)
Scalar indexing when comparing a CuArray to the identity matrix (#1557)
CUBLAS_STATUS_NOT_INITIALIZED (#1567)
LinearAlgebra./ and LinearAlgebra.\ breaks CuArray (#1568)
Window size in grid-stride loop (#1573)
Matrix multiplication works for primitive and non-primitive custom number types on the CPU, but it fails for primitive custom number types on the GPU. (#1574)
CuIterator doesn't specify IteratorSize but has no length() (#1583)
Garbage collection doesn't work as shown in the documentation (#1586)
Adding sparse adjoint results in kernel error (#1591)
sparse - sparse matrix multiplication partially missing (#1599)
FastMath sincos(), cis(), exp(im..) aren't as fast as C++ (#1606)
wrong type in wrapper of a cusolver function (#1621)
Adding CUDNN support for 3D convolutions/cross-correlations (#1631)
copyto! does not work between a CuArray and a view(Array) (#1634)
Minor issue with sparse function (#1641)
Scalar indexing when displaying Diagonal{Int64, CuSparseVector{Int64, Int32}} (#1645)
Many errors running test suite on GTX 960 4GB (#1650)
Driver discovery broken on platforms without compat driver (#1653)
Aliasing/Polluted Result from rfftplan for Float32 2^n 3D array (#1656)
Re-instate memory limit (#1670)
Split libnvToolsExt from CUDA_Runtime_jll? (#1672)
accumulate(op, a) causes scalar indexing (#1680)
CUSPARSE CI failures (#1692)
axpy! for nested base types (reshapedarray/adjoint/view) (#1696)
copyto! between a PermutedDimsArray view and a CuArray doesn't work (#1697)
WMMA test failure (#1700)
UndefVarError when a binary is not found (#1701)
Is CUSPARSELT supported? (#1702)
Best practices to reduce startup time (#1707)
1.9 compatibility (#1710)
WARNING: unused variadic paramters. (#1712)

Merged pull requests:

Remove/rework CuDeviceArray constructors (#1308) (@maleadt)
Add always_inline kernel parameter (#1554) (@lcw)
Update manifest (#1564) (@github-actions[bot])
Update manifest (#1569) (@github-actions[bot])
Update manifest (#1571) (@github-actions[bot])
Fix native RNG window calculation. (#1575) (@maleadt)
Use Base.active_project. (#1576) (@maleadt)
Fixes for and tests using JET. (#1577) (@maleadt)
Update manifest (#1578) (@github-actions[bot])
Docs, remove global variables in intro benchmark (#1580) (@SteffenPL)
Update manifest (#1581) (@github-actions[bot])
Update manifest (#1582) (@github-actions[bot])
Bugfixes when using \ operator with non square matrices (#1584) (@GVigne)
remove unbound type parameters (#1585) (@nsajko)
added --openacc-profiling off to the nvprof (#1587) (@mbeltagy)
Update manifest (#1588) (@github-actions[bot])
Wrap at-cuda's code in a let block. (#1589) (@maleadt)
Revert: Use JET during test suite. (#1590) (@maleadt)
[CUSPARSE] Update mv! and mm! functions for CuSparseMatrixCOO and CuSparseMatrixCSC (#1592) (@amontoison)
[CUSPARSE] Add sv! and sm! routines (#1593) (@amontoison)
CompatHelper: bump compat for "BFloat16s" to "0.3" (#1594) (@github-actions[bot])
Update wrap.jl (#1595) (@amontoison)
Provide more useful explanation why an eltype is unsupported. (#1596) (@maleadt)
CompatHelper: bump compat for "BFloat16s" to "0.4" (#1597) (@github-actions[bot])
Improve eltype error reporting. (#1598) (@maleadt)
Add () at the end of the library name in all ccall (#1600) (@amontoison)
Define length for CuIterator (#1602) (@mcabbott)
Added more sparse functions like: kron, tril, triu, reshape, adjoint, transpose, sparse-sparse multiplication (#1603) (@albertomercurio)
Fix rotate! and reflect! for the generic fallback in GPUArrays.jl (#1604) (@amontoison)
Update manifest (#1605) (@github-actions[bot])
Update manifest (#1609) (@github-actions[bot])
[CUSPARSE] Interface generic routines (#1611) (@amontoison)
[CUSPARSE] Update sparse-sparse GEMM (#1613) (@amontoison)
[CUSPARSE] Add sddmm! and gemvi! routines (#1615) (@amontoison)
Update manifest (#1616) (@github-actions[bot])
Don't use isbitsunion to support structs of union types. (#1617) (@maleadt)
Update CUDA driver compatibility package to 11.8. (#1618) (@maleadt)
Update CUDA artifacts to 11.7 Update 1. (#1619) (@maleadt)
Update to CUDA 11.8 (#1620) (@maleadt)
Update to CUDNN 8.6. (#1622) (@maleadt)
Move CUDNN and CUTENSOR into separate packages (#1624) (@maleadt)
Bump BFloat16s. (#1625) (@maleadt)
fix #1621 (#1626) (@jemiryguo)
Restore functionality of FastMath.sincos. (#1627) (@maleadt)
Update manifest (#1628) (@github-actions[bot])
Switch from manual artifact handling to automated JLLs (#1629) (@maleadt)
[CUSPARSE] Add CuMatrix * CuSparseMatrix products (#1632) (@amontoison)
Silence some test warnings. (#1635) (@maleadt)
Update CUTENSOR to v1.6 (#1636) (@maleadt)
[CUSPARSE] Add SparseMatrix * SparseVector products (#1637) (@amontoison)
Upgrade CUSTATEVEC to v1.1 (#1638) (@maleadt)
Upgrade CUTENSORNET to v1.1 (#1639) (@maleadt)
[CUSPARSE] Add CuSparseVector ± CuSparseVector (#1640) (@amontoison)
CompatHelper: add new compat entry for "Preferences" at version "1" (#1642) (@github-actions[bot])
Fix #1641 (#1643) (@amontoison)
Update manifest (#1646) (@github-actions[bot])
[CUSPARSE] Add dot(CuSparseVector,CuVector) and vice-versa (#1647) (@amontoison)
[CUSPARSE] Add ldiv! for CuSparseMatrixCOO and geam for CuSparseMatrixCSC (#1648) (@amontoison)
Update autogenerated headers (#1649) (@maleadt)
Remove deprecations (#1651) (@maleadt)
Don't warn about the old JULIA_CUDA_USE_BINARYBUILDER env var when using preferences (#1652) (@maleadt)
Update CUTENSORNET to use new slice group (#1654) (@kshyatt)
[CUSPARSE] Fix conversions between CuSparseMatrixCOO and CuSparseMatrixCSC (#1655) (@amontoison)
Include compiler options in error log. (#1657) (@maleadt)
Discover the system driver when CUDA_Driver_jll isn't available. (#1658) (@maleadt)
Preserve buffer type when adapting to CuArray. (#1659) (@maleadt)
Update manifest (#1661) (@github-actions[bot])
Extend conversion of QRPackedQ object to CuArray (#1662) (@GVigne)
[CUSPARSE] Add CuSparseMatrixCSC * CuSparseMatrixCSC (#1663) (@amontoison)
Update manifest (#1665) (@github-actions[bot])
[CUSPARSE] Add more tests (#1668) (@amontoison)
Update manifest (#1671) (@github-actions[bot])
Update manifest (#1676) (@github-actions[bot])
Fix eigen when using Hermitian or Symmetric matrices (#1677) (@GVigne)
Update manifest (#1679) (@github-actions[bot])
adding defaults for accumulate(op, a) with modified code from Base.accumulate (#1681) (@leios)
Add right division operator for Diagonal matrices (#1683) (@GVigne)
Update manifest (#1686) (@github-actions[bot])
Bump CUQUANTUM libraries (#1688) (@maleadt)
typo (#1689) (@ArnoStrouwen)
Retry CUSOLVER handle creation when encountering an internal error. (#1691) (@maleadt)
Fix #1692 (#1693) (@amontoison)
Update manifest (#1694) (@github-actions[bot])
[CUSPARSE] Support kron with Diagonal arguments (#1695) (@albertomercurio)
Re-introduce memory limits. (#1698) (@maleadt)
Adapt to GPUCompiler changes. (#1699) (@maleadt)
WMMA: Don't wrap fragments of size 1 in a struct. (#1704) (@maleadt)
Update manifest (#1708) (@github-actions[bot])
Use plain llvmcall calling convention for WMMA intrinsics. (#1709) (@maleadt)
Reclaim in cuDNN conv algorithm search (#1711) (@ToucheSir)
CUBLAS: test against generic axp(b)y, not the BLAS-specific one. (#1713) (@maleadt)
Fix LU getproperty invoke. (#1714) (@maleadt)
Backports for 3.12.1 (#1715) (@maleadt)
Specialize cholcopy to avoid scalar indexing. (#1716) (@maleadt)
Fix handling of inline-allocated structures with unions. (#1717) (@maleadt)

Contributors

lcw, maleadt, and 12 other contributors


16 Jul 21:40

github-actions

v3.12.0

3729010


CUDA v3.12.0

Diff since v3.11.0

Closed issues:

Implement Base.repeat (#177)
repeat performs scalar indexing for multi-dimensional arrays (#1051)
The GPU compiler fails on a call to maximum (#1548)
versioninfo triggers artifact downloads (#1549)
Error when broadcasting composed functions (#1550)
overload Base.copy! for AbstractGPUArray{<:Any,1} (#1555)

Merged pull requests:

Fix math quirk. (#1546) (@maleadt)
Wrap cusolverRf.h and cusolverSp_LOWLEVEL_PREVIEW.h (#1547) (@frapac)
Update manifest (#1551) (@github-actions[bot])
tighten unsafe_wrap signature on scalar length (#1552) (@sjkelly)
Update Documenter key. (#1553) (@maleadt)
Update manifest (#1556) (@github-actions[bot])
Import factorisation internal types from LinearAlgebra (#1558) (@theabhirath)
Update manifest (#1560) (@github-actions[bot])
add reshape for CuDeviceArray (#1561) (@omlins)

Contributors

maleadt, sjkelly, and 3 other contributors


15 Jun 10:29

github-actions

v3.11.0

15a0e1d


CUDA v3.11.0

Diff since v3.10.1

Closed issues:

CUSPARSE: Diagonal + CSC/CSR gives dense array (#1469)
CUBLAS: Multiplication of UpperTriangular/LowerTriangular not supported (#1486)
CUTENSOR tests consume lots of memory, breaking other tests (#1501)
CUFFT doesn't work for ComplexF64 C2C in-place (#1519)
Inconsistency of == and isequal for CuArray (#1524)
Setting CUDA seed the first time changes Random's RNG non-deterministically (#1526)
Undefined exported symbols (#1527)
Could not load library libLLVMExtra-14.dll (#1535)
Add an rrule for cholesky to CUDA.jl (#1541)

Merged pull requests:

specialize +/- op for sparse diag (#1514) (@Roger-luo)
Make sure instantiating RNGs doesn't affect the global CPU RNG. (#1530) (@maleadt)
Update manifest (#1531) (@github-actions[bot])
ldiv! for LU Decomposition (#1532) (@SBuercklin)
Lower dmax for contraction tests (#1534) (@kshyatt)
Fix convolution algorithm search (#1536) (@maxfreu)
Update manifest (#1537) (@github-actions[bot])
add specializations for some triangular-triangular multiplications (#1538) (@Red-Portal)
Add a utility to download artifacts without a functional driver. (#1539) (@maleadt)
Update manifest (#1543) (@github-actions[bot])
Explicit tests for type conversion (#1544) (@kshyatt)
Remove unused exports. (#1545) (@maleadt)

Contributors

maleadt, kshyatt, and 4 other contributors
