Releases: JuliaGPU/CUDA.jl
v4.1.1
v4.1.0
CUDA v4.1.0
Closed issues:
- ERROR: LoadError: bin\cublas64_11.dll when installing CUDA (#1750)
- System-wide CUDA in LD_LIBRARY_PATH breaks CUBLAS (#1755)
- CuDeviceTexture getindex breaks when executed on the CPU (#1757)
- cuDNN.version can cause Julia to crash, missing cudnn_ops_infer64_8.dll (#1777)
- cuDNN compile error "ERROR: LoadError: ArgumentError: invalid version string: local" (#1783)
- "Error: No CUDA Runtime library found" for ≥v4.0.0 (#1808)
- sqrt broken in kernels 'Format of __nvvm__reflect function not recognized' (#1817)
Merged pull requests:
- Add support for CUDA 12.0. (#1742) (@maleadt)
- Add more fixes and tests for CUDA toolkit 12.0 (#1756) (@amontoison)
- Update manifest (#1758) (@github-actions[bot])
- Fix test/cusparse/interfaces.jl (#1762) (@amontoison)
- Simplify the function sig. (#1763) (@N5N3)
- Update manifest (#1770) (@github-actions[bot])
- Make versioninfo() resilient against NVML EPERM. (#1771) (@maleadt)
- Move CUDAKernels to CUDA.jl (#1772) (@vchuravy)
- [CUSPARSE] Improve conversion and tests between sparse matrices (#1774) (@amontoison)
- Use geam for + and - operations with CuMatrix{<:CublasFloat} (#1775) (@amontoison)
- Update manifest (#1776) (@github-actions[bot])
- Update manifest (#1781) (@github-actions[bot])
- Update manifest (#1784) (@github-actions[bot])
- [CUSPARSE] Update preconditioners.jl (#1785) (@amontoison)
- [CUSOLVER] Avoid the conversion to CSR format for reordering routines (#1786) (@amontoison)
- Bump GPUCompiler. (#1787) (@maleadt)
- Remove unneeded variable. (#1788) (@maleadt)
- [CUSPARSE] Update conversions.jl (#1791) (@amontoison)
- Update to CUDNN 8.8.1 for CUDA 12 compatibility. (#1792) (@maleadt)
- Add support for CUDA 12.1 (#1793) (@maleadt)
- [CUSPARSE] Interface color reordering (#1794) (@amontoison)
- [CUSPARSE] Interface gtsv2 (#1795) (@amontoison)
- Update manifest (#1796) (@github-actions[bot])
- Adapt to GPUCompiler 0.18 (#1799) (@maleadt)
- Follow Array's behavior when initializing (#1800) (@lcw)
- [CUSOLVER] Support A \ b for rectangular matrices (#1802) (@amontoison)
- Use symbols instead of values when emitting code, when possible. (#1804) (@maleadt)
- Refactor CI pipeline a little. (#1805) (@maleadt)
- [CUSOLVER] Improve the dispatch for LAPACK routines (#1806) (@amontoison)
- Diagonal for lower triangular of LU decomposition set incorrectly (#1813) (@tgymnich)
- CompatHelper: add new compat entry for "KernelAbstractions" at version "0.9" (#1824) (@github-actions[bot])
- Rebuild CUPTI API with support for STRUCT_SIZE (#1827) (@vchuravy)
- Release CUDA 4.1 (#1828) (@vchuravy)
v4.0.1
v4.0.0
CUDA v4.0.0
Closed issues:
- Missing implementation of right multiply for QR decomposition (#1738)
- [CUSPARSE] Type error with mm! (#1743)
Merged pull requests:
- Implement rmul for qr. (#1739) (@maleadt)
- Update manifest (#1741) (@github-actions[bot])
- Update CUSPARSE for CUDA v12.0 (#1744) (@amontoison)
- Fix nvprof command (#1745) (@lucifer1004)
- Update manifest (#1747) (@github-actions[bot])
- Fix grammar (#1748) (@lucifer1004)
v3.13.1
v3.12.2
CUDA v3.12.2
Closed issues:
- CUDA.jl cuFFT underperforming against CuPy cuFFT (#1682)
- Error during CUDA test (#1718)
- Kernel error from bad broadcast (should be regular error?) (#1720)
- Freeze into StackOverflow when JULIA_DEBUG=CUDA set (#1721)
- Use of linear operators in CUDA.jl (#1727)
- Is block-spmm supported? (#1736)
Merged pull requests:
- Allow copy(::RNG) (#1719) (@mcabbott)
- Update manifest (#1722) (@github-actions[bot])
- Simplify CuError rendering before library initialization. (#1723) (@maleadt)
- Simplify CuError rendering before library initialization (master branch version) (#1724) (@maleadt)
- Make device RNG test more robust. (#1725) (@maleadt)
- Rely on LLVM.jl's typed_ccall for more intrinsics. (#1728) (@maleadt)
- Backports for 3.13 (#1729) (@maleadt)
- Simplify CUBLAS and CUSPARSE wrappers, reducing code generated. (#1730) (@maleadt)
- Add Julia 1.9 CI. (#1731) (@maleadt)
- Use released dependencies. (#1732) (@maleadt)
- Remove NVTX. (#1733) (@maleadt)
- Introduce cuFFT plan cache; switch to auto-managed memory. (#1734) (@maleadt)
- Stop pirating GPUArrays' RNG methods. (#1735) (@maleadt)
v3.13.0
CUDA v3.13.0
Closed issues:
- Error during CUDA test (#1718)
- Kernel error from bad broadcast (should be regular error?) (#1720)
- Freeze into StackOverflow when JULIA_DEBUG=CUDA set (#1721)
- Use of linear operators in CUDA.jl (#1727)
Merged pull requests:
- Allow copy(::RNG) (#1719) (@mcabbott)
- Update manifest (#1722) (@github-actions[bot])
- Simplify CuError rendering before library initialization. (#1723) (@maleadt)
- Simplify CuError rendering before library initialization (master branch version) (#1724) (@maleadt)
- Make device RNG test more robust. (#1725) (@maleadt)
- Rely on LLVM.jl's typed_ccall for more intrinsics. (#1728) (@maleadt)
- Backports for 3.13 (#1729) (@maleadt)
- Simplify CUBLAS and CUSPARSE wrappers, reducing code generated. (#1730) (@maleadt)
- Add Julia 1.9 CI. (#1731) (@maleadt)
- Use released dependencies. (#1732) (@maleadt)
- Remove NVTX. (#1733) (@maleadt)
v3.12.1
CUDA v3.12.1
Closed issues:
- Accumulate doesn't work on >=4 dim Arrays with dims <= ndims(A) - 3 (#1039)
- CUSPARSE does not support dense-sparse matrix multiplication (#1403)
- Scalar indexing when comparing a CuArray to the identity matrix (#1557)
- CUBLAS_STATUS_NOT_INITIALIZED (#1567)
- LinearAlgebra./ and LinearAlgebra.\ breaks CuArray (#1568)
- Window size in grid-stride loop (#1573)
- Matrix multiplication works for primitive and non-primitive custom number types on the CPU, but it fails for primitive custom number types on the GPU. (#1574)
- CuIterator doesn't specify IteratorSize but has no length() (#1583)
- Garbage collection doesn't work as shown in the documentation (#1586)
- Adding sparse adjoint results in kernel error (#1591)
- sparse - sparse matrix multiplication partially missing (#1599)
- FastMath sincos(), cis(), exp(im..) aren't as fast as C++ (#1606)
- wrong type in wrapper of a cusolver function (#1621)
- Adding CUDNN support for 3D convolutions/cross-correlations (#1631)
- copyto! does not work between a CuArray and a view(Array) (#1634)
- Minor issue with sparse function (#1641)
- Scalar indexing when displaying Diagonal{Int64, CuSparseVector{Int64, Int32}} (#1645)
- Many errors running test suite on GTX 960 4GB (#1650)
- Driver discovery broken on platforms without compat driver (#1653)
- Aliasing/Polluted Result from rfftplan for Float32 2^n 3D array (#1656)
- Re-instate memory limit (#1670)
- Split libnvToolsExt from CUDA_Runtime_jll? (#1672)
- accumulate(op, a) causes scalar indexing (#1680)
- CUSPARSE CI failures (#1692)
- axpy! for nested base types (reshapedarray/adjoint/view) (#1696)
- copyto! between a PermutedDimsArray view and a CuArray doesn't work (#1697)
- WMMA test failure (#1700)
- UndefVarError when a binary is not found (#1701)
- Is CUSPARSELT supported? (#1702)
- Best practices to reduce startup time (#1707)
- 1.9 compatibility (#1710)
- WARNING: unused variadic paramters. (#1712)
Merged pull requests:
- Remove/rework CuDeviceArray constructors (#1308) (@maleadt)
- Add always_inline kernel parameter (#1554) (@lcw)
- Update manifest (#1564) (@github-actions[bot])
- Update manifest (#1569) (@github-actions[bot])
- Update manifest (#1571) (@github-actions[bot])
- Fix native RNG window calculation. (#1575) (@maleadt)
- Use Base.active_project. (#1576) (@maleadt)
- Fixes for and tests using JET. (#1577) (@maleadt)
- Update manifest (#1578) (@github-actions[bot])
- Docs, remove global variables in intro benchmark (#1580) (@SteffenPL)
- Update manifest (#1581) (@github-actions[bot])
- Update manifest (#1582) (@github-actions[bot])
- Bugfixes when using \ operator with non square matrices (#1584) (@GVigne)
- remove unbound type parameters (#1585) (@nsajko)
- added --openacc-profiling off to the nvprof (#1587) (@mbeltagy)
- Update manifest (#1588) (@github-actions[bot])
- Wrap at-cuda's code in a let block. (#1589) (@maleadt)
- Revert: Use JET during test suite. (#1590) (@maleadt)
- [CUSPARSE] Update mv! and mm! functions for CuSparseMatrixCOO and CuSparseMatrixCSC (#1592) (@amontoison)
- [CUSPARSE] Add sv! and sm! routines (#1593) (@amontoison)
- CompatHelper: bump compat for "BFloat16s" to "0.3" (#1594) (@github-actions[bot])
- Update wrap.jl (#1595) (@amontoison)
- Provide more useful explanation why an eltype is unsupported. (#1596) (@maleadt)
- CompatHelper: bump compat for "BFloat16s" to "0.4" (#1597) (@github-actions[bot])
- Improve eltype error reporting. (#1598) (@maleadt)
- Add () at the end of the library name in all ccall (#1600) (@amontoison)
- Define length for CuIterator (#1602) (@mcabbott)
- Added more sparse functions like: kron, tril, triu, reshape, adjoint, transpose, sparse-sparse multiplication (#1603) (@albertomercurio)
- Fix rotate! and reflect! for the generic fallback in GPUArrays.jl (#1604) (@amontoison)
- Update manifest (#1605) (@github-actions[bot])
- Update manifest (#1609) (@github-actions[bot])
- [CUSPARSE] Interface generic routines (#1611) (@amontoison)
- [CUSPARSE] Update sparse-sparse GEMM (#1613) (@amontoison)
- [CUSPARSE] Add sddmm! and gemvi! routines (#1615) (@amontoison)
- Update manifest (#1616) (@github-actions[bot])
- Don't use isbitsunion to support structs of union types. (#1617) (@maleadt)
- Update CUDA driver compatibility package to 11.8. (#1618) (@maleadt)
- Update CUDA artifacts to 11.7 Update 1. (#1619) (@maleadt)
- Update to CUDA 11.8 (#1620) (@maleadt)
- Update to CUDNN 8.6. (#1622) (@maleadt)
- Move CUDNN and CUTENSOR into separate packages (#1624) (@maleadt)
- Bump BFloat16s. (#1625) (@maleadt)
- fix #1621 (#1626) (@jemiryguo)
- Restore functionality of FastMath.sincos. (#1627) (@maleadt)
- Update manifest (#1628) (@github-actions[bot])
- Switch from manual artifact handling to automated JLLs (#1629) (@maleadt)
- [CUSPARSE] Add CuMatrix * CuSparseMatrix products (#1632) (@amontoison)
- Silence some test warnings. (#1635) (@maleadt)
- Update CUTENSOR to v1.6 (#1636) (@maleadt)
- [CUSPARSE] Add SparseMatrix * SparseVector products (#1637) (@amontoison)
- Upgrade CUSTATEVEC to v1.1 (#1638) (@maleadt)
- Upgrade CUTENSORNET to v1.1 (#1639) (@maleadt)
- [CUSPARSE] Add CuSparseVector ± CuSparseVector (#1640) (@amontoison)
- CompatHelper: add new compat entry for "Preferences" at version "1" (#1642) (@github-actions[bot])
- Fix #1641 (#1643) (@amontoison)
- Update manifest (#1646) (@github-actions[bot])
- [CUSPARSE] Add dot(CuSparseVector,CuVector) and vice-versa (#1647) (@amontoison)
- [CUSPARSE] Add ldiv! for CuSparseMatrixCOO and geam for CuSparseMatrixCSC (#1648) (@amontoison)
- Update autogenerated headers (#1649) (@maleadt)
- Remove deprecations (#1651) (@maleadt)
- Don't warn about the old JULIA_CUDA_USE_BINARYBUILDER env var when using preferences (#1652) (@maleadt)
- Update CUTENSORNET to use new slice group (#1654) (@kshyatt)
- [CUSPARSE] Fix conversions between CuSparseMatrixCOO and CuSparseMatrixCSC (#1655) (@amontoison)
- Include compiler options in error log. (#1657) (@maleadt)
- Discover the system driver when CUDA_Driver_jll isn't available. (#1658) (@maleadt)
- Preserve buffer type when adapting to CuArray. (#1659) (@maleadt)
- Update manifest (#1661) (@github-actions[bot])
- Extend conversion of QRPackedQ object to CuArray (#1662) (@GVigne)
- [CUSPARSE] Add CuSparseMatrixCSC * CuSparseMatrixCSC (#1663) (@amontoison)
- Update manifest (#1665) (@github-actions[bot])
- [CUSPARSE] Add more tests (#1668) (@amontoison)
- Update manifest (#1671) (@github-actions[bot])
- Update manifest (#1676) (@github-actions[bot])
- Fix eigen when using Hermitian or Symmetric matrices (#1677) (@GVigne)
- Update manifest (#1679) (@github-actions[bot])
- adding defaults for accumulate(op, a) with modified code from Base.accumulate (#1681) (@leios)
- Add right division operator for Diagonal matrices (#1683) (@GVigne)
- Update manifest (#1686) (@github-actions[bot])
- Bump CUQUANTUM libraries (#1688) (@maleadt)
- typo (#1689) (@ArnoStrouwen)
- Retry CUSOLVER handle creation when encountering an internal error. (#1691) (@maleadt)
- Fix #1692 (#1693) (@amontoison)
- Update manifest (#1694) (@github-actions[bot])
- [CUSPARSE] Support kron with Diagonal arguments (#1695) (@albertomercurio)
- Re-introduce memory limits. (#1698) (@maleadt)
- Adapt to GPUCompiler changes. (#1699) (@maleadt)
- WMMA: Don't wrap fragments of size 1 in a struct. (#1704) (@maleadt)
- Update manifest (#1708) (@github-actions[bot])
- Use plain llvmcall calling convention for WMMA intrinsics. (#1709) (@maleadt)
- Reclaim in cuDNN conv algorithm search (#1711) (@ToucheSir)
- CUBLAS: test against generic axp(b)y, not the BLAS-specific one. (#1713) (@maleadt)
- Fix LU getproperty invoke. (#1714) (@maleadt)
- Backports for 3.12.1 (#1715) (@maleadt)
- Specialize cholcopy to avoid scalar indexing. (#1716) (@maleadt)
- Fix handling of inline-allocated structures with unions. (#1717) (@maleadt)
v3.12.0
CUDA v3.12.0
Closed issues:
- Implement Base.repeat (#177)
- repeat performs scalar indexing for multi-dimensional arrays (#1051)
- The GPU compiler fails on a call to maximum (#1548)
- versioninfo triggers artifact downloads (#1549)
- Error when broadcasting composed functions (#1550)
- overload Base.copy! for AbstractGPUArray{<:Any,1} (#1555)
Merged pull requests:
- Fix math quirk. (#1546) (@maleadt)
- Wrap cusolverRf.h and cusolverSp_LOWLEVEL_PREVIEW.h (#1547) (@frapac)
- Update manifest (#1551) (@github-actions[bot])
- tighten unsafe_wrap signature on scalar length (#1552) (@sjkelly)
- Update Documenter key. (#1553) (@maleadt)
- Update manifest (#1556) (@github-actions[bot])
- Import factorisation internal types from LinearAlgebra (#1558) (@theabhirath)
- Update manifest (#1560) (@github-actions[bot])
- add reshape for CuDeviceArray (#1561) (@omlins)
v3.11.0
CUDA v3.11.0
Closed issues:
- CUSPARSE: Diagonal + CSC/CSR gives dense array (#1469)
- CUBLAS: Multiplication of UpperTriangular/LowerTriangular not supported (#1486)
- CUTENSOR tests consume lots of memory, breaking other tests (#1501)
- CUFFT doesn't work for ComplexF64 C2C in-place (#1519)
- Inconsistency of == and isequal for CuArray (#1524)
- Setting CUDA seed the first time changes Random's RNG non-deterministically (#1526)
- Undefined exported symbols (#1527)
- Could not load library libLLVMExtra-14.dll (#1535)
- Add an rrule for cholesky to CUDA.jl (#1541)
Merged pull requests:
- specialize +/- op for sparse diag (#1514) (@Roger-luo)
- Make sure instantiating RNGs doesn't affect the global CPU RNG. (#1530) (@maleadt)
- Update manifest (#1531) (@github-actions[bot])
- ldiv! for LU Decomposition (#1532) (@SBuercklin)
- Lower dmax for contraction tests (#1534) (@kshyatt)
- Fix convolution algorithm search (#1536) (@maxfreu)
- Update manifest (#1537) (@github-actions[bot])
- add specializations for some triangular-triangular multiplications (#1538) (@Red-Portal)
- Add a utility to download artifacts without a functional driver. (#1539) (@maleadt)
- Update manifest (#1543) (@github-actions[bot])
- Explicit tests for type conversion (#1544) (@kshyatt)
- Remove unused exports. (#1545) (@maleadt)