2023.1.0
Improvements
SYCL™ Compiler
Allow FTZ, prec-sqrt to override no-ftz, no-prec-sqrt [8096a6fb]
Implement support for NVIDIA architectures (such as nvidia_gpu_sm_80) as an argument to -fsycl-targets [e5de913f]
SYCL Library
Implement matrix extension using new “unified” interface [166bbc36]
Support zero-range kernels for CUDA backends [a3958865]
Add missing macro to interop-backend-traits.cpp [a578c8141]
Allow varying program metadata in CUDA backend [25d05f3d]
Updated tf32 device code check comment [21176576]
Bug Fixes
ext_oneapi_cuda make_device no longer duplicates sycl::device [75302c53a]
Fix incorrectly constructed guards [ce7c594f]
Documentation
Demonstrate how to pass ptxas options [f48f96eb3f]
Add mention of cuda gpu arch for enabling cuda-arch specific features [4e5d276f]
2023.0.0
Initial release of oneAPI for NVIDIA® GPUs!
This release was created from the intel/llvm repository at commit 0f579ba.
New Features
Support for CUDA® backend
SYCL™ Compiler
Support for sycl::half type
Support for bf16 builtins operating on storage types
Support for the SYCL builtins from relational, geometric, common and math categories
Support for sub_group extension
Support for group algorithms
Support for group_ballot intrinsic
Support for atomics with scopes and memory orders
Support for multiple streams in each queue to improve concurrent execution
Support for sycl::queue::mem_advise
Support for --ffast-math in CUDA libclc
Support for device side assert
Support for float and double exchange and compare exchange atomic operations in CUDA libclc
Enabled CXX standard library functions
Native event for a default-constructed sycl::event has to be in the COMPLETE state
SYCL Library
Added bf16 builtins for fma, fmin and fmax
Support for sycl::aspect::fp16
Added tanh (for floats/halfs) and exp2 (for halfs) native definitions
Support for sycl::get_native(sycl::buffer)
Implemented mem_advise reset and managed concurrent memory checks
Support for element-wise operations on joint_matrix including bfloat16 support
Support for Unified Shared Memory (USM)