2023.1.0
Improvements
SYCL™ Compiler
Allow FTZ, prec-sqrt to override no-ftz, no-prec-sqrt [8096a6fb]
Implement support for NVIDIA architectures (such as nvidia_gpu_sm_80) as arguments to -fsycl-targets [e5de913f]
SYCL Library
Implement matrix extension using new “unified” interface [166bbc36]
Support zero-range kernels in the CUDA backend [a3958865]
Add missing macro to interop-backend-traits.cpp [a578c8141]
Allow varying program metadata in CUDA backend [25d05f3d]
Update tf32 device code check comment [21176576]
Bug Fixes
ext_oneapi_cuda make_device no longer duplicates sycl::device [75302c53a]
Fix incorrectly constructed guards [ce7c594f]
Documentation
Demonstrate how to pass ptxas options [f48f96eb3f]
Mention the CUDA GPU architecture option for enabling CUDA-arch-specific features [4e5d276f]
2023.0.0
Initial release of oneAPI for NVIDIA® GPUs!
This release was created from the intel/llvm repository at commit 0f579ba.
New Features
Support for the CUDA® backend
SYCL™ Compiler
Support for sycl::half type
Support for bf16 builtins operating on storage types
Support for the SYCL builtins from the relational, geometric, common and math categories
Support for sub_group extension
Support for group algorithms
Support for the group_ballot intrinsic
Support for atomics with scopes and memory orders
Support for multiple streams in each queue to improve concurrent execution
Support for sycl::queue::mem_advise
Support for --ffast-math in CUDA libclc
Support for device-side assert
Support for float and double exchange and compare-exchange atomic operations in CUDA libclc
Enabled C++ standard library functions
The native event for a default-constructed sycl::event is in the COMPLETE state
SYCL Library
Added bf16 builtins for fma, fmin and fmax
Support for sycl::aspect::fp16
Added native definitions of tanh (for float and half) and exp2 (for half)
Support for sycl::get_native(sycl::buffer)
Implemented mem_advise reset and managed concurrent memory checks
Support for element-wise operations on joint_matrix, including bfloat16 support
Support for Unified Shared Memory (USM)