2023.2.0
Improvements
SYCL Compiler
Improve clang++/cuda performance by increasing the inline threshold multiplier in the NVPTX backend [22d98280]
Define __SYCL_CUDA_ARCH__ instead of __CUDA_ARCH__ for SYCL [8f5000c3]
SYCL Library
Introduced sycl_ext_oneapi_cuda_tex_cache_read to expose the __ldg* clang builtins to sycl as a cuda only extension - Read-only Texture Cache [5360825e]
Report cl_khr_subgroups as a subgroups supporting extension [8e6c092b]
atomic_fence device queries now return the minimum required capabilities rather than failing with an error [82ac98f8]
may be dropped by NVIDIA [1e88df54]
Support the query of theoretical peak memory bandwidth - Intel’s Extensions for Device Information [8ce0a6d5]
Add Support for device ID and UUID - Intel’s Extensions for Device Information [8213074d]
Support host-device memcpy2D [d0b25d4a]
Add support for sycl_ext_oneapi_memcpy2d on CUDA backend - OneAPI memcpy2d [9008a5d2]
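To make the memcpy2D entries above more concrete, the following host-only sketch shows the usual meaning of a pitched 2D copy: each of height rows contributes width bytes, and the pitches give the byte distance between consecutive row starts. The function copy_2d_reference is a hypothetical reference written in plain C++; it is not the extension's API or implementation, and its parameter order is an assumption chosen for readability.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical host-side reference for a pitched 2D copy.
// dest_pitch / src_pitch: bytes between the starts of consecutive rows.
// width: bytes copied per row; height: number of rows.
// Returns false if a row would not fit inside either pitch.
inline bool copy_2d_reference(void* dest, std::size_t dest_pitch,
                              const void* src, std::size_t src_pitch,
                              std::size_t width, std::size_t height) {
    if (width > dest_pitch || width > src_pitch) {
        return false;
    }
    auto* d = static_cast<unsigned char*>(dest);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        // Copy only the payload of each row; padding bytes are left untouched.
        std::memcpy(d + row * dest_pitch, s + row * src_pitch, width);
    }
    return true;
}
```

Bytes between width and the pitch in each destination row are never written, which is why source and destination may use different pitches.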
Bug Fixes
Replace error on invalid work group size with PI_ERROR_INVALID_WORK_GROUP_SIZE [2357af0a]
Address wrong results from sycl::ctz function [5a9f601e]
Address the issue that can cause events not to be waited on as intended [1b225447]
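For readers checking results against the sycl::ctz fix listed above, a plain C++ reference for counting trailing zero bits can be useful. The sketch below is a hypothetical helper (ctz_reference), not the library's implementation; it assumes a 32-bit unsigned input and that a zero input yields the bit width of the type, which is how the SYCL 2020 specification describes ctz.

```cpp
#include <cstdint>

// Hypothetical reference: number of trailing 0-bits in a 32-bit value.
// Assumption: an input of 0 returns 32, the bit width of the type.
inline int ctz_reference(std::uint32_t x) {
    if (x == 0) {
        return 32;
    }
    int count = 0;
    while ((x & 1u) == 0) {  // shift right until the lowest set bit reaches bit 0
        x >>= 1;
        ++count;
    }
    return count;
}
```

Comparing device results for a few values (including 0 and a value with only the top bit set) against such a reference is one way to confirm the corrected behaviour.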
2023.1.0
Improvements
SYCL Compiler
Allow FTZ, prec-sqrt to override no-ftz, no-prec-sqrt [8096a6fb]
Implement support for NVIDIA architectures (such as nvidia_gpu_sm_80) as argument to -fsycl-targets [e5de913f]
SYCL Library
Implement matrix extension using new “unified” interface [166bbc36]
Support zero range kernel for cuda backends [a3958865]
Add missing macro to interop-backend-traits.cpp [a578c8141]
Allow varying program metadata in CUDA backend [25d05f3d]
Bug Fixes
ext_oneapi_cuda make_device no longer duplicates sycl::device [75302c53a]
Fix incorrectly constructed guards [ce7c594f]
Documentation
Demonstrate how to pass ptxas options [f48f96eb3f]
Add mention of cuda gpu arch for enabling cuda-arch specific features [4e5d276f]
2023.0.0
Initial release of oneAPI for NVIDIA® GPUs!
This release was created from the intel/llvm repository at commit 0f579ba.
New Features
Support for CUDA® backend
SYCL Compiler
Support for sycl::half type
Support for bf16 builtins operating on storage types
Support for the SYCL builtins from relational, geometric, common and math categories
Support for sub_group extension
Support for group algorithms
Support for group_ballot intrinsic
Support for atomics with scopes and memory orders
Support for multiple streams in each queue to improve concurrent execution
Support for sycl::queue::mem_advise
Support for -ffast-math in CUDA libclc
Support for device side assert
Support for float and double exchange and compare exchange atomic operations in CUDA libclc
Enabled CXX standard library functions
Native event for default-ctored sycl::event has to be in COMPLETE state
SYCL Library
Add bfloat16 builtins for fma, fmin and fmax
Support for sycl::aspect::fp16
Add tanh (for floats/halfs) and exp2 (for halfs) native definitions
Support for sycl::get_native(sycl::buffer)
Implemented mem_advise reset and managed concurrent memory checks
Support for element-wise operations on joint_matrix including bfloat16 support, oneAPI matrix extension
Support for Unified Shared Memory (USM)