Please note that you are viewing a guide targeting an older version of oneAPI for NVIDIA® GPUs. This guide was designed for version 2024.0.0.

Improved the work_group size selection for the parallel_for interface taking a sycl::range.

SYCL Library

Added support for the sycl_ext_oneapi_graph extension.
Added support for the sycl_ext_oneapi_non_uniform_groups extension.
Added support for the sycl_ext_oneapi_peer_access extension.
Introduced the max_registers_per_work_group device query.
Added a mechanism, via the SYCL_PROGRAM_COMPILE_OPTIONS environment variable, for passing the maxrregcount ptxas compiler option to the CUDA backend: SYCL_PROGRAM_COMPILE_OPTIONS="--maxrregcount=<value>". Note that this works for JIT-compiled programs only.

Bug Fixes

Fixed the double-precision floating-point group algorithms broadcast, joint_exclusive_scan, joint_inclusive_scan, exclusive_scan_over_group and inclusive_scan_over_group, which no longer hang when compiled with the icpx compiler.
Fixed a bug in atomic_fence where it was using the wrong memory_scope when passed sycl::memory_scope::device as an argument.
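For readers unfamiliar with the scan algorithms named in the fix above, the following host-only sketch shows the results an exclusive and an inclusive scan produce on double-precision input. It does not use SYCL: the function names exclusive_scan_ref and inclusive_scan_ref are hypothetical, and addition is assumed as the combining operation, whereas the SYCL group algorithms accept a binary operation as an argument.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host reference (not a SYCL API):
// out[i] = init + in[0] + ... + in[i-1], so out[0] == init.
std::vector<double> exclusive_scan_ref(const std::vector<double>& in,
                                       double init) {
  std::vector<double> out(in.size());
  double running = init;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = running;   // value before the current element is added
    running += in[i];
  }
  return out;
}

// Hypothetical host reference (not a SYCL API):
// out[i] = in[0] + ... + in[i], so the current element is included.
std::vector<double> inclusive_scan_ref(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  double running = 0.0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    running += in[i];
    out[i] = running;   // value after the current element is added
  }
  return out;
}
```

Comparing device results against a host reference like this is one simple way to check a scan kernel for correctness.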

2023.2.0

Improvements

SYCL Compiler

Improved clang++/CUDA performance by increasing the inline threshold multiplier in the NVPTX backend [22d98280]
Define __SYCL_CUDA_ARCH__ instead of __CUDA_ARCH__ for SYCL [8f5000c3]
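Code that previously tested __CUDA_ARCH__ to detect device compilation may need to test __SYCL_CUDA_ARCH__ instead. The sketch below shows one way such a guard might look; the helper name compiled_for_sycl_cuda_device is hypothetical and not part of any SYCL API, and the sketch assumes only that the macro is defined during device compilation for an NVIDIA target and undefined otherwise.

```cpp
// Hypothetical helper (not a SYCL API): reports whether this translation
// unit is currently being compiled for a CUDA device by a SYCL compiler
// that defines __SYCL_CUDA_ARCH__.
constexpr bool compiled_for_sycl_cuda_device() {
#if defined(__SYCL_CUDA_ARCH__)
  return true;   // device pass targeting an NVIDIA GPU
#else
  return false;  // host pass, or a target other than CUDA
#endif
}
```

Built with an ordinary host compiler, the macro is undefined and the helper returns false.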

SYCL Library

Introduced sycl_ext_oneapi_cuda_tex_cache_read to expose the __ldg* clang builtins to SYCL as a CUDA-only extension - Read-only Texture Cache [5360825e]
Report cl_khr_subgroups as a subgroups supporting extension [8e6c092b]
atomic_fence device queries now return the minimum required capabilities rather than failing with an error [82ac98f8]
may be dropped by NVIDIA [1e88df54]
Support the query of theoretical peak memory bandwidth - Intel’s Extensions for Device Information [8ce0a6d5]
Add support for device ID and UUID - Intel’s Extensions for Device Information [8213074d]
Support host-device memcpy2D [d0b25d4a]
Add support for sycl_ext_oneapi_memcpy2d on the CUDA backend - OneAPI memcpy2d [9008a5d2]
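To illustrate what a pitched 2D copy does, here is a host-only sketch that copies a rectangle of bytes row by row between two buffers whose rows may be padded. The function memcpy2d_ref is hypothetical; its parameter order is modelled on the extension's memcpy2d interface as an assumption and should be checked against the sycl_ext_oneapi_memcpy2d specification before relying on it.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical host reference for a pitched 2D copy (not a SYCL API).
// dest_pitch / src_pitch: bytes between the starts of consecutive rows.
// width: bytes copied per row (must not exceed either pitch).
// height: number of rows copied.
void memcpy2d_ref(void* dest, std::size_t dest_pitch,
                  const void* src, std::size_t src_pitch,
                  std::size_t width, std::size_t height) {
  unsigned char* d = static_cast<unsigned char*>(dest);
  const unsigned char* s = static_cast<const unsigned char*>(src);
  for (std::size_t row = 0; row < height; ++row) {
    // Copy only the first `width` bytes of each row; padding is skipped.
    std::memcpy(d + row * dest_pitch, s + row * src_pitch, width);
  }
}
```

A typical use is packing a padded source image into a tightly laid out destination, where the destination pitch equals the width.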

Bug Fixes

Replace the error raised on an invalid work-group size with PI_ERROR_INVALID_WORK_GROUP_SIZE [2357af0a]
Fix wrong results from the sycl::ctz function [5a9f601e]
Fix an issue that could cause events not to be waited on as intended [1b225447]
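As a reference for the sycl::ctz entry above, the following host-only sketch computes the count of trailing zero bits for a 32-bit unsigned value, returning the bit width of the type when the input is zero, which is the behaviour SYCL 2020 describes for ctz. The name ctz_ref is hypothetical, and the sketch is a plain loop rather than the implementation used by the library.

```cpp
#include <cstdint>

// Hypothetical host reference (not a SYCL API): number of trailing zero
// bits in x; returns 32 (the bit width of the type) when x is 0.
unsigned ctz_ref(std::uint32_t x) {
  if (x == 0) {
    return 32;
  }
  unsigned count = 0;
  while ((x & 1u) == 0) {  // shift right until the lowest set bit is reached
    x >>= 1;
    ++count;
  }
  return count;
}
```

Results from a device kernel calling sycl::ctz can be compared against this function element by element.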

2023.1.0

Improvements

SYCL Compiler

Allow FTZ, prec-sqrt to override no-ftz, no-prec-sqrt [8096a6fb]
Implement support for NVIDIA architectures (such as nvidia_gpu_sm_80) as argument to fsycl-targets [e5de913f]

SYCL Library

Implement matrix extension using new “unified” interface [166bbc36]
Support zero-range kernels on the CUDA backend [a3958865]
Add missing macro to interop-backend-traits.cpp [a578c8141]
Allow varying program metadata in CUDA backend [25d05f3d]

Bug Fixes

ext_oneapi_cuda make_device no longer duplicates sycl::device [75302c53a]
Fix incorrectly constructed guards [ce7c594f]

Document

Demonstrate how to pass ptxas options [f48f96eb3f]
Add mention of cuda gpu arch for enabling cuda-arch specific features [4e5d276f]

2023.0.0

Initial release of oneAPI for NVIDIA® GPUs!

This release was created from the intel/llvm repository at commit 0f579ba.

New Features

Support for CUDA® backend

SYCL Compiler

Support for sycl::half type
Support for bf16 builtins operating on storage types
Support for the SYCL builtins from relational, geometric, common and math categories
Support for sub_group extension
Support for group algorithms
Support for group_ballot intrinsic
Support for atomics with scopes and memory orders
Support for multiple streams in each queue to improve concurrent execution
Support for sycl::queue::mem_advise
Support for -ffast-math in CUDA libclc
Support for device side assert
Support for float and double exchange and compare exchange atomic operations in CUDA libclc
Enabled CXX standard library functions
The native event for a default-constructed sycl::event must be in the COMPLETE state

SYCL Library

Add bfloat16 builtins for fma, fmin and fmax
Support for sycl::aspect::fp16
Add tanh (for floats/halfs) and exp2 (for halfs) native definitions
Support for sycl::get_native(sycl::buffer)
Implemented mem_advise reset and managed concurrent memory checks
Support for element-wise operations on joint_matrix including bfloat16 support, oneAPI matrix extension
Support for Unified Shared Memory (USM)
