This guide contains information on using DPC++ to run SYCL™ applications on NVIDIA® GPUs via the DPC++ CUDA® plugin.
For general information about DPC++, refer to the DPC++ Resources section.
Supported Platforms
This release should work across a wide array of NVIDIA GPUs and CUDA versions, but Codeplay cannot guarantee correct operation on untested platforms.
The following platforms should work:
Operating System | Architectures | CUDA versions
---|---|---
Linux (glibc 2.31+) | sm_60+, sm_5x (deprecated) | 11.7+
Windows | sm_60+, sm_5x (deprecated) | 12.0+
The following platforms were used for testing:
Operating System | Tested GPU | CUDA versions
---|---|---
Linux (glibc 2.31+) | NVIDIA A100 40GB (sm_80) | 11.7, 12.2
Windows 10 | NVIDIA GeForce RTX 4060 Ti (sm_89) | 12.5
Prerequisites
Install an Intel® oneAPI Toolkit version 2025.0.0 that contains the DPC++/C++ Compiler and the required dependencies. For example, the “Intel oneAPI Base Toolkit” suits most use cases.
The Toolkit must be version 2025.0.0; otherwise, oneAPI for NVIDIA GPUs cannot be installed.
Install the GPU driver and CUDA software stack for the NVIDIA GPU by following the steps described in the NVIDIA CUDA Installation Guide for Linux.
Note that you may also need to install C++ development tools:
On Linux: you will need cmake, gcc, g++, make and pkg-config installed in order to build and run oneAPI applications.
The following console commands will install the above tools on the most popular Linux distributions:
Ubuntu
sudo apt update
sudo apt -y install cmake pkg-config build-essential
Red Hat and Fedora
sudo yum update
sudo yum -y install cmake pkgconfig
sudo yum groupinstall "Development Tools"
SUSE
sudo zypper update
sudo zypper --non-interactive install cmake pkg-config
sudo zypper --non-interactive install -t pattern devel_C_C++
Verify that the tools are installed by running:
which cmake pkg-config make gcc g++
You should see output similar to:
/usr/bin/cmake
/usr/bin/pkg-config
/usr/bin/make
/usr/bin/gcc
/usr/bin/g++
On Windows: MSVC build tools 2022. Only the build tools are necessary, but a full MSVC installation also works.
Installation
Download the latest oneAPI for NVIDIA GPUs installer for your platform from our website.
On Linux, run the downloaded self-extracting installer:
sh oneapi-for-nvidia-gpus-2025.0.0-cuda-12.0-linux.sh
The installer will search for an existing Intel oneAPI Toolkit version 2025.0.0 installation in common locations. If you have installed an Intel oneAPI Toolkit in a custom location, use --install-dir /path/to/intel/oneapi.
If your Intel oneAPI Toolkit installation is outside your home directory, you may be required to run this command with elevated privileges, e.g. with sudo.
On Windows, run the downloaded installer executable.
The installer should detect your Intel oneAPI Toolkit installation and pre-populate the installation directory. If it fails to do so, point it to the directory where the oneAPI DPC++ compiler is located (for example C:\Program Files (x86)\Intel\oneAPI\compiler\2024.2).
Set Up Your Environment
To set up your oneAPI environment in your current session, source the Intel-provided setvars.sh or setvars.bat script, available in the oneAPI installation directory, for example:

# On Linux
. /opt/intel/oneapi/setvars.sh

# On Windows:
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
On Windows the command prompt used will need to have access to the MSVC build tools, for example by using the “x64 Native Tools Command Prompt”.
The --include-intel-llvm option may be used in order to add LLVM tools such as clang++ to the path.
Note that you will have to run this script in every new terminal session. For options to handle the setup automatically in each session, see the relevant Intel oneAPI Toolkit documentation, such as Set Environment Variables for CLI Development.
Ensure that the CUDA libraries and tools can be found in your environment.
Run nvidia-smi. If it runs without any obvious errors in the output, then your environment should be set up correctly. Otherwise, set your environment variables manually:

# On Linux
export PATH=/PATH_TO_CUDA_ROOT/bin:$PATH
export LD_LIBRARY_PATH=/PATH_TO_CUDA_ROOT/lib:$LD_LIBRARY_PATH

# On Windows
set PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\CUDA_VERSION\bin";%PATH%
Verify Your Installation
To verify the DPC++ CUDA plugin installation, the DPC++ sycl-ls
tool can be
used to make sure that SYCL now exposes the available NVIDIA GPUs. You should
see something similar to the following in the sycl-ls
output if NVIDIA GPUs
are found:
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.5]
If the available NVIDIA GPUs are correctly listed, then the DPC++ CUDA plugin was correctly installed and set up.
Otherwise, see the “Missing devices in sycl-ls output” section of the Troubleshooting documentation.
Note that this command may also list other devices such as OpenCL™ devices, Intel GPUs, or AMD GPUs, based on the available hardware and DPC++ plugins installed.
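You can also perform a similar check from within a SYCL program. The following is a minimal sketch (not part of this guide's samples) that enumerates every platform and device exposed to DPC++ using standard SYCL 2020 queries; its output should roughly mirror what sycl-ls reports, with the NVIDIA GPU appearing under a platform named similarly to "NVIDIA CUDA BACKEND":

#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  // Walk every platform that DPC++ exposes and print its devices.
  for (const auto &Platform : sycl::platform::get_platforms()) {
    std::cout << Platform.get_info<sycl::info::platform::name>() << "\n";
    for (const auto &Device : Platform.get_devices()) {
      std::cout << "  " << Device.get_info<sycl::info::device::name>() << "\n";
    }
  }
  return 0;
}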
Run a Sample Application
Create a file simple-sycl-app.cpp with the following C++/SYCL code:

#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  // Creating buffer of 4 ints to be used inside the kernel code
  sycl::buffer<int, 1> Buffer{4};

  // Creating SYCL queue
  sycl::queue Queue{};

  // Size of index space for kernel
  sycl::range<1> NumOfWorkItems{Buffer.size()};

  // Submitting command group(work) to queue
  Queue.submit([&](sycl::handler &cgh) {
    // Getting write only access to the buffer on a device
    auto Accessor = Buffer.get_access<sycl::access::mode::write>(cgh);
    // Executing kernel
    cgh.parallel_for<class FillBuffer>(NumOfWorkItems, [=](sycl::id<1> WIid) {
      // Fill buffer with indexes
      Accessor[WIid] = static_cast<int>(WIid.get(0));
    });
  });

  // Getting read only access to the buffer on the host.
  // Implicit barrier waiting for queue to complete the work.
  auto HostAccessor = Buffer.get_host_access();

  // Check the results
  bool MismatchFound{false};
  for (size_t I{0}; I < Buffer.size(); ++I) {
    if (HostAccessor[I] != I) {
      std::cout << "The result is incorrect for element: " << I
                << " , expected: " << I << " , got: " << HostAccessor[I]
                << std::endl;
      MismatchFound = true;
    }
  }

  if (!MismatchFound) {
    std::cout << "The results are correct!" << std::endl;
  }

  return MismatchFound;
}
Compile the application with icpx on Linux, and icx-cl on Windows:

# On Linux
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda simple-sycl-app.cpp -o simple-sycl-app

# On Windows
icx-cl -fsycl -fsycl-targets=nvptx64-nvidia-cuda simple-sycl-app.cpp -o simple-sycl-app
Depending on your CUDA version, you may see this warning, which can be safely ignored:
icpx: warning: CUDA version is newer than the latest supported version 12.1 [-Wunknown-cuda-version]
Run the application with:
# On Linux
ONEAPI_DEVICE_SELECTOR="cuda:*" SYCL_UR_TRACE=1 ./simple-sycl-app

# On Windows
set ONEAPI_DEVICE_SELECTOR="cuda:*"
set SYCL_UR_TRACE=1
simple-sycl-app.exe
You should see output like:
<LOADER>[INFO]: loaded adapter 0x0xc7a670 (libur_adapter_cuda.so.0)
SYCL_UR_TRACE: Requested device_type: info::device_type::automatic
SYCL_UR_TRACE: Selected device: -> final score = 1500
SYCL_UR_TRACE: platform: NVIDIA CUDA BACKEND
SYCL_UR_TRACE: device: NVIDIA GeForce RTX 4060 Ti
The results are correct!
If so, you have successfully set up and verified your oneAPI for NVIDIA GPUs development environment, and you can begin developing oneAPI applications.
The rest of this document provides general information on compiling and running oneAPI applications on NVIDIA GPUs.
Use DPC++ to Target NVIDIA GPUs
Compile for NVIDIA GPUs
Note
On Windows icpx
should be replaced with icx-cl
in all the examples
in this guide.
To compile a SYCL application for NVIDIA GPUs, use the icpx
compiler
provided with DPC++ (or icx-cl
on Windows). For example:
# On Linux
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda sycl-app.cpp -o sycl-app
# On Windows
icx-cl -fsycl -fsycl-targets=nvptx64-nvidia-cuda sycl-app.cpp -o sycl-app
The following flags are required:
-fsycl: Instructs the compiler to build the C++ source file in SYCL mode. This flag will also implicitly enable C++17 and automatically link against the SYCL runtime library.
-fsycl-targets=nvptx64-nvidia-cuda: Instructs the compiler to build SYCL kernels for the NVIDIA GPU target.
It is also possible to build the SYCL kernels for a specific NVIDIA architecture using the following flags:
-Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80
Note that kernels are built for sm_50
by default, allowing them to work on a
wider range of architectures, but limiting the usage of more recent CUDA
features.
For more information on available SYCL compilation flags, see the DPC++ Compiler User’s Manual or for information on all DPC++ compiler options see the Compiler Options section of the Intel oneAPI DPC++/C++ Compiler Developer Guide and Reference.
Using the icpx or icx-cl compiler
The icpx compiler is by default more aggressive with optimizations than the regular clang++ driver, as it uses both -O2 and -ffast-math. In many cases this can lead to better performance, but it can also cause issues for certain applications. In such cases it is possible to disable -ffast-math by using -fno-fast-math and to change the optimization level by passing a different -O flag. It is also possible to use the clang++ driver directly, which can be found in $releasedir/compiler/latest/linux/bin-llvm/clang++, to get regular clang++ behavior.
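As a small, hedged illustration of why -ffast-math can surprise applications (this example is not from the original guide): with fast-math enabled the compiler may assume that NaNs and infinities never occur, so checks such as std::isnan can be optimized away. Compiling the host-only program below once with icpx and once with icpx -fno-fast-math may therefore print different results:

#include <cmath>
#include <iostream>

int main(int argc, char **) {
  // Quiet NaN; the argc check keeps the compiler from folding the whole
  // program away at compile time.
  double x = std::nan("");
  if (argc > 1) {
    x = 1.0;
  }
  // Under -ffast-math this check may be optimized to "false" because the
  // compiler assumes no NaNs are present.
  std::cout << std::boolalpha << "isnan(x) = " << std::isnan(x) << std::endl;
  return 0;
}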
Compile for Multiple Targets
In addition to targeting NVIDIA GPUs, you can build SYCL applications that can be compiled once and then run on a range of hardware. The following example shows how to output a single binary including device code that can run on NVIDIA GPUs, AMD GPUs, or any device that supports SPIR, e.g. Intel GPUs.
icpx -fsycl -fsycl-targets=spir64,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
-Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
-Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_80 \
-o sycl-app sycl-app.cpp
The above command can be broken down into the following ingredients:
-fsycl tells the compiler to compile SYCL device code in addition to host code.
-fsycl-targets=spir64,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda specifies three targets for SYCL device code. The targets may be generic and produce code which will be further compiled just-in-time (JIT) during execution for a specific GPU architecture (Intel and NVIDIA targets only). Further flags may be added to generate architecture-specific device code ahead-of-time (AOT) in addition to the generic code, as follows.
-Xsycl-target-backend=amdgcn-amd-amdhsa tells the flag parser that the following flag should be passed only to the compiler backend for the amdgcn-amd-amdhsa target and no other targets. Specifying -Xsycl-target-backend without any value would pass the following flag to the compiler backend for all SYCL device targets.
--offload-arch=gfx1030 specifies the AMD GPU architecture gfx1030 for the AOT compilation.
-Xsycl-target-backend=nvptx64-nvidia-cuda tells the flag parser that the following flag should be passed only to the compiler backend for the nvptx64-nvidia-cuda target.
--offload-arch=sm_80 specifies the NVIDIA GPU architecture (compute capability) sm_80 for the AOT compilation.
A binary compiled in the above way can successfully run the SYCL kernels on:
Intel CPUs and GPUs with JIT compilation of the SPIR code
AMD GPUs with architecture gfx1030 directly from the binary without any JIT compilation
NVIDIA GPUs with compute capability 8.0 directly from the binary without any JIT compilation
other supported NVIDIA GPUs with JIT compilation
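If you want to confirm at runtime which device and backend such a multi-target binary ended up using, one option is to query the backend of the selected device, as in the illustrative sketch below. It relies on the DPC++ backend enumerators sycl::backend::ext_oneapi_cuda (also used later in this guide) and sycl::backend::ext_oneapi_hip (assumed here from DPC++, not mentioned elsewhere in this document):

#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  // The default selector picks the "best" available device at runtime.
  sycl::queue Queue{};
  sycl::device Device = Queue.get_device();

  std::cout << "Running on: "
            << Device.get_info<sycl::info::device::name>() << "\n";

  switch (Device.get_backend()) {
  case sycl::backend::ext_oneapi_cuda:
    std::cout << "CUDA backend (NVIDIA device)\n";
    break;
  case sycl::backend::ext_oneapi_hip:
    std::cout << "HIP backend (AMD device)\n";
    break;
  default:
    std::cout << "Another backend (e.g. a SPIR-consuming device)\n";
    break;
  }
  return 0;
}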
AOT compilation for Intel hardware is also possible in combination with AMD and
NVIDIA targets, and can be achieved by using the spir64_gen
target and the
corresponding architecture flags. For example, to compile the above application
AOT for the Ponte Vecchio Intel graphics architecture, the following command can
be used:
icpx -fsycl -fsycl-targets=spir64_gen,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
-Xsycl-target-backend=spir64_gen '-device pvc' \
-Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
-Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_80 \
-o sycl-app sycl-app.cpp
Note the different syntax -device <arch> as compared to --offload-arch=<arch>, which is required due to a different compiler toolchain being used for the Intel targets.
The compiler driver also offers alias targets for each target+architecture pair
to make the command line shorter and easier to understand for humans. Thanks to
the aliases, the -Xsycl-target-backend
flags no longer need to be specified.
The above command is equivalent to:
icpx -fsycl -fsycl-targets=intel_gpu_pvc,amd_gpu_gfx1030,nvidia_gpu_sm_80 \
-o sycl-app sycl-app.cpp
The full list of aliases is documented in the DPC++ Compiler User’s Manual.
Run SYCL Applications on NVIDIA GPUs
After compiling your SYCL application for an NVIDIA target, you should also ensure that the correct SYCL device representing the NVIDIA GPU is selected at runtime.
In general, simply using the default device selector should select one of the available NVIDIA GPUs. However, in some scenarios, users may want to change their SYCL application to use a more precise SYCL device selector, such as the GPU selector, or even a custom selector, as sketched below.
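The following is a minimal, illustrative sketch of those three approaches (it is not taken from this guide's samples). The custom selector simply scores devices from the CUDA backend positively and rejects everything else, using the same sycl::backend::ext_oneapi_cuda value that appears later in this guide; in SYCL 2020 a negative score rejects a device:

#include <iostream>
#include <sycl/sycl.hpp>

// Illustrative custom selector: prefer any device exposed by the CUDA backend.
// Returning a negative score rejects a device.
int cuda_device_selector(const sycl::device &Device) {
  return Device.get_backend() == sycl::backend::ext_oneapi_cuda ? 1 : -1;
}

int main() {
  sycl::queue DefaultQueue{};                   // default device selector
  sycl::queue GpuQueue{sycl::gpu_selector_v};   // any available GPU
  sycl::queue CudaQueue{cuda_device_selector};  // prefer the CUDA backend

  std::cout << "Custom selector picked: "
            << CudaQueue.get_device().get_info<sycl::info::device::name>()
            << std::endl;
  return 0;
}

Note that if no device matches a selector (for example, no CUDA device is present for the custom selector), queue construction throws an exception, so real applications typically add error handling around this.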
Controlling NVIDIA devices exposed to DPC++
In certain circumstances it is desirable to ensure that only certain GPUs are available to a SYCL implementation such as DPC++. This is possible through the environment variables described below, which also allow users to control the sharing of GPU resources within a shared GPU cluster.
Device selector environment variables
The environment variable ONEAPI_DEVICE_SELECTOR
may be used to restrict the
set of devices that can be used. For example, to only allow devices exposed by
the DPC++ CUDA plugin, set the following option:
# On Linux
export ONEAPI_DEVICE_SELECTOR="cuda:*"
# On Windows
set ONEAPI_DEVICE_SELECTOR="cuda:*"
To only allow a subset of devices from the cuda backend, use a comma-separated list, e.g.:
# On Linux
export ONEAPI_DEVICE_SELECTOR="cuda:1,3"
# On Windows
set ONEAPI_DEVICE_SELECTOR="cuda:1,3"
Then the following will populate Devs with the two NVIDIA devices only:

std::vector<sycl::device> Devs;
for (const auto &plt : sycl::platform::get_platforms()) {
  if (plt.get_backend() == sycl::backend::ext_oneapi_cuda) {
    Devs = plt.get_devices();
    break;
  }
}
Devs[0] and Devs[1] will then correspond to the devices marked 1 and 3 respectively when a user invokes nvidia-smi.
For more details covering ONEAPI_DEVICE_SELECTOR, see the Environment Variables section of the oneAPI DPC++ Compiler documentation.
In the case that only NVIDIA devices are exposed to DPC++, the above usage of ONEAPI_DEVICE_SELECTOR is equivalent to setting the CUDA_VISIBLE_DEVICES environment variable:
# On Linux
export CUDA_VISIBLE_DEVICES=1,3
# On Windows
set CUDA_VISIBLE_DEVICES=1,3
In this circumstance, where only NVIDIA GPUs are available, an identical list can be populated in a simpler way:
std::vector<sycl::device> Devs =
sycl::device::get_devices(sycl::info::device_type::gpu);
Then Devs[0] and Devs[1] will correspond to the devices marked 1 and 3 by nvidia-smi
.