This guide contains information on using DPC++ to run SYCL™ applications on AMD GPUs via the DPC++ HIP plugin version 2025.0.0.
For general information about DPC++, refer to the DPC++ Resources section.
Supported Platforms
This release should work across a wide array of AMD GPUs, but Codeplay cannot guarantee correct operation on untested platforms.
The following platforms should work:
| Operating System | Architectures | ROCm versions |
|---|---|---|
| Linux (glibc 2.31+) | | 5.4.3, 6.0, 6.1, 6.2 |
This release has been tested on the following platforms (using the upstream AMD GPU driver in the Linux kernel):
| Operating System | Tested GPU | ROCm versions |
|---|---|---|
| Linux (glibc 2.31+) | AMD Instinct MI210 (gfx90a) | 5.4.3, 6.0, 6.1, 6.2 |
In some cases the package for a given HIP version may work with future minor versions; however, Codeplay cannot guarantee that anything other than the tested HIP releases will work.
Each of the tested versions can be installed alongside another existing HIP installation. See the Multi-version ROCm Installation section of the ROCm Installation Guide.
Prerequisites
Install C++ development tools.
You will need the following C++ development tools installed in order to build and run oneAPI applications: `cmake`, `gcc`, `g++`, `make` and `pkg-config`.

The following console commands will install the above tools on the most popular Linux distributions:
Ubuntu
```shell
sudo apt update
sudo apt -y install cmake pkg-config build-essential
```
Red Hat and Fedora
```shell
sudo yum update
sudo yum -y install cmake pkgconfig
sudo yum groupinstall "Development Tools"
```
SUSE
```shell
sudo zypper update
sudo zypper --non-interactive install cmake pkg-config
sudo zypper --non-interactive install pattern devel_C_C++
```
Verify that the tools are installed by running:
```shell
which cmake pkg-config make gcc g++
```
You should see output similar to:
```
/usr/bin/cmake
/usr/bin/pkg-config
/usr/bin/make
/usr/bin/gcc
/usr/bin/g++
```
Install an Intel® oneAPI Toolkit version 2025.0.0 that contains the DPC++/C++ Compiler.
For example, the “Intel oneAPI Base Toolkit” should suit most use cases.
The Toolkit must be version 2025.0.0; otherwise, oneAPI for AMD GPUs cannot be installed.
Install the GPU driver and ROCm™ software stack for the AMD GPU.
For example, for ROCm 5.4.3, follow the steps described in the Installation with install script guide.
Using the `amdgpu-install` installer is recommended, with the `--usecase="dkms,graphics,opencl,hip,hiplibsdk"` argument to ensure that all required components are installed.
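As a concrete sketch of that step (the `amdgpu-install` package itself must first be set up for your ROCm release and distribution, per AMD's installation guide; the group-membership step is commonly recommended by AMD but is an addition here, not part of the installer invocation above):

```shell
# Sketch only: assumes the amdgpu-install package for your ROCm release is
# already installed. Installs the kernel driver plus the OpenCL and HIP
# components that DPC++ needs.
sudo amdgpu-install --usecase="dkms,graphics,opencl,hip,hiplibsdk"

# Grant your user access to the GPU device nodes (log out and back in
# afterwards for the group change to take effect).
sudo usermod -aG render,video "$USER"
```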
Installation
Download the latest oneAPI for AMD GPUs installer from our website.
Run the downloaded self-extracting installer:
```shell
sh oneapi-for-amd-gpus-2025.0.0-rocm-5.4.3-linux.sh
```
The installer will search for an existing Intel oneAPI Toolkit version 2025.0.0 installation in common locations. If you have installed an Intel oneAPI Toolkit in a custom location, use `--install-dir /path/to/intel/oneapi`.

If your Intel oneAPI Toolkit installation is outside your home directory, you may be required to run this command with elevated privileges, e.g. with `sudo`.
Set Up Your Environment
To set up your oneAPI environment in your current session, source the Intel-provided `setvars.sh` script.

For system-wide installations:

```shell
. /opt/intel/oneapi/setvars.sh --include-intel-llvm
```

For private installations (in the default location):

```shell
. ~/intel/oneapi/setvars.sh --include-intel-llvm
```
The `--include-intel-llvm` option is required in order to add LLVM tools such as `clang++` to the path.

Note that you will have to run this script in every new terminal session. For options to handle the setup automatically each session, see the relevant Intel oneAPI Toolkit documentation, such as Set Environment Variables for CLI Development.
Ensure that the HIP libraries and tools can be found in your environment:
Run `rocminfo`; if it runs without any obvious errors in the output, then your environment should be set up correctly. Otherwise, set your environment variables manually:

```shell
export PATH=/PATH_TO_ROCM_ROOT/bin:$PATH
export LD_LIBRARY_PATH=/PATH_TO_ROCM_ROOT/lib:$LD_LIBRARY_PATH
```

ROCm is commonly installed in `/opt/rocm-x.x.x/`.
Verify Your Installation
To verify the DPC++ HIP plugin installation, the DPC++ `sycl-ls` tool can be used to make sure that SYCL now exposes the available AMD GPUs. You should see something similar to the following in the `sycl-ls` output if AMD GPUs are found:
```
[hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon PRO W6800 gfx1030 [HIP 60140.9]
```
If the available AMD GPUs are correctly listed, then the DPC++ HIP plugin was correctly installed and set up.
Otherwise, see the “Missing devices in sycl-ls output” section of the Troubleshooting documentation.
Note that this command may also list other devices such as OpenCL™ devices, Intel GPUs, or NVIDIA® GPUs, based on the available hardware and DPC++ plugins installed.
Run a Sample Application
Create a file `simple-sycl-app.cpp` with the following C++/SYCL code:

```cpp
#include <sycl/sycl.hpp>

#include <iostream>

int main() {
  // Creating buffer of 4 ints to be used inside the kernel code
  sycl::buffer<int, 1> Buffer{4};

  // Creating SYCL queue
  sycl::queue Queue{};

  // Size of index space for kernel
  sycl::range<1> NumOfWorkItems{Buffer.size()};

  // Submitting command group(work) to queue
  Queue.submit([&](sycl::handler &cgh) {
    // Getting write only access to the buffer on a device
    auto Accessor = Buffer.get_access<sycl::access::mode::write>(cgh);
    // Executing kernel
    cgh.parallel_for<class FillBuffer>(NumOfWorkItems, [=](sycl::id<1> WIid) {
      // Fill buffer with indexes
      Accessor[WIid] = static_cast<int>(WIid.get(0));
    });
  });

  // Getting read only access to the buffer on the host.
  // Implicit barrier waiting for queue to complete the work.
  auto HostAccessor = Buffer.get_host_access();

  // Check the results
  bool MismatchFound{false};
  for (size_t I{0}; I < Buffer.size(); ++I) {
    if (HostAccessor[I] != I) {
      std::cout << "The result is incorrect for element: " << I
                << " , expected: " << I << " , got: " << HostAccessor[I]
                << std::endl;
      MismatchFound = true;
    }
  }

  if (!MismatchFound) {
    std::cout << "The results are correct!" << std::endl;
  }

  return MismatchFound;
}
```
Compile the application with:

```shell
icpx -fsycl -fsycl-targets=amdgcn-amd-amdhsa \
     -Xsycl-target-backend --offload-arch=<ARCH> \
     -o simple-sycl-app simple-sycl-app.cpp
```

where `<ARCH>` is the GPU architecture, e.g. `gfx1030`, which you can check by running:

```shell
rocminfo | grep 'Name: *gfx.*'
```

You should see the GPU architecture in the output, for example:

```
Name: gfx1030
```
Run the application with:
```shell
ONEAPI_DEVICE_SELECTOR="hip:*" SYCL_UR_TRACE=1 ./simple-sycl-app
```
You should see output like:
```
<LOADER>[INFO]: loaded adapter 0x0x43c050 (libur_adapter_hip.so.0)
SYCL_UR_TRACE: Requested device_type: info::device_type::automatic
SYCL_UR_TRACE: Selected device: -> final score = 1500
SYCL_UR_TRACE:   platform: AMD HIP BACKEND
SYCL_UR_TRACE:   device: AMD Instinct MI210
The results are correct!
```
If so, you have successfully set up and verified your oneAPI for AMD GPUs development environment, and you can begin developing oneAPI applications.
The rest of this document provides general information on compiling and running oneAPI applications on AMD GPUs.
Use DPC++ to Target AMD GPUs
Compile for AMD GPUs
To compile a SYCL application for AMD GPUs, use the `icpx` compiler provided with DPC++. For example:

```shell
icpx -fsycl -fsycl-targets=amdgcn-amd-amdhsa \
     -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
     -o sycl-app sycl-app.cpp
```
The following flags are required:

- `-fsycl`: Instructs the compiler to build the C++ source file in SYCL mode. This flag will also implicitly enable C++17 and automatically link against the SYCL runtime library.
- `-fsycl-targets=amdgcn-amd-amdhsa`: Instructs the compiler to build SYCL kernels for the AMD GPU target.
- `-Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030`: Instructs the compiler to build SYCL kernels for the `gfx1030` AMD GPU architecture.
Note that when targeting an AMD GPU, the specific architecture of the GPU must be provided.
For more information on available SYCL compilation flags, see the DPC++ Compiler User's Manual; for information on all DPC++ compiler options, see the Compiler Options section of the Intel oneAPI DPC++/C++ Compiler Developer Guide and Reference.
Using the `icpx` compiler

The `icpx` compiler is by default a lot more aggressive with optimizations than the regular `clang++` driver, as it uses both `-O2` and `-ffast-math`. In many cases this can lead to better performance, but it can also lead to issues for certain applications. In such cases it is possible to disable `-ffast-math` by using `-fno-fast-math` and to change the optimization level by passing a different `-O` flag. It is also possible to directly use the `clang++` driver, which can be found in `$releasedir/compiler/latest/linux/bin-llvm/clang++`, to get regular `clang++` behavior.
Compile for Multiple Targets
In addition to targeting AMD GPUs, you can build SYCL applications that can be compiled once and then run on a range of hardware. The following example shows how to output a single binary including device code that can run on AMD GPUs, NVIDIA GPUs, or any device that supports SPIR, e.g. Intel GPUs.
```shell
icpx -fsycl -fsycl-targets=spir64,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
     -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
     -Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_80 \
     -o sycl-app sycl-app.cpp
```
The above command can be broken down into the following ingredients:

- `-fsycl` tells the compiler to compile SYCL device code in addition to host code.
- `-fsycl-targets=spir64,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda` specifies three targets for SYCL device code. The targets may be generic and produce code which will be further compiled just-in-time (JIT) during execution for a specific GPU architecture (Intel and NVIDIA targets only). Further flags may be added to generate architecture-specific device code ahead-of-time (AOT) in addition to the generic code, as follows.
- `-Xsycl-target-backend=amdgcn-amd-amdhsa` tells the flag parser that the following flag should be passed only to the compiler backend for the `amdgcn-amd-amdhsa` target and no other targets. Specifying `-Xsycl-target-backend` without any value would pass the following flag to the compiler backend for all SYCL device targets.
- `--offload-arch=gfx1030` specifies the AMD GPU architecture `gfx1030` for the AOT compilation.
- `-Xsycl-target-backend=nvptx64-nvidia-cuda` tells the flag parser that the following flag should be passed only to the compiler backend for the `nvptx64-nvidia-cuda` target.
- `--offload-arch=sm_80` specifies the NVIDIA GPU architecture (compute capability) `sm_80` for the AOT compilation.
A binary compiled in the above way can successfully run the SYCL kernels on:

- Intel CPUs and GPUs, with JIT compilation of the SPIR code
- AMD GPUs with architecture gfx1030, directly from the binary without any JIT compilation
- NVIDIA GPUs with compute capability 8.0, directly from the binary without any JIT compilation
- other supported NVIDIA GPUs, with JIT compilation
AOT compilation for Intel hardware is also possible in combination with AMD and NVIDIA targets, and can be achieved by using the `spir64_gen` target and the corresponding architecture flags. For example, to compile the above application AOT for the Ponte Vecchio Intel graphics architecture, the following command can be used:

```shell
icpx -fsycl -fsycl-targets=spir64_gen,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
     -Xsycl-target-backend=spir64_gen '-device pvc' \
     -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
     -Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_80 \
     -o sycl-app sycl-app.cpp
```
Note the different syntax `-device <arch>` as compared to `--offload-arch=<arch>`, which is required due to a different compiler toolchain being used for the Intel targets.

The compiler driver also offers alias targets for each target+architecture pair to make the command line shorter and easier to understand for humans. Thanks to the aliases, the `-Xsycl-target-backend` flags no longer need to be specified. The above command is equivalent to:

```shell
icpx -fsycl -fsycl-targets=intel_gpu_pvc,amd_gpu_gfx1030,nvidia_gpu_sm_80 \
     -o sycl-app sycl-app.cpp
```

The full list of aliases is documented in the DPC++ Compiler User's Manual.
Run SYCL Applications on AMD GPUs
After compiling your SYCL application for an AMD target, you should also ensure that the correct SYCL device representing the AMD GPU is selected at runtime.
In general, simply using the default device selector should select one of the available AMD GPUs. However, in some scenarios, users may want to change their SYCL application to use a more precise SYCL device selector, such as the GPU selector, or even a custom selector.
Controlling AMD devices exposed to DPC++
In certain circumstances it is desirable to enforce that only certain GPUs are available to a SYCL implementation such as DPC++. This is possible through the environment variables described below, which also allow users to control the sharing of GPU resources within a shared GPU cluster.
Device selector env variables
The environment variable `ONEAPI_DEVICE_SELECTOR` may be used to restrict the set of devices that can be used. For example, to only allow devices exposed by the DPC++ HIP plugin, set the following:

```shell
export ONEAPI_DEVICE_SELECTOR="hip:*"
```
To only allow a subset of devices from the hip backend use a comma separated list, e.g.:
```shell
export ONEAPI_DEVICE_SELECTOR="hip:1,3"
```
Then the following will populate `Devs` with the two AMD devices only:

```cpp
std::vector<sycl::device> Devs;
for (const auto &plt : sycl::platform::get_platforms()) {
  if (plt.get_backend() == sycl::backend::ext_oneapi_hip) {
    Devs = plt.get_devices();
    break;
  }
}
```
Such that `Devs[0]` and `Devs[1]` will correspond to the devices marked `1` and `3` respectively when a user invokes `rocm-smi`.
For more details covering `ONEAPI_DEVICE_SELECTOR`, see the Environment Variables section of the oneAPI DPC++ Compiler documentation.

In the case that only AMD devices are exposed to DPC++, the above described usage of `ONEAPI_DEVICE_SELECTOR` is equivalent to setting the `HIP_VISIBLE_DEVICES` environment variable:

```shell
export HIP_VISIBLE_DEVICES=1,3
```
In the circumstance that only AMD GPUs are available, an identical list can be populated in a simpler way:

```cpp
std::vector<sycl::device> Devs =
    sycl::device::get_devices(sycl::info::device_type::gpu);
```

Then `Devs[0]` and `Devs[1]` will correspond to the devices marked `1` and `3` by `rocm-smi`.