

A SYCL application is effectively an OpenCL application, so any mechanism for profiling or debugging OpenCL applications can also be used with a SYCL application. Refer to the OpenCL Debugging and Profiling section for details on how to profile OpenCL applications. This section focuses on SYCL-specific debugging and profiling techniques.

Debugging SYCL applications

The debugging of kernels running on the device requires support from the underlying platform. Once OpenCL kernel debug support is added in future devices, it will be available for SYCL kernels too.

However, SYCL developers can run their applications and kernels on the host device and debug them there with standard debugging tools, such as gdb. This is done by replacing the device queue with a queue created from a host_device.

A host queue is created simply by constructing a queue from a host device. One way to enable simple debugging on the host is to select the queue with a pre-processor macro that is defined only in a custom debug build, as shown below:

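#ifdef DEBUG_ON_HOST  // hypothetical macro name, defined only in the custom debug build
// Construct a host device
host_device hd;
// Create a queue with a host device
queue myQueue(hd);
#else
// Use the default selector of the platform
queue myQueue;
#endif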

The rest of the code remains unchanged.

Using the same mechanism, any C++ checking tool can be used to inspect the behavior of the application on the host. For example, valgrind can detect out-of-bounds array accesses when the application runs on the host device, without requiring special configuration options.

Profiling SYCL applications

Manual profiling using SYCL events

The submit method of a SYCL queue returns a SYCL event object for each submission. These events contain OpenCL event objects, which can be used to obtain profiling information and measure the execution time of a command. To capture profiling information for the OpenCL commands associated with events, create the SYCL queue with a property_list that contains property::queue::enable_profiling(). The profiling operations are available for memory objects and kernels. Details of the OpenCL commands that return events with profiling information can be found in Section 5.12 of the OpenCL 1.2 specification.

The example below illustrates the usage of OpenCL profiling events for SYCL applications to extract the execution time of a kernel enqueued to a device.

// Initialize property list with profiling info
property_list propList = {property::queue::enable_profiling(), /* ... */};

// Build sycl queue with profiling info enabled
queue myQueue(/* ... */, propList);

// Submit a kernel to the queue, returns a SYCL event
event myEvent = myQueue.submit([&](handler& cgh) {
  // ...
});

// Query submit time.
auto submit = myEvent.get_profiling_info<info::event_profiling::command_submit>();

// Query start time.
auto start = myEvent.get_profiling_info<info::event_profiling::command_start>();

// Query end time.
auto end = myEvent.get_profiling_info<info::event_profiling::command_end>();

// Duration of the kernel execution in nanoseconds
auto duration = end - start;

The get_profiling_info methods of the SYCL event class allow developers to retrieve the different timestamps from the device. They are equivalent to their OpenCL counterparts. Note these methods are still valid on host queues so the time of execution on the host can also be analyzed.
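The timestamp arithmetic can be sketched in plain C++, independent of any SYCL header. The snippet below is a minimal illustration, assuming the three timestamps are 64-bit unsigned nanosecond values as returned by get_profiling_info; the ProfilingTimestamps struct and the helper function names are hypothetical and are not part of SYCL or ComputeCpp.

```cpp
#include <cstdint>

// Hypothetical helper type (not part of SYCL): holds the three raw
// timestamps, in nanoseconds, obtained from get_profiling_info.
struct ProfilingTimestamps {
  std::uint64_t submit_ns;  // command_submit
  std::uint64_t start_ns;   // command_start
  std::uint64_t end_ns;     // command_end
};

// Time the command waited between submission and the start of execution.
constexpr std::uint64_t queue_latency_ns(const ProfilingTimestamps& t) {
  return t.start_ns - t.submit_ns;
}

// Time the command spent executing: end minus start, never the reverse,
// because the values are unsigned and would wrap around.
constexpr std::uint64_t execution_time_ns(const ProfilingTimestamps& t) {
  return t.end_ns - t.start_ns;
}

// Convert nanoseconds to milliseconds for reporting.
constexpr double to_milliseconds(std::uint64_t ns) {
  return static_cast<double>(ns) / 1.0e6;
}
```

Separating the queue latency (start minus submit) from the execution time (end minus start) helps distinguish time spent waiting in the queue from time spent running the command.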

OpenCL Remote profiling using CodeXL

Codeplay, in the context of the LPGPU2 project, has extended the AMD CodeXL profiler to support non-AMD devices. Using the remote profiling capabilities of CodeXL, it is possible to execute an application on a remote board, retrieve its execution trace, and visualize it in the graphical tool. To launch the tracing, first execute the remote agent on the board:

$ /path/to/codexl/CodeXLRemoteAgent-bin

and then execute CodeXL on your desktop PC and set up a remote project:

1. From the CodeXL GUI tool bar menu, select File/New Project (Ctrl+N).
2. In the new window, choose the Remote Host radio button.
3. Set the host name or the IP address as the Remote Host Address. In this screen it is also possible to set the execution path to the location on the board where the OpenCL application binary is available.
4. Select Profile and then Application Time-trace to enable the OpenCL tracing view.
5. Hit the green arrow (Play) to trigger the execution of the binary on the board.

The remote agent collects all the information and sends it to the desktop client, which then visualizes it.

Details on how to use CodeXL, or on the CodeXL port from Codeplay, can be found in the LPGPU2 blog post announcement. Displaying SYCL-specific profiling information requires a version of ComputeCpp with profiling support enabled. At the time of writing, both the profiler tool and the ComputeCpp version with profiling support can only be obtained directly from Codeplay support.

The two figures below show the CodeXL output for the BabelStream benchmark on the R-Car V3M CVengine and the R-Car V3H CVengine, respectively.

Profiling the BabelStream benchmark on the V3M device

Profiling the BabelStream benchmark on the V3H device

