
Tooling

Debugging OpenCL applications

There is no kernel debugging support for OpenCL applications on CVengine in the current version of ComputeAorta for R-Car. Debugging support for OpenCL kernels will be added in future releases.

Profiling OpenCL applications

We distinguish three mechanisms to profile OpenCL applications:

  1. Manual profiling using event profiling information on profiling queues
  2. Using CodeXL OpenCL profiling
  3. Using OpenCL trace intercept tools

Any of the above can be used effectively on the CVengine implementation. We focus on the first two because, at the time of writing, no OpenCL trace intercept tool built for an ARM toolchain was available. Once one is available, it should work without modification.

Manual profiling using OpenCL events

OpenCL event objects can be used to obtain profiling information that measures the execution time of a command. To capture profiling information for the OpenCL commands associated with events, create the command queue with the CL_QUEUE_PROFILING_ENABLE flag set in the properties argument of clCreateCommandQueue.

Profiling is available for commands that operate on memory objects and for kernel executions. The OpenCL commands that return events with profiling information are listed in Section 5.12 of the OpenCL 1.2 specification.

The example below illustrates the usage of OpenCL profiling events to extract the execution time of a kernel enqueued to a device.

// Declaration of an event to obtain performance information
cl_event perf_event;
// Storage for the start and end timestamps
cl_ulong start = 0, end = 0;
// Create an OpenCL queue with profiling enabled. The API requires a device argument;
// device is assumed here to be the cl_device_id of the CVengine device in context.
cl_command_queue cvEngineQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, NULL);
// Enqueue a kernel as usual. The execution of a kernel can return a perf_event
clEnqueueNDRangeKernel(cvEngineQueue,…, &perf_event);
// Wait for the kernel execution event
clWaitForEvents(1, &perf_event);
// Obtain the timestamp of the start and end stages of the event
clGetEventProfilingInfo(perf_event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(perf_event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);

// start and end are device timestamps expressed in nanoseconds. Their difference is the
// execution time of the command. We multiply by 1e-06 to obtain milliseconds.
cl_double kernelExecutionTime = (cl_double)(end - start)*(cl_double)(1e-06);

The clGetEventProfilingInfo function is used to obtain the timestamps for the event. Timestamps for the time the command was enqueued (CL_PROFILING_COMMAND_QUEUED) and the time it was submitted to the device (CL_PROFILING_COMMAND_SUBMIT) are also available.
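With all four timestamps, the lifetime of a command can be split into the time spent waiting in the host queue, the time between submission and the start of execution, and the execution itself. The following is a minimal sketch of that arithmetic only. The names event_timestamps, event_breakdown, ns_to_ms and break_down_event are hypothetical helpers, not part of OpenCL or ComputeAorta, and uint64_t stands in for cl_ulong so that the sketch compiles without the OpenCL headers. In a real application, each field would be filled by a call to clGetEventProfilingInfo with the parameter named in the comment.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the four profiling timestamps of one event.
 * All values are in nanoseconds, as reported by clGetEventProfilingInfo. */
typedef struct {
    uint64_t queued; /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit; /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;  /* CL_PROFILING_COMMAND_START  */
    uint64_t end;    /* CL_PROFILING_COMMAND_END    */
} event_timestamps;

/* Hypothetical result type: duration of each phase in milliseconds. */
typedef struct {
    double queue_ms;  /* enqueued on the host until submitted to the device */
    double submit_ms; /* submitted to the device until execution starts     */
    double exec_ms;   /* execution time of the command                      */
    double total_ms;  /* enqueue to completion                              */
} event_breakdown;

/* Convert the interval [earlier, later] from nanoseconds to milliseconds.
 * Returns 0.0 if the timestamps are out of order, to avoid unsigned wrap. */
static double ns_to_ms(uint64_t later, uint64_t earlier)
{
    return (later >= earlier) ? (double)(later - earlier) * 1e-6 : 0.0;
}

/* Split the lifetime of a command into its three phases. */
event_breakdown break_down_event(event_timestamps t)
{
    event_breakdown b;
    b.queue_ms  = ns_to_ms(t.submit, t.queued);
    b.submit_ms = ns_to_ms(t.start, t.submit);
    b.exec_ms   = ns_to_ms(t.end, t.start);
    b.total_ms  = ns_to_ms(t.end, t.queued);
    return b;
}

/* Print one line per phase for a labelled command. */
void print_breakdown(const char *label, event_breakdown b)
{
    printf("%s: queue %.3f ms, submit %.3f ms, exec %.3f ms, total %.3f ms\n",
           label, b.queue_ms, b.submit_ms, b.exec_ms, b.total_ms);
}
```

A large queue or submit time relative to the execution time suggests that the overhead is on the host side or in command scheduling rather than in the kernel itself.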

OpenCL Remote profiling using CodeXL

Codeplay, in the context of the LPGPU2 project, has extended the AMD CodeXL profiler to support non-AMD devices. Using the remote profiling capabilities of CodeXL, it is possible to run an application on the board, retrieve its execution trace and visualize it in the graphical tool. To start tracing, first execute the remote agent on the board:

$ /path/to/codexl/CodeXLRemoteAgent-bin

and then execute CodeXL on your desktop PC.

From the CodeXL GUI toolbar menu:

  • Select File/New Project (ctrl + N).

  • In the new window choose the Remote Host radio button.

  • Set the host name or the IP address as Remote Host Address.

  • In this screen it is also possible to set the execution path to the location on the board where the OpenCL application binary is available.

  • Selecting Profile and then Application Time-trace enables the OpenCL tracing view.

  • Hitting the green arrow (Play) will trigger the execution of the binary on the board.

The remote agent collects the profiling information and sends it to the desktop client, which then displays it. Details on how to use CodeXL or on the CodeXL port from Codeplay can be found in the LPGPU2 blog post announcement. At the time of writing, the profiler can only be obtained directly from Codeplay support.

The image below shows an example OpenCL profiling session using CodeXL to profile an OpenCL application on the R-Car V3M CVengine. The coloured boxes represent the different OpenCL operations and their durations. The detailed list of OpenCL operations can be seen in the box with the list. Instructions to use CodeXL are available on their website.

OpenCL CodeXL session for R-Car V3M CVengine
