ComputeCpp Profiler

Profiling ComputeCpp Applications

ComputeCpp Professional Edition has built-in support for writing a JSON file that contains profiling information about the runtime and the underlying OpenCL implementation. This JSON file conforms to the Chrome Trace Events format and can be loaded in any Chrome installation by opening the URL chrome://tracing.

The objective of this document is to describe how to activate profiling for a ComputeCpp application using the configuration file.

Enabling the JSON profiler backend

A configuration file may be used to configure how the ComputeCpp runtime will perform certain operations.

To enable profiling of the runtime, create a file containing the option enable_profiling = true, point the environment variable COMPUTECPP_CONFIGURATION_FILE at this file, and run the application. Each of the following commands creates a file called sycl_config.txt and sets the environment variable in a single line:

  • Windows (Command Prompt):
echo enable_profiling = true> sycl_config.txt & set COMPUTECPP_CONFIGURATION_FILE=sycl_config.txt
  • Linux:
echo "enable_profiling = true" > sycl_config.txt; export COMPUTECPP_CONFIGURATION_FILE=sycl_config.txt

Note that if COMPUTECPP_CONFIGURATION_FILE points to a file that does not exist, the runtime will fail to initialize.
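Because a missing configuration file stops the runtime from initializing, a launcher or test harness may want to check the variable before any SYCL object is created. The following is a minimal sketch using only the C++ standard library. The names config_status, check_config_path and check_config_from_env are hypothetical helpers for illustration and are not part of ComputeCpp.

```cpp
#include <cstdlib>
#include <fstream>

// Hypothetical helper types, not part of the ComputeCpp API.
enum class config_status { not_set, missing, readable };

// Classifies a configuration path. A null pointer means the environment
// variable is not set at all.
inline config_status check_config_path(const char* path) {
  if (path == nullptr) {
    return config_status::not_set;
  }
  std::ifstream file(path);
  return file.good() ? config_status::readable : config_status::missing;
}

// Reads COMPUTECPP_CONFIGURATION_FILE from the environment and classifies it.
inline config_status check_config_from_env() {
  return check_config_path(std::getenv("COMPUTECPP_CONFIGURATION_FILE"));
}
```

Calling check_config_from_env() at the top of main and reporting config_status::missing gives a clearer error than a runtime initialization failure later on.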

Configuring the profiler

The following is a list of all the configuration options available to change how the profiler backend behaves:

| Configuration Option | Type | Default Value | Supported Profiling Backends | Description |
|---|---|---|---|---|
| enable_profiling | boolean | false | all | Enables or disables profiling. |
| enable_perf_counters_profiling | boolean | false | all | Enables the capture of performance counter data from the selected performance counter backend. |
| perf_counter_backend | multiple choices | default | all | Specifies the performance counter backend to use. Supported platforms are Intel GPU devices (intel_gpu) and ARM Mali devices (arm_mali). The default option selects ARM Mali performance counters on ARM platforms and Intel GPU performance counters otherwise. |
| enable_json_profiling | boolean | true | json | Enables or disables the JSON profiling backend. |
| enable_tracy_profiling | boolean | false | tracy | Enables or disables the Tracy profiling backend. |
| enable_kernel_profiling | boolean | true | json | Enables or disables profiling of kernels running on a device. Disabling it prevents injection of property::queue::enable_profiling, which can be useful when profiling some platforms. |
| profiling_collapse_transactions | boolean | false | json | Enabling this causes all states of a transaction to be collapsed into a single entry. For long-running applications this can be useful to reduce the size of the JSON file. |
| enable_buffer_profiling | boolean | true | json | Disabling this prevents the JSON profiler from capturing buffer events. This can be useful when the application creates a large number of buffers and does not reuse them. It simplifies the JSON file and allows it to be loaded more quickly. |
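Several of these options can be combined in one configuration file. The sketch below renders a list of options in the "name = value" form shown earlier. It assumes that the file holds one option per line, which this document does not state explicitly, and render_config is a hypothetical helper rather than a ComputeCpp function.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: renders each option as a "name = value" line.
// Assumption: the configuration file accepts one option per line.
inline std::string render_config(
    const std::vector<std::pair<std::string, std::string>>& options) {
  std::string text;
  for (const auto& option : options) {
    text += option.first + " = " + option.second + "\n";
  }
  return text;
}
```

For example, render_config({{"enable_profiling", "true"}, {"profiling_collapse_transactions", "true"}, {"enable_buffer_profiling", "false"}}) yields three lines that could be written to sycl_config.txt for a long-running application where a smaller JSON file is preferred.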

Profiling output

By default, when the application finishes, the runtime writes the JSON file to the current working directory (usually the same directory as the application binary), naming it [executable_name]_[current_date].json.

This behaviour can be changed by setting the environment variable COMPUTECPP_PROFILING_OUTPUT. If this is set, the runtime will use the value of this variable as the output file. The file doesn't need to exist, but the application must have permission to create and write to it.
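For readers unfamiliar with the Chrome Trace Events format, the sketch below builds a tiny document in that format by hand. It is only an illustration of the general shape of the format: the event names and fields that ComputeCpp actually emits are not documented here, and complete_event and trace_document are hypothetical helpers. The "X" phase denotes a complete event with a start timestamp (ts) and a duration (dur), both in microseconds.

```cpp
#include <string>

// Hypothetical helper: formats one "complete" event (ph = "X").
// The name is inserted as-is, so it must not contain characters that need
// JSON escaping.
inline std::string complete_event(const std::string& name, long long ts_us,
                                  long long dur_us, int pid, int tid) {
  return "{\"name\":\"" + name + "\",\"ph\":\"X\",\"ts\":" +
         std::to_string(ts_us) + ",\"dur\":" + std::to_string(dur_us) +
         ",\"pid\":" + std::to_string(pid) + ",\"tid\":" +
         std::to_string(tid) + "}";
}

// Wraps a comma-separated list of events in the top-level JSON object.
inline std::string trace_document(const std::string& events) {
  return "{\"traceEvents\":[" + events + "]}";
}
```

Writing trace_document(complete_event("submit", 10, 5, 1, 1)) to a file produces something chrome://tracing can load, which is a convenient way to check that the viewer works before inspecting a real profile.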

Performance Counters for Intel GPUs

When running ComputeCpp applications on Intel GPUs that match one of the following architectures, the profiler can display performance counters for the ComputeBasic metrics set.

  • Intel(R) Processors with Gen11 graphics devices (formerly Ice Lake),
  • Intel(R) Processors with Gen9 graphics devices (formerly Skylake, Kaby Lake, Apollo Lake/Broxton, Gemini Lake, Coffee Lake),
  • Intel(R) Processors with Gen8 graphics devices (formerly Broadwell),
  • Intel(R) Processors with Gen7.5 graphics devices (formerly Haswell).

No extra configuration is required, as long as the system supports Intel's Metrics Discovery API.

Performance Counters for ARM Mali

When running ComputeCpp applications on an ARM Mali device with either the Midgard, Valhall, or Bifrost architecture, the profiler can display a set of performance counters.

Example

The following is an example of the Google Chrome Visualizer displaying an execution of BabelStream on an Intel GPU.

Note the performance counters at the bottom. Many more counters are available and can be accessed by scrolling the window.

Chrome JSON profiler Visualizer

User Defined Profiling Zones

This is a Professional Edition feature. Using codeplay::profiling::profiling_zone has no effect in ComputeCpp CE.

ComputeCpp supports user-defined profiling zones. This feature allows developers to annotate the source code and display extra information in the profiler.

This can be used to mark sections of interest to be displayed as API calls.

To use this feature, include SYCL/codeplay/profiling_user.h and annotate the source code. Annotations can be created anywhere in host code using the cl::sycl::codeplay::profiling::profiling_zone object. Refer to the example below for more information on usage.

Here is a simple example:

#include <CL/sycl.hpp>

#include <SYCL/codeplay/profiling_user.h>

#include <array>
#include <iostream>

using namespace cl::sycl;

constexpr access::mode sycl_read = access::mode::read;
constexpr access::mode sycl_write = access::mode::write;

template <typename T>
class SimpleVadd;

template <typename T, size_t N>
void simple_vadd(const std::array<T, N>& VA, const std::array<T, N>& VB,
                 std::array<T, N>& VC) {
  // Creates a zone that spans the execution of the simple_vadd function
  codeplay::profiling::profiling_zone simpleVAddZone("simple_vadd");
  queue deviceQueue;
  range<1> numOfItems{N};
  buffer<T> bufferA(VA.data(), numOfItems);
  buffer<T> bufferB(VB.data(), numOfItems);
  buffer<T> bufferC(VC.data(), numOfItems);

  deviceQueue.submit([&](cl::sycl::handler& cgh) {
    // This allows you to know exactly when the submit happened
    codeplay::profiling::profiling_zone submitVAddZone("submit simple_vadd");
    auto accessorA = bufferA.template get_access<sycl_read>(cgh);
    auto accessorB = bufferB.template get_access<sycl_read>(cgh);
    auto accessorC = bufferC.template get_access<sycl_write>(cgh);

    auto kern = [=](id<1> wiID) {
      accessorC[wiID] = accessorA[wiID] + accessorB[wiID];
    };
    cgh.parallel_for<class SimpleVadd<T>>(numOfItems, kern);
  });
}

int main() {
  // Creates a zone that spans the whole application
  codeplay::profiling::profiling_zone mainZone("main");
  const size_t array_size = 4;
  std::array<cl_int, array_size> A = {{1, 2, 3, 4}}, B = {{1, 2, 3, 4}}, C;
  std::array<cl_float, array_size> D = {{1.f, 2.f, 3.f, 4.f}},
                                   E = {{1.f, 2.f, 3.f, 4.f}}, F;
  simple_vadd(A, B, C);
  simple_vadd(D, E, F);
}

Running this application with the JSON profiler enabled will produce the following output.

WARNING: User defined zones inside kernels are not supported at the moment.

user_zones

Note how the user-defined zones blend naturally with other API calls. This allows grouping regions of interest.

A zone is opened when an object of class codeplay::profiling::profiling_zone is created and closed when the object is destroyed.

User zones can be nested. The following example creates four user-defined zones: zones 2 and 4 are nested under zone 1, and zone 3 is nested under zone 2.

profiling_zone zone1("zone1");
{
  profiling_zone zone2("zone2");
  {
    profiling_zone zone3("zone3");
  }
}
profiling_zone zone4("zone4");
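Since a zone stays open for the lifetime of its object, a new zone is nested under whichever zones are still alive when it is created. The sketch below makes that order visible with a stand-in class that records open and close events. scoped_zone and trace_nested_zones are hypothetical and only mimic the scoping behaviour; they are not codeplay::profiling::profiling_zone and do not talk to any profiler. In this layout zone4 is created after the block holding zone2 has closed, so zones 2 and 4 are both direct children of zone 1.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for illustration only: logs when a scope-bound zone opens and
// closes, indented by the current nesting depth.
class scoped_zone {
 public:
  scoped_zone(std::vector<std::string>& log, std::size_t& depth,
              std::string name)
      : log_(log), depth_(depth), name_(std::move(name)) {
    log_.push_back(std::string(2 * depth_, ' ') + "open " + name_);
    ++depth_;
  }
  ~scoped_zone() {
    --depth_;
    log_.push_back(std::string(2 * depth_, ' ') + "close " + name_);
  }
  scoped_zone(const scoped_zone&) = delete;
  scoped_zone& operator=(const scoped_zone&) = delete;

 private:
  std::vector<std::string>& log_;
  std::size_t& depth_;
  std::string name_;
};

// Records the open/close order for four zones where zone3 sits inside zone2,
// and zone2 and zone4 both sit directly inside zone1.
inline std::vector<std::string> trace_nested_zones() {
  std::vector<std::string> log;
  std::size_t depth = 0;
  {
    scoped_zone zone1(log, depth, "zone1");
    {
      scoped_zone zone2(log, depth, "zone2");
      {
        scoped_zone zone3(log, depth, "zone3");
      }  // zone3 closes here
    }    // zone2 closes here
    scoped_zone zone4(log, depth, "zone4");
  }  // zone4 closes, then zone1
  return log;
}
```

Printing the returned log line by line shows zone2 and zone4 at the same indentation under zone1. Moving the zone4 declaration inside the block that declares zone2 would instead place it under zone2.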

All profilers in ComputeCpp support this feature.
