Please note that you are viewing a guide targeting an older version of ComputeCpp™ Community Edition. This guide was designed for version 2.10.0.

Configuration file

ComputeCpp Configuration File

Using the configuration file

A configuration file may be used to configure the behavior of several aspects of the ComputeCpp runtime.

WARNING: Please note that ComputeCpp is designed to work out-of-the-box, so a configuration file is not required. Also, note that some configuration options are highly experimental and may cause some undesired behavior in your SYCL application.

Some configuration options are only available in the ComputeCpp Professional Edition.

How to set a configuration file

To set a configuration file, follow these steps:

Create a file on your system that the application can read.
Fill the configuration file with options as specified below.
Point the environment variable COMPUTECPP_CONFIGURATION_FILE to the newly created file.

The following one-liner examples show how to create and set the configuration file:

Linux:

touch sycl_config.txt; export COMPUTECPP_CONFIGURATION_FILE="$(pwd)/sycl_config.txt"

Note that if COMPUTECPP_CONFIGURATION_FILE points to a file that doesn't exist, the runtime will output a warning and continue using the defaults shown below.

Reading the configuration file happens at program initialization, before entering main.
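
Because a missing file only produces a warning, it can be worth checking the path before launching the application. The following is a minimal sketch, assuming a POSIX shell; the file name sycl_config.txt is carried over from the Linux one-liner, and the config_status variable is purely illustrative.

```shell
# Create the file and point the runtime at it, as in the Linux one-liner.
touch sycl_config.txt
export COMPUTECPP_CONFIGURATION_FILE="$(pwd)/sycl_config.txt"

# Verify that the variable names a readable file before starting the application.
if [ -r "$COMPUTECPP_CONFIGURATION_FILE" ]; then
  config_status="readable"
else
  config_status="missing"
fi
echo "Configuration file is $config_status: $COMPUTECPP_CONFIGURATION_FILE"
```

Since the file is read before main is entered, the variable has to be exported in the environment that launches the application.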

Example of a configuration file with all default options

The following example shows a configuration file with all default options available. Using the following configuration file won't change the default behavior of ComputeCpp but is a good summary of all available options.

verbose_output = false
use_memory_mapping = true
enable_spirv_check = false
yield_scheduler_loop = true
enable_profiling = false
enable_json_profiling = true
enable_tracy_profiling = false
enable_kernel_profiling = true
enable_buffer_profiling = false
enable_perf_counters_profiling = false
profiling_collapse_transactions = false
force_queue_finish = false
submit_memory_check = false
lock_memory_tracking = false
kernel_checks = false
host_ptr_flag = automatic
page_size = 4096
cache_line_size = 64
max_running_transactions = 0
scheduler_sleep_time_ns = 0
enable_autobinding = false
secondary_queue_mode = none
flushing_policy = on_submit
use_out_of_order_queue = false
reduction_workgroup_size = 0

Configuration options

The options in the configuration file are key-value pairs in the format [option] = [value], where option is the name of the configuration option and value is the corresponding value.

Each option has a default value that the runtime will use if no value is specified for that option. Note that invalid options will be ignored.
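
Since unspecified options keep their defaults, a configuration file only needs to list the options being changed. The following is a minimal sketch, assuming a POSIX shell; the file name sycl_verbose.txt is an arbitrary choice, and my_sycl_app stands in for a hypothetical application binary.

```shell
# Write a configuration file that overrides a single option.
config_file="$(pwd)/sycl_verbose.txt"
cat > "$config_file" <<'EOF'
verbose_output = true
EOF

# Make the runtime pick it up for applications started from this shell.
export COMPUTECPP_CONFIGURATION_FILE="$config_file"

# ./my_sycl_app   # hypothetical application binary
```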

verbose_output

Type: bool
Default: false

Enables the runtime verbose output mode. The runtime will print debug information, including but not limited to:

The selected device
Details of a thrown exception: what() and the location of the throw

use_memory_mapping

Type: bool
Default: true
Professional Edition Feature

When moving data to and from devices, ComputeCpp will map memory buffers in the host to manipulate their contents. This is the best option for most platforms, but on some platforms explicit data movement through reads and writes may perform better. Disabling this option will force ComputeCpp to perform explicit reads and writes instead of mapping.

enable_spirv_check

Type: bool
Default: false
Professional Edition Feature

If enabled, ComputeCpp will try to perform additional checks on whether or not SPIR-V is supported by a device. This includes checking for the extension cl_khr_il_program in devices that don't support OpenCL 2.1 and checking for the CL_DEVICE_IL_VERSION. SPIR-V might be supported by platforms that don't report the cl_khr_il_program extension, hence the check is disabled by default. Enabling this will cause the runtime to throw an exception if spirv64 is used on a device that doesn't report SPIR-V support.

yield_scheduler_loop

Type: bool
Default: true
Professional Edition Feature
See scheduler_sleep_time_ns

ComputeCpp has an advanced command group scheduler that uses its own thread to schedule work created by the user. The scheduler's thread will, by default, call std::this_thread::yield in each iteration. The behavior of this function is implementation-defined but it should be a hint to the OS that other threads may perform some work and this thread should be rescheduled. The call to the yield function can be disabled using this configuration option.

enable_profiling

Type: bool
Default: false
Professional Edition Feature

Enables ComputeCpp automatic profiling capabilities.

For more information please refer to Profiling SYCL applications.

enable_json_profiling

Type: bool
Default: true
Professional Edition Feature
This option only has an effect when enable_profiling is enabled.

Enables the ComputeCpp embedded profiler that records a timeline of events that can help identify performance issues. Enabling this option will output a JSON file that can be visualized in Chrome or Chromium by opening chrome://tracing and loading the file. Note that the profiling output will not be written until the application exits.

When profiling is enabled, the JSON profiler is enabled by default; however, this option can be used to disable it in favor of another profiler.
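
The relationship between the two options can be sketched as a configuration file that turns profiling on and keeps the JSON output. This is a minimal sketch, assuming a POSIX shell and a ComputeCpp edition in which these options are available; the file name sycl_profiling.txt is an arbitrary choice.

```shell
# enable_json_profiling defaults to true, so it is listed here only to make
# the dependency on enable_profiling explicit.
config_file="$(pwd)/sycl_profiling.txt"
cat > "$config_file" <<'EOF'
enable_profiling = true
enable_json_profiling = true
EOF

export COMPUTECPP_CONFIGURATION_FILE="$config_file"
```

Setting enable_json_profiling without enable_profiling would have no effect, as noted above.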

enable_tracy_profiling

Type: bool
Default: false
Professional Edition Feature
This option only has an effect when enable_profiling is enabled.

Tracy is not enabled by default in ComputeCpp and requires this option to be enabled to function. The profilers are not mutually exclusive, meaning that both enable_json_profiling and enable_tracy_profiling can be enabled at the same time. Enabling this option will allow the application to connect with the Tracy server so profiling can be visualized in real-time while the application is running.

Tracy support is still experimental so it is disabled by default.

enable_kernel_profiling

Type: bool
Default: true
Professional Edition Feature
This option only has an effect when enable_profiling is enabled.

When both enable_profiling and enable_kernel_profiling are set to true, the ComputeCpp profiler will record kernel timings. This works by creating OpenCL queues with the property CL_QUEUE_PROFILING_ENABLE, which might affect the behavior of the underlying OpenCL queue in some platforms. Recording kernel timings can then be disabled by turning off this option.

enable_buffer_profiling

Type: bool
Default: false
Professional Edition Feature
This option only has an effect when enable_profiling is enabled.

ComputeCpp's profiler will record events on which buffers are used by each kernel. This allows the profiler to display the size and usage of each buffer as well as where each one was used. For small applications this is useful but when applications create thousands of buffers the interface can become difficult to handle so this option is turned off by default.

enable_perf_counters_profiling

Type: bool
Default: false
Professional Edition Feature
This option only has an effect when enable_profiling is enabled.

ComputeCpp supports reading performance counters from a number of devices (see perf_counter_backend) but this feature is disabled by default. Enabling this option will enable performance counter data to be displayed in the profilers.

perf_counter_backend

Type: multiple choice
Default: default
Professional Edition Feature
This option only has an effect when enable_profiling and enable_perf_counters_profiling are enabled.

ComputeCpp currently supports performance counters on Intel GPU devices, ARM Mali devices and devices supporting the OpenCL extension Codeplay Performance Counters. This option can be set to one of the following:

default: Collect ARM Mali performance counter if on an ARM platform, otherwise collect Intel GPU performance counters.
intel_gpu: Collect Intel GPU performance counters.
arm_mali: Collect ARM Mali performance counters.
codeplay: Collect Codeplay Performance Counters.
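
Because the backend choice depends on two other options, all three need to appear together for it to take effect. The following is a minimal sketch, assuming a POSIX shell and an Intel GPU target; the file name sycl_perf_counters.txt is an arbitrary choice.

```shell
# perf_counter_backend is only consulted when both profiling options are on.
config_file="$(pwd)/sycl_perf_counters.txt"
cat > "$config_file" <<'EOF'
enable_profiling = true
enable_perf_counters_profiling = true
perf_counter_backend = intel_gpu
EOF

export COMPUTECPP_CONFIGURATION_FILE="$config_file"
```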

profiling_collapse_transactions

Type: bool
Default: false
Professional Edition Feature
This option only has an effect when enable_profiling is enabled.

The unit of work in ComputeCpp is known as a transaction. A transaction consists of the statements written inside a command_group submission. Each transaction is treated as an atomic operation by the runtime but a transaction goes through several stages during its lifetime. By default, the ComputeCpp profiler will record each stage of a transaction as a separate event. This is useful to identify bottlenecks in the transaction scheduling mechanism as one stage might be taking more time than it should due to an unsatisfied requisite. For very large applications, when the number of submissions is on the order of tens of thousands, it might be useful to represent all the individual transaction states as just a single event, which can be done by enabling this option.

force_queue_finish

Type: bool
Default: false
Professional Edition Feature

By default, ComputeCpp will call clFlush after each submit to ensure execution can start as soon as possible. However, some platforms require clFinish to be called or the work submitted will take a long time to start being executed so this option is provided to handle this situation. Note that clFinish is a blocking call that will wait for all the work in the queue to finish before returning. In the majority of scenarios, this option should not be enabled.

submit_memory_check

Type: bool
Default: false
Professional Edition Feature

ComputeCpp will track the amount of memory allocated in a device and will throw an exception if the device is unable to allocate the memory. By enabling this option, the ComputeCpp runtime will perform extra checks before the allocation takes place, and throw a different exception with a more specific error message. However, this option is not enabled by default because the checks don't take into account virtual memory, so the checks might be too restrictive.

lock_memory_tracking

Type: bool
Default: false
Professional Edition Feature
This option only has an effect when submit_memory_check is enabled.

When submit_memory_check is used in the context of multi-threaded applications it might be difficult to identify which submission is causing the device to run out of memory. Enabling this option will force the runtime to block the submission of work while these checks are being performed. It will cause a performance degradation, but it might make it easier to find the problematic submission so that the application can be refactored.
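
A debugging configuration for tracking down an out-of-memory submission in a multi-threaded application might combine both options. This is a minimal sketch, assuming a POSIX shell; the file name sycl_memory_debug.txt is an arbitrary choice.

```shell
# lock_memory_tracking has no effect unless submit_memory_check is also enabled.
config_file="$(pwd)/sycl_memory_debug.txt"
cat > "$config_file" <<'EOF'
submit_memory_check = true
lock_memory_tracking = true
EOF

export COMPUTECPP_CONFIGURATION_FILE="$config_file"
```

Given the performance cost described above, a file like this is best kept separate from the one used for normal runs.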

kernel_checks

Type: bool
Default: false
Professional Edition Feature
WARNING: Highly experimental feature
Enabling this option will implicitly enable verbose_output

Debugging compute kernels is a difficult process. To facilitate this, ComputeCpp has the capability to verify if a kernel is accessing memory outside of its available bounds and will try to report information to help developers identify where the problem is. Enabling this option will cause all memory accesses inside a kernel to be validated and will add a synchronization point after each submission. The verification needs to wait for a kernel to finish before executing the next kernel. This is a debugging feature and will cause large performance degradation, and as such should be used with caution.

host_ptr_flag

Type: multiple choice
Default: automatic
Professional Edition Feature

This option controls which host pointer flag to use when creating buffers. More information on this flag can be found in clCreateBuffer. The possible values are:

automatic: The runtime is allowed to choose the most appropriate flag.
none: Do not use any flag.
cl_alloc_host_ptr: Force the use of CL_MEM_ALLOC_HOST_PTR.
cl_use_host_ptr: Force the use of CL_MEM_USE_HOST_PTR.

page_size

Type: size_t
Default: 4096
Professional Edition Feature
See host_ptr_flag

Controls the page size used by ComputeCpp's default_allocator, the allocator used to allocate memory for buffers and images.

When CL_MEM_USE_HOST_PTR is used, the application tells the OpenCL runtime that the memory referenced by the host pointer should be used as storage for the OpenCL memory object. In some implementations, using this option can cause data to be pinned so that data movement between host and device is very efficient. However, one of the requirements is that memory needs to be aligned with the page size the device uses to transfer data. If the page size for a device is not equal to 4096, this option can be used to change it.
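
The interaction between host_ptr_flag and page_size can be sketched as follows. This is a minimal sketch, assuming a POSIX shell; the file name sycl_host_ptr.txt is an arbitrary choice, and the page size of 65536 is an illustrative value only, not a recommendation for any particular device.

```shell
# Force CL_MEM_USE_HOST_PTR and align allocations to a non-default page size.
# 65536 is a placeholder; use the page size your device actually requires.
config_file="$(pwd)/sycl_host_ptr.txt"
cat > "$config_file" <<'EOF'
host_ptr_flag = cl_use_host_ptr
page_size = 65536
EOF

export COMPUTECPP_CONFIGURATION_FILE="$config_file"
```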

cache_line_size

Type: size_t
Default: 64
Professional Edition Feature
See host_ptr_flag

Controls the size multiplier used by the ComputeCpp default_allocator to allocate memory. This value should, ideally, be the size of the cache line of a given compute unit so memory allocation can be properly aligned.

max_running_transactions

Type: unsigned int
Default: 0
Professional Edition Feature

Controls the maximum number of simultaneous running transactions ComputeCpp is allowed to have. 0 means the runtime is allowed to choose this number. Setting this number to 1, for instance, means the runtime will wait for every kernel to finish before submitting another.

This number can be interpreted as the maximum number of command groups the runtime will submit to the OpenCL driver before waiting for others to finish. Note that while a queue only executes one kernel at a time, depending on the platform and device, the driver might lose performance when several kernels are submitted in a short period of time so this option can be used to limit the number of simultaneous submissions.

scheduler_sleep_time_ns

Type: unsigned long long
Default: 0
Professional Edition Feature
WARNING: Highly experimental feature

ComputeCpp's command group scheduler uses a thread to process incoming transactions. This thread uses, by default, a polling mechanism to see if the user has submitted a command group to be executed. When the application targets an accelerator device, like a GPU, this mechanism works well because the CPU is free to run the scheduler thread as fast as it can. In platforms where the device is the CPU itself, setting a sleep time for the scheduler thread can have performance benefits and can reduce power consumption as well.

The polling mechanism may not be suitable for all platforms, in particular, having a thread looping all the time will keep one CPU core at maximum utilization. This configuration option can be used to specify a time duration that the scheduler thread is allowed to sleep in each iteration. Note that the thread will not always sleep for the specified time as it will be awakened as soon as the application submits a new command group to be processed and also when a command group finishes work.

It is recommended to enable this option when running on a CPU device, whereas for a GPU or accelerator device it would depend on the execution time of the kernels involved. An application with larger kernels would benefit less from this option or might even see a performance degradation. It is also recommended to turn off yield_scheduler_loop when using the sleep time.
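
The recommendation above, a sleep time combined with yield_scheduler_loop turned off, can be sketched as a configuration file for a CPU device. This is a minimal sketch, assuming a POSIX shell; the file name sycl_cpu_scheduler.txt is an arbitrary choice, and the value of 100000 nanoseconds is illustrative and would need tuning for a real workload.

```shell
# Let the scheduler thread sleep between iterations instead of polling and yielding.
# 100000 ns is a placeholder value, not a measured optimum.
config_file="$(pwd)/sycl_cpu_scheduler.txt"
cat > "$config_file" <<'EOF'
scheduler_sleep_time_ns = 100000
yield_scheduler_loop = false
EOF

export COMPUTECPP_CONFIGURATION_FILE="$config_file"
```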

enable_autobinding

Type: bool
Default: false
Professional Edition Feature

WARNING: The autobinding feature is not compatible with the host device.

Auto binding is a feature that allows ComputeCpp to automatically bind placeholder accessors to command groups. A placeholder accessor needs to be bound to a command group by calling cl::sycl::handler::require inside the command group. By turning on this feature ComputeCpp can automatically bind placeholder accessors that a command group requires.

secondary_queue_mode

Type: multiple choice
Default: none
Professional Edition Feature

For each queue created, ComputeCpp can create a secondary queue that can be used according to the policy defined by this configuration option. The secondary queue can be used to automatically overlap data movement and computation. Note that the secondary queue will be created for the same device, thus, it might not be beneficial to use a secondary queue in all platforms. This option can be set to one of the following:

none: No secondary queue is created.
data_ops: The secondary queue is allowed to take compute tasks from the primary queue.
data_steal: The secondary queue is allowed to take memory movement operations from the primary queue.
any_steal: The secondary queue is allowed to take any task from the primary queue.
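
Selecting a policy is a one-line change in the configuration file. The following is a minimal sketch, assuming a POSIX shell; the file name sycl_secondary_queue.txt is an arbitrary choice, and any_steal is picked only as an example of a non-default value.

```shell
# Create a secondary queue that may take any task from the primary queue.
config_file="$(pwd)/sycl_secondary_queue.txt"
cat > "$config_file" <<'EOF'
secondary_queue_mode = any_steal
EOF

export COMPUTECPP_CONFIGURATION_FILE="$config_file"
```

Whether this helps depends on the platform, since both queues target the same device.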

flushing_policy

Type: multiple choice
Default: on_submit
Professional Edition Feature
WARNING: Highly experimental feature

ComputeCpp will call a clFlush operation after each command group to ensure that the OpenCL driver starts to execute a command group as soon as it can. Even though this is an asynchronous call, clFlush can cause some OpenCL drivers to start execution immediately, which effectively makes subsequent submissions wait before starting the current submission. This can also cause performance degradation if a large number of small kernels are being submitted, as some drivers prefer to have several submissions before a flush to schedule work to make full use of a device.

The flushing policy can be one of the following:

on_submit: Always perform a clFlush after each command group submission.
on_wait: Only perform a clFlush when necessary. clFlush is called when the user waits on some runtime event. This wait can be an explicit call to cl::sycl::queue::wait, cl::sycl::event::wait, a cl::sycl::buffer destruction, cl::sycl::stream operations, and when submissions are made to different queues. The runtime will try to flush only when required to avoid excessive calls to clFlush.
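
For an application that submits many small kernels, switching to the on_wait policy could look like the following. This is a minimal sketch, assuming a POSIX shell; the file name sycl_flushing.txt is an arbitrary choice.

```shell
# Defer clFlush until the application waits on the runtime.
config_file="$(pwd)/sycl_flushing.txt"
cat > "$config_file" <<'EOF'
flushing_policy = on_wait
EOF

export COMPUTECPP_CONFIGURATION_FILE="$config_file"
```

As this is flagged as a highly experimental feature, it is worth comparing timings against the default on_submit policy before adopting it.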

use_out_of_order_queue

Type: bool
Default: false
Professional Edition Feature

By default, ComputeCpp uses an in-order OpenCL queue. This option enables the runtime to use an out-of-order queue instead, if the device supports it.

reduction_workgroup_size

Type: size_t
Default: 0
Professional Edition Feature

Controls the workgroup size used for SYCL reduction operations. A value greater than 0 will override the workgroup size calculated by the runtime. The value set here is not checked, so it may result in errors if set higher than your OpenCL implementation's maximum workgroup size. Reductions may also use some local memory, so combining high workgroup sizes with kernels that use a lot of local memory may result in local memory exhaustion.
