ComputeCpp Configuration File

Using the configuration file

A configuration file may be used to configure the behavior of several aspects of the ComputeCpp runtime.

WARNING: Please note that ComputeCpp is designed to work out-of-the-box, so a configuration file is not required. Also, note that some configuration options are highly experimental and may cause some undesired behavior in your SYCL application.

Some configuration options are only available in the ComputeCpp Professional Edition.

How to set a configuration file

To set a configuration file, follow these steps:

  1. Create a file on your system that the application can read
  2. Fill the configuration file with options as specified below
  3. Point the environment variable COMPUTECPP_CONFIGURATION_FILE to the newly created file

The following one-liners show how to create and set the configuration file:

Windows:

type NUL > sycl_config.txt & set COMPUTECPP_CONFIGURATION_FILE=%CD%\sycl_config.txt

Linux:

touch sycl_config.txt; export COMPUTECPP_CONFIGURATION_FILE=`pwd`/sycl_config.txt

Note that if COMPUTECPP_CONFIGURATION_FILE points to a file that doesn't exist, the runtime will throw a synchronous exception when trying to parse it, which happens when the first SYCL object is created.
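
Because the exception is only raised once the first SYCL object is created, an application may prefer to check the environment variable up front. The following is a minimal sketch using only the C++ standard library. The function name configuration_file_is_usable is hypothetical and is not part of the ComputeCpp API, and the check only approximates what the runtime does when it opens the file.

```cpp
#include <cstdlib>
#include <fstream>

// Hypothetical application-side helper, not part of ComputeCpp.
// Returns true when COMPUTECPP_CONFIGURATION_FILE is unset (the runtime
// then has no file to parse) or when it names a file that this process
// can open for reading. Returns false otherwise.
bool configuration_file_is_usable() {
  const char* path = std::getenv("COMPUTECPP_CONFIGURATION_FILE");
  if (path == nullptr) {
    return true;  // no configuration file requested
  }
  std::ifstream file(path);
  return file.is_open();
}
```

Calling such a helper before constructing any queue, buffer, or other SYCL object lets the application report a missing file in its own terms instead of handling the runtime's exception.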

Example of a configuration file with all default options

The following example shows a configuration file with all default options available. Using the following configuration file won't change the default behavior of ComputeCpp but is a good summary of all available options.

verbose_output = false
use_memory_mapping = true
enable_spirv_check = false
yield_scheduler_loop = true
enable_profiling = false
enable_kernel_profiling = true
enable_buffer_profiling = false
profiling_collapse_transactions = false
force_queue_finish = false
submit_memory_check = false
lock_memory_tracking = false
kernel_checks = false
host_ptr_flag = automatic
page_size = 4096
cache_line_size = 64
max_running_transactions = 0
scheduler_sleep_time_ns = 0
enable_autobinding = false
secondary_queue_mode = none
flushing_policy = on_submit
use_out_of_order_queue = false

Configuration options

The options in the configuration file are key-value pairs in the format [option] = [value], where [option] is the name of the configuration option and [value] is the value assigned to it.

Each option has a default value that the runtime will use if no value is specified for that option. Note that invalid options will be ignored.
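
To make the format concrete, here is a rough sketch of how a file in this format could be read. It is not ComputeCpp's parser: the functions trim and parse_options are hypothetical, and treating any unrecognized option name as "invalid" is an assumption made for illustration.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Removes leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& text) {
  const auto first = text.find_first_not_of(" \t\r");
  if (first == std::string::npos) return "";
  const auto last = text.find_last_not_of(" \t\r");
  return text.substr(first, last - first + 1);
}

// Parses "option = value" lines from `contents`.
// Lines without '=' and lines whose option name is not listed in
// `known_options` are skipped (assumed meaning of "invalid options
// will be ignored"). Options that are absent keep their defaults,
// which the caller is expected to supply.
std::map<std::string, std::string> parse_options(
    const std::string& contents,
    const std::set<std::string>& known_options) {
  std::map<std::string, std::string> options;
  std::istringstream lines(contents);
  std::string line;
  while (std::getline(lines, line)) {
    const auto separator = line.find('=');
    if (separator == std::string::npos) continue;
    const std::string key = trim(line.substr(0, separator));
    const std::string value = trim(line.substr(separator + 1));
    if (known_options.count(key) == 0) continue;
    options[key] = value;
  }
  return options;
}
```

With this sketch, a file containing verbose_output = true and a misspelled option yields a map with a single entry for verbose_output; the misspelled line is dropped silently.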

verbose_output

  • Type: bool
  • Default: false

Enables the runtime verbose output mode. The runtime will print debug information, including but not limited to:

  • The selected device
  • Details of a thrown exception
      • what()
      • The location of the throw

use_memory_mapping

  • Type: bool
  • Default: true
  • Professional Edition Feature

When moving data to and from devices, ComputeCpp will map memory buffers in the host to manipulate their contents. This is the best option for most platforms, but on some platforms explicit data movement through reads and writes might perform better. Disabling this option will force ComputeCpp to perform explicit reads and writes instead of mapping.

enable_spirv_check

  • Type: bool
  • Default: false
  • Professional Edition Feature

If enabled, ComputeCpp will try to perform additional checks on whether or not SPIR-V is supported by a device. This includes checking for the extension cl_khr_il_program in devices that don't support OpenCL 2.1 and checking for the CL_DEVICE_IL_VERSION. SPIR-V might be supported by platforms that don't report the cl_khr_il_program extension hence the check is disabled by default. Enabling this will cause the runtime to throw an exception if spirv64 is used in a device that doesn't report SPIR-V support.

yield_scheduler_loop

  • Type: bool
  • Default: true

ComputeCpp has an advanced command group scheduler that uses its own thread to schedule work created by the user. By default, the scheduler's thread calls std::this_thread::yield in each iteration. The behavior of this function is implementation-defined, but it acts as a hint to the OS that this thread can be rescheduled so that other threads may run. The call to the yield function can be disabled using this configuration option.
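
The effect of the option can be pictured with a small polling loop. This is only an illustrative sketch, not the ComputeCpp scheduler: poll_until and its parameters are hypothetical names, and the real scheduler loop does considerably more than check a single condition.

```cpp
#include <functional>
#include <thread>

// Hypothetical polling loop. Calls `work_available` repeatedly until it
// returns true. When `yield_each_iteration` is set, the thread yields
// after every unsuccessful check, hinting to the OS that other threads
// may run. Returns the number of unsuccessful checks.
unsigned long poll_until(const std::function<bool()>& work_available,
                         bool yield_each_iteration) {
  unsigned long idle_iterations = 0;
  while (!work_available()) {
    ++idle_iterations;
    if (yield_each_iteration) {
      std::this_thread::yield();
    }
  }
  return idle_iterations;
}
```

In this picture, yield_scheduler_loop = false corresponds to passing false for yield_each_iteration: the loop then spins without giving up its time slice voluntarily.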

enable_profiling

  • Type: bool
  • Default: false
  • Professional Edition Feature

Enables the ComputeCpp embedded profiler that records a timeline of events that can help identify performance issues. Enabling this option will output a JSON file that can be visualized in Chrome or Chromium by opening chrome://tracing and loading the file. Note that the profiling output will not be written until the application exits.

For more information please refer to Profiling SYCL applications.

enable_kernel_profiling

  • Type: bool
  • Default: true
  • Professional Edition Feature
  • This option only has an effect when enable_profiling is enabled.

When both enable_profiling and enable_kernel_profiling are set to true, the ComputeCpp profiler will record kernel timings. This works by creating OpenCL queues with the property CL_QUEUE_PROFILING_ENABLE, which might affect the behavior of the underlying OpenCL queue in some platforms. Recording kernel timings can then be disabled by turning off this option.

enable_buffer_profiling

  • Type: bool
  • Default: false
  • Professional Edition Feature
  • This option only has an effect when enable_profiling is enabled.

When this option is enabled, ComputeCpp's profiler records which buffers are used by each kernel. This allows the profiler to display the size and usage of each buffer as well as where each one was used. This is useful for small applications, but when an application creates thousands of buffers the interface can become difficult to handle, so this option is turned off by default.

profiling_collapse_transactions

  • Type: bool
  • Default: false
  • Professional Edition Feature
  • This option only has an effect when enable_profiling is enabled.

The unit of work in ComputeCpp is known as a transaction. A transaction consists of the statements written inside a command_group submission. Each transaction is treated as an atomic operation by the runtime but a transaction goes through several stages during its lifetime. By default, the ComputeCpp profiler will record each stage of a transaction as a separate event. This is useful to identify bottlenecks in the transaction scheduling mechanism as one stage might be taking more time than it should due to an unsatisfied requisite. For very large applications, when the number of submissions is on the order of tens of thousands, it might be useful to represent all the individual transaction states as just a single event, which can be done by enabling this option.
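
As an illustration of how the profiling options combine, a configuration file for a large application might look like the fragment below. It is a sketch built only from the option names documented above, not a recommended setting; which values suit a given application is an assumption to be checked against its own traces.

```text
enable_profiling = true
enable_kernel_profiling = true
enable_buffer_profiling = false
profiling_collapse_transactions = true
```

Here the last three options only take effect because enable_profiling is set to true.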

force_queue_finish

  • Type: bool
  • Default: false
  • Professional Edition Feature

By default, ComputeCpp will call clFlush after each submit to ensure execution can start as soon as possible. However, some platforms require clFinish to be called; otherwise, the submitted work takes a long time to start executing. This option is provided to handle that situation. Note that clFinish is a blocking call that will wait for all the work in the queue to finish before returning. In the majority of scenarios, this option should not be enabled.

submit_memory_check

  • Type: bool
  • Default: false
  • Professional Edition Feature

ComputeCpp will track the amount of memory allocated in a device and will throw an exception if the device isn't able to allocate the memory. By enabling this option, the ComputeCpp runtime will perform extra checks before the allocation takes place, and throw a different exception with a more specific error message. However, this option is not enabled by default because the checks don't take into account virtual memory, so the checks might be too restrictive.

lock_memory_tracking

  • Type: bool
  • Default: false
  • Professional Edition Feature
  • This option only has an effect when submit_memory_check is enabled

When submit_memory_check is used in the context of multithreaded applications it might be difficult to identify which submission is causing the device to run out of memory. Enabling this option will force the runtime to block the submission of work while these checks are being performed. It will cause a performance degradation, but it might make it easier to find the problematic submission so that the application can be refactored.

kernel_checks

  • Type: bool
  • Default: false
  • Professional Edition Feature
  • WARNING: Highly experimental feature
  • Enabling this option will implicitly enable verbose_output

Debugging compute kernels is a difficult process. To facilitate this, ComputeCpp has the capability to verify if a kernel is accessing memory outside of its available bounds and will try to report information to help developers identify where the problem is. Enabling this option will cause all memory accesses inside a kernel to be validated and will add a synchronization point after each submission. The verification needs to wait for a kernel to finish before executing the next kernel. This is a debugging feature and will cause large performance degradation, and as such should be used with caution.

host_ptr_flag

  • Type: multiple choice
  • Default: automatic
  • Professional Edition Feature

This option controls which host pointer flag to use when creating buffers. More information on this flag can be found in the clCreateBuffer documentation. The possible values are:

  • automatic: The runtime is allowed to choose the most appropriate flag.
  • none: Do not use any flag.
  • cl_alloc_host_ptr: Force the use of CL_MEM_ALLOC_HOST_PTR.
  • cl_use_host_ptr: Force the use of CL_MEM_USE_HOST_PTR.

page_size

  • Type: size_t
  • Default: 4096
  • Professional Edition Feature
  • See host_ptr_flag

Controls the page size used by ComputeCpp's default_allocator, the allocator used to allocate memory for buffers and images.

When CL_MEM_USE_HOST_PTR is used, the application tells the OpenCL runtime that the memory referenced by the host pointer should be used as storage for the OpenCL memory object. In some implementations, using this flag can cause data to be pinned, so data movement between host and device is very efficient. However, one of the requirements is that the memory is aligned to the page size the device uses to transfer data. If the page size for a device is not equal to 4096, this option can be used to change it.

cache_line_size

  • Type: size_t
  • Default: 64
  • Professional Edition Feature
  • See host_ptr_flag

Controls the size multiplier used by the ComputeCpp default_allocator to allocate memory. This value should, ideally, be the size of the cache line of a given compute unit so memory allocation can be properly aligned.
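
The role of page_size and cache_line_size can be sketched with a small allocation helper. This is not the implementation of ComputeCpp's default_allocator: the function names are hypothetical, and the reading that page_size acts as the alignment while cache_line_size acts as a rounding multiple for the allocation size is an assumption drawn from the descriptions above. The sketch requires C++17 for aligned operator new.

```cpp
#include <cstddef>
#include <new>

// Rounds `size` up to the next multiple of `multiple` (must be non-zero).
std::size_t round_up(std::size_t size, std::size_t multiple) {
  return ((size + multiple - 1) / multiple) * multiple;
}

// Hypothetical helper: allocates at least `size` bytes, padded to a
// multiple of `cache_line_size` and aligned to `page_size`.
// `page_size` must be a power of two. Release the memory with
// free_buffer_storage, passing the same `page_size`.
void* allocate_buffer_storage(std::size_t size, std::size_t page_size,
                              std::size_t cache_line_size) {
  const std::size_t padded = round_up(size, cache_line_size);
  return ::operator new(padded, static_cast<std::align_val_t>(page_size));
}

// Releases memory obtained from allocate_buffer_storage.
void free_buffer_storage(void* storage, std::size_t page_size) {
  ::operator delete(storage, static_cast<std::align_val_t>(page_size));
}
```

With the defaults of 4096 and 64, a 100-byte request would be padded to 128 bytes and returned at an address that is a multiple of 4096, which is the kind of alignment a driver may need before it can use a host pointer directly.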

max_running_transactions

  • Type: unsigned int
  • Default: 0
  • Professional Edition Feature

Controls the maximum number of simultaneous running transactions ComputeCpp is allowed to have. 0 means the runtime is allowed to choose this number. Setting this number to 1, for instance, means the runtime will wait for every kernel to finish before submitting another.

This number can be interpreted as the maximum number of command groups the runtime will submit to the OpenCL driver before waiting for others to finish. Note that while a queue only executes one kernel at a time, depending on the platform and device, the driver might lose performance when several kernels are submitted in a short period of time so this option can be used to limit the number of simultaneous submissions.

scheduler_sleep_time_ns

  • Type: unsigned long long
  • Default: 0
  • Professional Edition Feature
  • WARNING: Highly experimental feature

ComputeCpp's command group scheduler uses a thread to process incoming transactions. This thread uses, by default, a polling mechanism to see if the user has submitted a command group to be executed. When the application targets an accelerator device, like a GPU, this mechanism works well because the CPU is free to run the scheduler thread as fast as it can. In platforms where the device is the CPU itself, setting a sleep time for the scheduler thread can have performance benefits and can reduce power consumption as well.

The polling mechanism may not be suitable for all platforms; in particular, having a thread looping all the time will keep one CPU core at maximum utilization. This configuration option can be used to specify a time duration that the scheduler thread is allowed to sleep in each iteration. Note that the thread will not always sleep for the specified time, as it will be awakened as soon as the application submits a new command group to be processed and also when a command group finishes work.

It is recommended to enable this option when running on a CPU device, whereas for a GPU or accelerator device it would depend on the execution time of the kernels involved. An application with larger kernels would benefit less from this option or might even see a performance degradation. It is also recommended to turn off yield_scheduler_loop when using the sleep time.
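
The "sleep, but wake up early when something happens" behavior described above is commonly built on a condition variable with a timeout. The sketch below shows that general technique with the C++ standard library. It is not ComputeCpp's scheduler; the class SchedulerWakeup and its member names are hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical wake-up primitive for a scheduler thread.
class SchedulerWakeup {
 public:
  // Called when a command group is submitted or finishes.
  void notify_work() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      work_pending_ = true;
    }
    condition_.notify_one();
  }

  // Sleeps for at most `sleep_time`, returning early if work is pending.
  // Returns true if work was pending, false if the sleep time elapsed.
  // The pending flag is cleared before returning.
  bool sleep_until_work(std::chrono::nanoseconds sleep_time) {
    std::unique_lock<std::mutex> lock(mutex_);
    const bool had_work = condition_.wait_for(
        lock, sleep_time, [this] { return work_pending_; });
    work_pending_ = false;
    return had_work;
  }

 private:
  std::mutex mutex_;
  std::condition_variable condition_;
  bool work_pending_ = false;
};
```

A scheduler loop built this way would call sleep_until_work once per iteration with the configured duration, so an idle thread stops occupying a CPU core while submissions are still picked up without waiting for the full sleep time.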

enable_autobinding

  • Type: bool
  • Default: false
  • Professional Edition Feature

WARNING: The autobinding feature is not compatible with the host device.

Autobinding is a feature that allows ComputeCpp to automatically bind placeholder accessors to command groups. Normally, a placeholder accessor needs to be bound to a command group by calling cl::sycl::handler::require inside the command group. With this feature turned on, ComputeCpp automatically binds the placeholder accessors that a command group requires.

secondary_queue_mode

  • Type: multiple choice
  • Default: none
  • Professional Edition Feature

For each queue created, ComputeCpp can create a secondary queue that can be used according to the policy defined by this configuration option. The secondary queue can be used to automatically overlap data movement and computation. Note that the secondary queue will be created for the same device, thus, it might not be beneficial to use a secondary queue in all platforms. This option can be set to one of the following:

  • none: No secondary queue is created.
  • compute_steal: The secondary queue is allowed to take compute tasks from the primary queue.
  • data_steal: The secondary queue is allowed to take memory movement operations from the primary queue.
  • any_steal: The secondary queue is allowed to take any task from the primary queue.

flushing_policy

  • Type: multiple choice
  • Default: on_submit
  • Professional Edition Feature
  • WARNING: Highly experimental feature

By default, ComputeCpp will call clFlush after each command group to ensure that the OpenCL driver starts to execute the command group as soon as it can. Even though clFlush is an asynchronous call, it can cause some OpenCL drivers to start execution immediately, which effectively makes subsequent submissions wait while the current submission is started. This can also cause performance degradation if a large number of small kernels are being submitted, as some drivers prefer to receive several submissions before a flush so that they can schedule work to make full use of a device.

The flushing policy can be one of the following:

  • on_submit: Always perform a clFlush after each command group submission.
  • on_wait: Only perform a clFlush when necessary. clFlush is called when the user waits on some runtime event. This wait can be an explicit call to cl::sycl::queue::wait, cl::sycl::event::wait, a cl::sycl::buffer destruction, cl::sycl::stream operations, and when submissions are made to different queues. The runtime will try to flush only when required to avoid excessive calls to clFlush.

use_out_of_order_queue

  • Type: bool
  • Default: false
  • Professional Edition Feature

ComputeCpp uses an in-order OpenCL queue. This option enables the runtime to use an out-of-order queue instead, if the device supports it.

https://www.khronos.org/registry/OpenCL/sdk/2.1/docs/man/xhtml/clGetDeviceInfo.html

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clEnqueueMapBuffer.html

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clEnqueueReadBuffer.html

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clFlush.html

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clFinish.html

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateBuffer.html

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateCommandQueue.html
