Template Functions

Templated function objects

SYCL allows us to easily leverage the power of C++ template metaprogramming in device code. Most valid compile-time constructs will work in a kernel (with some restrictions, described in "Limitations"). This means that we can define generic command groups and kernels, use functional programming concepts, and offload a lot of work to the compiler.

In this section we will create and execute a generic kernel function object.

Define ConstantAdder

  template<typename T, typename Acc, size_t N>
  class ConstantAdder {
  public:
    ConstantAdder(Acc accessor, T val)
      : accessor(accessor)
      , val(val) {}

    // Adds val to each of the N elements reachable through the accessor.
    void operator() () {
      for (size_t i = 0; i < N; i++) {
        accessor[i] += val;
      }
    }

  private:
    Acc accessor;
    const T val;
  };

We first define a ConstantAdder class. The class overloads operator(), so it is a function object. It adds a value to each of the first N elements of an object, provided that the object supports operator[].
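Because ConstantAdder only requires operator[], it can be tried out on the host without any SYCL setup. The following is a minimal sketch, assuming a raw pointer as a stand-in for the accessor; the helper add_to_first_four is hypothetical and not part of the tutorial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same shape as the ConstantAdder above, repeated so the sketch compiles alone.
template <typename T, typename Acc, std::size_t N>
class ConstantAdder {
 public:
  ConstantAdder(Acc accessor, T val) : accessor(accessor), val(val) {}

  // Adds val to each of the N elements reachable through the accessor.
  void operator()() {
    for (std::size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

 private:
  Acc accessor;
  const T val;
};

// Hypothetical host-only helper: a raw pointer plays the role of the SYCL
// accessor, since it also provides operator[].
std::vector<int> add_to_first_four(std::vector<int> vals, int constant) {
  // N is a compile-time constant, so the caller must guarantee N elements.
  assert(vals.size() >= 4);
  ConstantAdder<int, int*, 4> adder(vals.data(), constant);
  adder();
  return vals;
}
```

Only the first N elements are modified, so passing five values with N equal to 4 leaves the last one unchanged.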

Execute generic function

  #include <iostream>
  #include <array>
  #include <algorithm>

  #include <CL/sycl.hpp>

  namespace sycl = cl::sycl;

  <<Define ConstantAdder>>

  int main(int, char**) {
    std::array<int, 4> vals = {{ 1, 2, 3, 4 }};

    sycl::queue queue(sycl::default_selector{});
    {
      // When buf goes out of scope, its destructor waits for the kernel
      // and copies the results back into vals.
      sycl::buffer<int, 1> buf(vals.data(), sycl::range<1>(4));
      queue.submit([&] (sycl::handler& cgh) {
        auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);

        cgh.single_task(ConstantAdder<int, decltype(acc), 4>(acc, 1));
      } );
    }

    std::for_each(vals.begin(), vals.end(), [] (int i) { std::cout << i << " "; } );
    std::cout << std::endl;

    return 0;
  }

In main, after the typical setup, we create a single buffer to manage four integer values. Note that we can use int, even though its width is not fixed by the language and can differ from platform to platform. It is nevertheless safe to use, since the size is decided by the host compiler. The device compiler then uses the same size, which allows for safe transfers of data between the host and devices.
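To make the point about sizes concrete, here is a small host-only sketch. The helper host_bytes is hypothetical; it only reports how many bytes N elements of type T occupy according to the host compiler, which is the size that the paragraph above says the device side follows. If a specific width is required regardless of platform, a fixed-width type such as std::int32_t can be used for T instead of int.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes occupied on the host by N elements of type T.
template <typename T, std::size_t N>
constexpr std::size_t host_bytes() {
  return N * sizeof(T);
}

// The exact width of int is chosen by the host compiler; the standard only
// guarantees at least 16 bits.
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");

// std::int32_t, where it exists, is exactly 32 bits wide on every platform.
static_assert(host_bytes<std::int32_t, 4>() * CHAR_BIT == 128,
              "four 32-bit values occupy 128 bits");
```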

In the command group, we create an instance of our ConstantAdder with a value of 1 and pass that to single_task. Notice that since our function object is now a full-fledged class, we do not have to make up an artificial template parameter, since the kernel name is known: it is exactly the name of the function object type, ConstantAdder. The output of the program is as expected:

2 3 4 5

Of course, in this case a simple lambda kernel would have worked just as well. In general though, using your own classes gives you more flexibility and expressive power, at the cost of verbosity.
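For comparison, the same work written as a lambda might look like the sketch below. It runs on the host only: run_single_task is a hypothetical stand-in that simply invokes the callable once, and is not the SYCL single_task API. In real SYCL 1.2.1 code, a lambda kernel would also need a kernel name supplied as a template argument, which is the verbosity trade-off mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for single_task: invokes the callable once on the host.
template <typename Kernel>
void run_single_task(Kernel kernel) {
  kernel();
}

// Adds a constant to four values using a lambda instead of a function object.
std::array<int, 4> add_with_lambda(std::array<int, 4> vals, int constant) {
  int* data = vals.data();  // the pointer plays the role of the accessor
  run_single_task([=]() {
    for (std::size_t i = 0; i < 4; i++) {
      data[i] += constant;
    }
  });
  return vals;
}

// Small self-check showing the expected result; call it from main().
void demo() {
  const std::array<int, 4> result = add_with_lambda({{1, 2, 3, 4}}, 1);
  assert((result == std::array<int, 4>{{2, 3, 4, 5}}));
}
```

The lambda is shorter, but the element type, the element count, and the constant are fixed at the point of use rather than being template and constructor parameters.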

Lambdas can also be captured by the command group scope in SYCL. This is a useful feature, for example, when you would otherwise need to pass a function pointer, which is restricted in SYCL. Additionally, C++14 generic lambdas, which improve on C++11 lambdas, can also be used with ComputeCpp. If you are interested in learning more about this, have a look at our engineering blog post on Generic Lambdas with SYCL.

Note that there is no guarantee that C++14 language features supported by ComputeCpp will also work with other SYCL implementations, since the minimum requirement for SYCL v1.2.1 is C++11.
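As a rough illustration of what a generic lambda offers, the host-only sketch below uses a single C++14 lambda for two different element types. It does not use SYCL at all, and the names add_constant, add_to_ints and add_to_floats are made up for this example; whether the same lambda can be used as a kernel depends on the SYCL implementation, as noted above.

```cpp
#include <array>
#include <cassert>

// C++14 generic lambda: both parameter types are deduced at each call site.
const auto add_constant = [](auto& container, auto value) {
  for (auto& element : container) {
    element += value;
  }
};

// The same lambda instantiated for int elements...
std::array<int, 4> add_to_ints(std::array<int, 4> vals) {
  add_constant(vals, 1);
  return vals;
}

// ...and for float elements, without writing a template by hand.
std::array<float, 2> add_to_floats(std::array<float, 2> vals) {
  add_constant(vals, 0.5f);
  return vals;
}

// Small self-check showing the expected results; call it from main().
void demo() {
  assert((add_to_ints({{1, 2, 3, 4}}) == std::array<int, 4>{{2, 3, 4, 5}}));
  assert((add_to_floats({{1.0f, 2.0f}}) == std::array<float, 2>{{1.5f, 2.5f}}));
}
```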

