Template Functions

Deprecated: Please note that you are viewing a guide targeting an older version of ComputeCpp Community Edition. This version of the guide is designed specifically for version 1.1.2.

SYCL allows us to easily leverage the power of C++ template metaprogramming in device code. Most valid compile-time constructs will work in a kernel (with some restrictions, described in "Limitations"). This means that we can define generic command groups and kernels, use functional programming concepts, and offload a lot of work to the compiler.

In this section we will create and execute a generic kernel function object.

Define ConstantAdder

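  template<typename T, typename Acc, size_t N>
  class ConstantAdder {
  public:
    // Store the accessor (anything providing operator[]) and the constant.
    ConstantAdder(Acc accessor, T val)
      : accessor(accessor)
      , val(val) {}

    // Kernel body: add val to each of the first N elements.
    void operator() () {
      for (size_t i = 0; i < N; i++) {
        accessor[i] += val;
      }
    }

  private:
    Acc accessor;
    const T val;
  };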

We first define a ConstantAdder class. The class overloads operator(), so it is a function object. When called, it adds a constant value to each of the first N elements of the object it was given, which can be any type that provides operator[].
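Because ConstantAdder only requires operator[] from its Acc parameter, it can be tried on the host without any SYCL types. The following is a minimal sketch under that assumption: a raw pointer stands in for the accessor, and add_constant_on_host and run_host_demo are hypothetical helper names that are not part of the guide.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Same shape as the ConstantAdder above, repeated so the sketch compiles on its own.
template <typename T, typename Acc, std::size_t N>
class ConstantAdder {
 public:
  ConstantAdder(Acc accessor, T val) : accessor(accessor), val(val) {}

  void operator()() {
    for (std::size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

 private:
  Acc accessor;
  const T val;
};

// Hypothetical host-only helper: T* plays the role of the accessor,
// since a pointer also provides operator[].
template <typename T, std::size_t N>
std::array<T, N> add_constant_on_host(std::array<T, N> values, T constant) {
  ConstantAdder<T, T*, N> adder(values.data(), constant);
  adder();  // runs the same loop a kernel invocation would run
  return values;
}

// Hypothetical self-check that can be called from main().
inline void run_host_demo() {
  const std::array<int, 4> result =
      add_constant_on_host(std::array<int, 4>{{1, 2, 3, 4}}, 1);
  assert(result[0] == 2 && result[1] == 3 && result[2] == 4 && result[3] == 5);
}
```

Running the function object on the host like this is only a way to check the loop logic; it says nothing about how an actual device accessor behaves.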

Execute generic function

  #include <iostream>
  #include <array>
  #include <algorithm>

  #include <CL/sycl.hpp>

  namespace sycl = cl::sycl;

  <<Define ConstantAdder>>

  int main(int, char**) {
    std::array<int, 4> vals = {{ 1, 2, 3, 4 }};

    sycl::queue queue(sycl::default_selector{});
    {
      // The buffer lives in its own scope: its destructor waits for the
      // kernel to finish and copies the results back into vals.
      sycl::buffer<int, 1> buf(vals.data(), sycl::range<1>(4));
      queue.submit([&] (sycl::handler& cgh) {
        auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);

        cgh.single_task(ConstantAdder<int, decltype(acc), 4>(acc, 1));
      } );
    }

    std::for_each(vals.begin(), vals.end(), [] (int i) { std::cout << i << " "; } );
    std::cout << std::endl;

    return 0;
  }

In main, after the typical setup, we create a single buffer to manage four integer values. Note that we can use int, a type whose width is not fixed by the C++ standard and can differ from platform to platform. It is, however, safe to use, since the size is decided by the host compiler. The device compiler will then just use the same size, which allows for safe transfers of data between the host and devices.

In the command group, we create an instance of our ConstantAdder with a value of 1 and pass it to single_task. Notice that since our function object is now a full-fledged class, we do not have to make up an artificial template parameter to name the kernel. The kernel name is already known: it is exactly the type of the function object, here the ConstantAdder<int, decltype(acc), 4> instantiation. The output of the program is as expected:

2 3 4 5
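Since the function object's type doubles as the kernel name, each distinct instantiation of the template is a distinct type and therefore names a distinct kernel. The sketch below illustrates this with standard type traits only. The aliases IntAdder4, IntAdder8 and FloatAdder4 are hypothetical, and raw pointers stand in for accessor types so that it compiles without SYCL.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in with the same template parameters as the guide's ConstantAdder.
template <typename T, typename Acc, std::size_t N>
class ConstantAdder {
 public:
  ConstantAdder(Acc accessor, T val) : accessor(accessor), val(val) {}

  void operator()() {
    for (std::size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

 private:
  Acc accessor;
  const T val;
};

// Hypothetical aliases; pointers replace real accessor types here.
using IntAdder4 = ConstantAdder<int, int*, 4>;
using IntAdder8 = ConstantAdder<int, int*, 8>;
using FloatAdder4 = ConstantAdder<float, float*, 4>;

// Changing any template argument yields a different type,
// and so a different kernel name.
static_assert(!std::is_same<IntAdder4, IntAdder8>::value,
              "different N gives a different type");
static_assert(!std::is_same<IntAdder4, FloatAdder4>::value,
              "different T and Acc give a different type");

// The same arguments always name the same type.
static_assert(std::is_same<IntAdder4, ConstantAdder<int, int*, 4>>::value,
              "identical arguments give the identical type");
```

All checks happen at compile time, so building the file is enough to confirm them.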

Of course, in this case a simple lambda kernel would have worked just as well. In general though, using your own classes gives you more flexibility and expressive power, at the cost of verbosity.
