Template Functions

SYCL allows us to easily leverage the power of C++ template metaprogramming in device code. Most valid compile-time constructs will work in a kernel (with some restrictions, described in "Limitations"). This means that we can define generic command groups and kernels, use functional programming concepts, and offload a lot of work to the compiler.

In this section we will create and execute a generic kernel function object.

Define ConstantAdder

  template<typename T, typename Acc, size_t N>
  class ConstantAdder {
  public:
    ConstantAdder(Acc accessor, T val)
      : accessor(accessor)
      , val(val) {}

    void operator() () {
      for (size_t i = 0; i < N; i++) {
         accessor[i] += val;
      }
    }

  private:
    Acc accessor;
    const T val;
  };

We first define a ConstantAdder class. The class overloads operator(), so it is a function object. When called, it adds a value of type T to the first N elements of an object of type Acc; the only requirement on Acc is that it provides operator[].
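Because ConstantAdder only requires operator[] from its Acc parameter, it can be tried out on the host without any SYCL machinery. The following is a minimal sketch, assuming a raw pointer as a stand-in for the accessor. The helper name add_one_on_host is hypothetical and is not part of the program in this section; the class is repeated only so that the sketch compiles on its own.

```cpp
#include <array>
#include <cstddef>

// Same class as above, repeated so that this sketch is self-contained.
template <typename T, typename Acc, std::size_t N>
class ConstantAdder {
 public:
  ConstantAdder(Acc accessor, T val) : accessor(accessor), val(val) {}

  void operator()() {
    for (std::size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

 private:
  Acc accessor;
  const T val;
};

// Hypothetical host-only helper. A raw pointer provides operator[],
// so int* can play the role that the SYCL accessor plays in a kernel.
std::array<int, 4> add_one_on_host(std::array<int, 4> vals) {
  ConstantAdder<int, int*, 4> adder(vals.data(), 1);
  adder();  // runs the loop directly on the host
  return vals;
}
```

Calling add_one_on_host with the values 1, 2, 3, 4 returns an array holding 2, 3, 4, 5. This only checks the generic logic on the host; it does not exercise the device compiler.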

Execute generic function

  #include <iostream>
  #include <array>
  #include <algorithm>

  #include <CL/sycl.hpp>

  namespace sycl = cl::sycl;

  <<Define ConstantAdder>>

  int main(int, char**) {
    std::array<int, 4> vals = {{ 1, 2, 3, 4 }};

    sycl::queue queue(sycl::default_selector{});
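    {
      // The buffer manages the four values for as long as it is in scope.
      sycl::buffer<int, 1> buf(vals.data(), sycl::range<1>(4));
      queue.submit([&] (sycl::handler& cgh) {
        auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);

        // The function object type itself serves as the kernel name.
        cgh.single_task(ConstantAdder<int, decltype(acc), 4>(acc, 1));
      } );
    }  // The buffer is destroyed here and the results are written back to vals.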

    std::for_each(vals.begin(), vals.end(), [] (int i) { std::cout << i << " "; } );
    std::cout << std::endl;

    return 0;
  }

In main, after the typical setup, we create a single buffer to manage four integer values. Note that we can use int, even though its width is not fixed and can differ from platform to platform. It is nonetheless safe to use, because the size is decided by the host compiler. The device compiler then uses the same size, which allows for safe transfers of data between the host and devices.
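The point about the width of int can be made concrete on the host. The sketch below is an illustration only, not part of the program above: the names int_bits and Element are hypothetical. It records the width the host compiler chose for int and shows a fixed-width alternative for cases where the element size should be spelled out explicitly.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Width of int in bits, as chosen by the host compiler.
constexpr std::size_t int_bits = sizeof(int) * CHAR_BIT;

// The C++ standard only guarantees that int has at least 16 bits.
static_assert(int_bits >= 16, "int must be at least 16 bits wide");

// Hypothetical alias: a fixed-width element type, for code that wants
// the same element size regardless of the host platform.
using Element = std::int32_t;
static_assert(sizeof(Element) * CHAR_BIT == 32,
              "Element is exactly 32 bits wide");
```

Using Element instead of int in the buffer and in ConstantAdder would be a design choice rather than a requirement, since the text above explains that int is already safe.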

In the command group, we create an instance of our ConstantAdder with a value of 1 and pass it to single_task. Notice that since our function object is now a full-fledged class, we do not have to make up an artificial template parameter to name the kernel. The kernel name is already known: it is exactly the function object type, ConstantAdder. The output of the program is as expected:

2 3 4 5

Of course, in this case a simple lambda kernel would have worked just as well. In general though, using your own classes gives you more flexibility and expressive power, at the cost of verbosity.
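One way to reduce some of that verbosity is a small factory function that lets the compiler deduce T and Acc, so that decltype does not have to be written out at the call site. The following is a sketch under stated assumptions: make_constant_adder and add_half_on_host are hypothetical names that do not appear in the original program, and the helper is exercised here on the host with a raw pointer instead of a SYCL accessor.

```cpp
#include <array>
#include <cstddef>

// Same class as in "Define ConstantAdder", repeated so the sketch compiles alone.
template <typename T, typename Acc, std::size_t N>
class ConstantAdder {
 public:
  ConstantAdder(Acc accessor, T val) : accessor(accessor), val(val) {}

  void operator()() {
    for (std::size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

 private:
  Acc accessor;
  const T val;
};

// Hypothetical factory: N is given explicitly, Acc and T are deduced
// from the arguments.
template <std::size_t N, typename Acc, typename T>
ConstantAdder<T, Acc, N> make_constant_adder(Acc accessor, T val) {
  return ConstantAdder<T, Acc, N>(accessor, val);
}

// Hypothetical host-only check using double* in place of an accessor.
std::array<double, 3> add_half_on_host(std::array<double, 3> vals) {
  auto adder = make_constant_adder<3>(vals.data(), 0.5);
  adder();
  return vals;
}
```

Inside the command group, the same helper would presumably be used as cgh.single_task(make_constant_adder<4>(acc, 1)); this has not been checked against a SYCL implementation here.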