SYCL allows us to easily leverage the power of C++ template metaprogramming in device code. Most valid compile-time constructs will work in a kernel (with some restrictions, described in "Limitations"). This means that we can define generic command groups and kernels, use functional programming concepts, and offload a lot of work to the compiler.
In this section we will create and execute a generic kernel function object.
Define ConstantAdder
template<typename T, typename Acc, size_t N>
class ConstantAdder {
public:
  ConstantAdder(Acc accessor, T val)
    : accessor(accessor)
    , val(val) {}

  void operator() () {
    for (size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

private:
  Acc accessor;
  const T val;
};
We first define a ConstantAdder class. The class overloads operator(), so it is a function object. It adds a value to each of the first N elements of an object, provided that the object supports operator[].
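Because the class only requires operator[] from its Acc parameter, it can also be exercised on the host without any SYCL types. The following is a minimal sketch of that idea, not part of the original program: the helper names add_ten_on_host and add_half_on_host are hypothetical, and a raw pointer and a vector reference stand in for the accessor.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Same class as in the text, repeated so this sketch compiles on its own.
template<typename T, typename Acc, std::size_t N>
class ConstantAdder {
public:
  ConstantAdder(Acc accessor, T val)
    : accessor(accessor)
    , val(val) {}

  void operator() () {
    for (std::size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

private:
  Acc accessor;
  const T val;
};

// Hypothetical helper: Acc is a raw pointer, which is cheap to copy
// and provides operator[].
inline std::array<int, 4> add_ten_on_host() {
  std::array<int, 4> data = {{ 1, 2, 3, 4 }};
  ConstantAdder<int, int*, 4> adder(data.data(), 10);
  adder();
  return data;
}

// Hypothetical helper: Acc is a reference type, so the function object
// modifies the original vector rather than a copy.
inline std::vector<float> add_half_on_host() {
  std::vector<float> data = { 1.0f, 2.0f, 3.0f };
  ConstantAdder<float, std::vector<float>&, 3> adder(data, 0.5f);
  adder();
  return data;
}
```

Calling add_ten_on_host() yields the values 11, 12, 13, 14. Note that Acc is stored by value in the class, so passing a container type directly (rather than a pointer or reference) would modify a copy.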
Execute generic function
#include <iostream>
#include <array>
#include <algorithm>
#include <CL/sycl.hpp>

namespace sycl = cl::sycl;

<<Define ConstantAdder>>

int main(int, char**) {
  std::array<int, 4> vals = {{ 1, 2, 3, 4 }};

  sycl::queue queue(sycl::default_selector{});
  {
    sycl::buffer<int, 1> buf(vals.data(), sycl::range<1>(4));
    queue.submit([&] (sycl::handler& cgh) {
      auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);
      cgh.single_task(ConstantAdder<int, decltype(acc), 4>(acc, 1));
    } );
  }

  std::for_each(vals.begin(), vals.end(), [] (int i) { std::cout << i << " "; } );
  std::cout << std::endl;

  return 0;
}
In main, after the typical setup, we create a single buffer to manage four integer values. Note that we can use int, which is not a fixed-width type: its width can differ from platform to platform. It is, however, safe to use, since its size is decided by the host compiler. The device compiler then uses the same size, which allows for safe transfers of data between the host and devices.
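If the exact width of the element type matters, for example when the data is also written to a file, a fixed-width type such as std::int32_t can be used as the buffer element type instead. The sketch below only checks widths on the host; bit_width_of is a hypothetical helper introduced for illustration and is not part of SYCL.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: storage width in bits of T, as seen by the
// compiler that builds this file.
template<typename T>
constexpr std::size_t bit_width_of() {
  return sizeof(T) * CHAR_BIT;
}

// std::int32_t, where the platform provides it, is exactly 32 bits wide.
static_assert(bit_width_of<std::int32_t>() == 32,
              "std::int32_t is 32 bits wide");

// int is only guaranteed by the C++ standard to be at least 16 bits wide.
static_assert(bit_width_of<int>() >= 16,
              "int is at least 16 bits wide");
```

With such a type, the buffer in the example would be declared with std::int32_t in place of int; the rest of the program would stay the same.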
In the command group, we create an instance of our ConstantAdder with a value of 1 and pass that to single_task. Notice that since our function object is now a full-fledged class, we do not have to make up an artificial template parameter for the kernel name: the name is already known, and it is exactly the type of the function object, ConstantAdder. The output of the program is as expected:
2 3 4 5
Of course, in this case a simple lambda kernel would have worked just as well. In general though, using your own classes gives you more flexibility and expressive power, at the cost of verbosity.
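To make the comparison concrete, here is a host-only sketch that performs the same update once with the function object and once with a lambda. It does not use SYCL and does not model kernel naming: run_once is a hypothetical stand-in that merely mirrors the shape of single_task by invoking any callable once, and a raw pointer stands in for the accessor.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the shape of single_task: takes any
// callable and invokes it once on the host. Not part of SYCL.
template<typename Kernel>
void run_once(Kernel kernel) {
  kernel();
}

// Same class as in the text, repeated so this sketch compiles on its own.
template<typename T, typename Acc, std::size_t N>
class ConstantAdder {
public:
  ConstantAdder(Acc accessor, T val)
    : accessor(accessor)
    , val(val) {}

  void operator() () {
    for (std::size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

private:
  Acc accessor;
  const T val;
};

// Function object version: element type, accessor type and size are
// template parameters, so the class can be reused elsewhere.
inline std::array<int, 4> with_function_object() {
  std::array<int, 4> vals = {{ 1, 2, 3, 4 }};
  run_once(ConstantAdder<int, int*, 4>(vals.data(), 1));
  return vals;
}

// Lambda version: shorter, but the size and the constant are written
// directly into the body.
inline std::array<int, 4> with_lambda() {
  std::array<int, 4> vals = {{ 1, 2, 3, 4 }};
  int* ptr = vals.data();
  run_once([ptr] () {
    for (std::size_t i = 0; i < 4; i++) {
      ptr[i] += 1;
    }
  });
  return vals;
}
```

Both functions return the values 2, 3, 4, 5. The lambda is more compact, while the class can be instantiated again for other element types and sizes.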