SYCL allows us to easily leverage the power of C++ template metaprogramming in device code. Most valid compile-time constructs will work in a kernel (with some restrictions, described in "Limitations"). This means that we can define generic command groups and kernels, use functional programming concepts, and offload a lot of work to the compiler.
In this section we will create and execute a generic kernel function object.
Define ConstantAdder
template<typename T, typename Acc, size_t N>
class ConstantAdder {
  public:
    ConstantAdder(Acc accessor, T val)
      : accessor(accessor)
      , val(val) {}
    // Adds val to each of the first N elements reachable through accessor.
    void operator() () {
      for (size_t i = 0; i < N; i++) {
        accessor[i] += val;
      }
    }
  private:
    Acc accessor;
    const T val;
};
We first define a ConstantAdder class. The class overloads operator(), so it is a function object. It adds a value to each of the first N elements of an object, given that the object provides operator[].
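Because ConstantAdder only requires operator[] from its accessor type, it can also be exercised on the host without any SYCL types. The following is a minimal sketch under that assumption: it repeats the class so that it compiles on its own, and `add_on_host` is a hypothetical helper introduced here for illustration, not part of SYCL or ComputeCpp. A plain `int*` plays the role of the accessor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy of the ConstantAdder template from the text, repeated so that
// this sketch is self-contained.
template <typename T, typename Acc, std::size_t N>
class ConstantAdder {
 public:
  ConstantAdder(Acc accessor, T val) : accessor(accessor), val(val) {}
  void operator()() {
    for (std::size_t i = 0; i < N; i++) {
      accessor[i] += val;
    }
  }

 private:
  Acc accessor;
  const T val;
};

// Hypothetical host-only helper: applies ConstantAdder to the first four
// elements of a vector, using a raw pointer as the "accessor".
inline std::vector<int> add_on_host(std::vector<int> vals, int constant) {
  // ConstantAdder does no bounds checking, so N must not exceed the size.
  assert(vals.size() >= 4);
  ConstantAdder<int, int*, 4> adder(vals.data(), constant);
  adder();
  return vals;
}
```

Calling `add_on_host({1, 2, 3, 4}, 1)` yields a vector holding 2, 3, 4 and 5. Note that the accessor is stored by value, so passing a container such as std::array directly would modify a copy; a pointer or other lightweight handle avoids that.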
Execute generic function
#include <iostream>
#include <array>
#include <algorithm>
#include <CL/sycl.hpp>
namespace sycl = cl::sycl;
<<Define ConstantAdder>>
int main(int, char**) {
  std::array<int, 4> vals = {{ 1, 2, 3, 4 }};
  sycl::queue queue(sycl::default_selector{});
  {
    sycl::buffer<int, 1> buf(vals.data(), sycl::range<1>(4));
    queue.submit([&] (sycl::handler& cgh) {
      auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);
      // The kernel is an instance of our generic function object.
      cgh.single_task(ConstantAdder<int, decltype(acc), 4>(acc, 1));
    } );
    // Leaving this scope destroys buf, which waits for the kernel
    // and copies the results back into vals.
  }
  std::for_each(vals.begin(), vals.end(), [] (int i) { std::cout << i << " "; } );
  std::cout << std::endl;
  return 0;
}
In main, after the typical setup, we create a single buffer to manage four integer values. Note that we can use int, even though it is not a fixed-width type and its width can differ from platform to platform. It is, however, safe to use, since the size is decided by the host compiler. The device compiler will then just use the same size, which allows for safe transfers of data between the host and devices.
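As a small host-side illustration of why int is not a fixed-width type, the sketch below checks its width at compile time. It makes no claim about how any particular device compiler behaves; `int_width_bits` and `element_t` are hypothetical names introduced here, and the fixed-width alias is only one option if you prefer to state the element size explicitly.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bits in int for the compiler
// building this file (the host compiler).
constexpr std::size_t int_width_bits() { return sizeof(int) * CHAR_BIT; }

// The C++ standard only guarantees that int has at least 16 bits.
static_assert(int_width_bits() >= 16, "int is narrower than the standard allows");

// Assumption for illustration: a fixed-width alias that could be used as
// the buffer element type when an explicit 32-bit size is wanted.
using element_t = std::int32_t;
static_assert(sizeof(element_t) * CHAR_BIT == 32, "element_t must be 32 bits wide");
```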
In the command group, we create an instance of our ConstantAdder with a value of 1 and pass that to single_task. Notice that since our function object is now a full-fledged class, we do not have to make up an artificial template parameter to name the kernel. The kernel name is already known: it is exactly the function object type, here a specialization of ConstantAdder. The output of the program is as expected:
2 3 4 5
Of course, in this case a simple lambda kernel would have worked just as well. In general though, using your own classes gives you more flexibility and expressive power, at the cost of verbosity.
Lambdas can also be captured by the command group scope in SYCL. This is useful, for example, where you would otherwise need to pass a function pointer, which is restricted in SYCL. Additionally, C++14 generic lambdas, which improve on C++11 lambdas, can also be used with ComputeCpp. If you are interested in learning more about this, you can have a look at our engineering blog post on Generic Lambdas with SYCL.
Note that there is no guarantee that C++14 language features supported by ComputeCpp will also work with other SYCL implementations, since the minimum requirement for SYCL v1.2.1 is C++11.
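To make the difference between C++11 and C++14 lambdas concrete, here is a minimal host-only sketch of a generic lambda. It is plain C++14 and does not use any SYCL API; `generic_lambda_demo` is a hypothetical function name introduced here for illustration. Whether the same construct is accepted in device code depends on the SYCL implementation, as noted above.

```cpp
// Hypothetical host-only demo of a C++14 generic lambda.
inline double generic_lambda_demo() {
  // The auto parameter makes operator() a template, so one lambda
  // can be called with different argument types.
  auto add_one = [](auto x) { return x + 1; };

  int i = add_one(1);       // instantiated with int
  double d = add_one(1.5);  // instantiated with double
  return i + d;
}
```

In C++11 the same effect would need either two separate lambdas or a hand-written function object with a templated operator(), much like the ConstantAdder class in this section.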