List of SYCL extensions implemented in ComputeCpp. Some of these extensions have been proposed for future versions of SYCL and some of them have already made it in (though possibly slightly modified).
Some extensions have feature test macros that can be used to check whether the extension is available or not. If there are multiple iterations of an extension, the feature test macro will have a different value for each iteration.
constexpr id and range classes
- Feature test macro: N/A
- First implemented: Version 2.5.0
- SYCL version: 1.2.1 and higher
- Status: Proposed for SYCL post-2020
When compiling SYCL code (1.2.1 or higher) using C++14 or higher,
all operations on id
and range
class objects are automatically constexpr
.
Mark pointers of read-only accessors as const
- Feature test macro:
COMPUTECPP_EXT_READ_ACC_CONST_PTR
- Values:
202109
- First implemented: Version 2.7.0
- SYCL version: 1.2.1 and higher
- Status: Experimental
When the macro COMPUTECPP_EXT_READ_ACC_CONST_PTR_ENABLE
is defined for the device compiler,
the internal pointers of all read-only accessors become const
.
This can potentially lead to more efficient code generation.
This isn't enabled by default because the constness gets propagated and some existing code might break.
Make SYCL 2020's host_task
available in 1.2.1
- Feature test macro:
COMPUTECPP_EXT_2020_HOST_TASK
- Values:
202111
- First implemented: Version 2.8.0
- SYCL version: 1.2.1 and higher
- Status: Experimental
When the macro COMPUTECPP_EXT_2020_HOST_TASK_ENABLE
is defined,
the SYCL 2020 host_task implementation will also
be available in SYCL 1.2.1 mode.
This may provide some improvements over Codeplay's host_task
extension originally provided for SYCL 1.2.1, namely less overhead and compatibility
with the context_bound
property for buffers.
USM-Buffer interop
- Feature test macro:
SYCL_EXT_CODEPLAY_USM_BUFFER_INTEROP
- Values:
202207
- First implemented: Version 2.11.0
- SYCL version: 2020 and higher
- Status: Experimental
A buffer associated with a Unified Shared Memory (USM) pointer can be obtained using
template <typename T>
sycl::buffer<T> sycl::ext::codeplay::get_buffer(T* usmPointer, sycl::context& ctx)
where usmPointer
is a pointer to a valid USM allocation allocated in the ctx
context.
The returned buffer has the same access limitations as usmPointer
.
Additional limitations of the prototype are given in the Prototype limitations section.
The resulting buffer is then associated with usmPointer
,
persisting for the lifetime of the USM allocation.
Subsequent calls to get_buffer(...)
using the same usmPointer
argument will return the same buffer object.
The buffer object is freed when sycl::free(usmPointer, ctx)
is called.
For the prototype implementation,
the usmPointer
must point to the beginning of the USM allocation,
and be host-accessible (allocated with malloc_host
or malloc_shared
).
Capabilities
The capabilities of the extension are demonstrated in the following example:
// USM memory is allocated.
float* usmPtr = malloc_shared<float>(count, exampleQueue);
{
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
exampleQueue.submit([&](handler& cgh) {
auto acc = buf.get_access(cgh);
cgh.single_task<example_kernel<1>>([=]() {
acc[0] = 1;
});
});
}
{
// This buffer retrieves the same buffer object as
// for the previous kernel.
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
exampleQueue.submit([&](handler& cgh) {
auto acc = buf.get_access(cgh);
cgh.single_task<example_kernel<2>>([=]() {
acc[0] = 2;
});
});
}
{
// Host accessor.
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
auto accHost = buf.get_host_access();
accHost[0] = 3
}
{
// Update_host is a work-around to ensure copy back to USM memory in
// prototype.
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
exampleQueue.update_host(accessor(buf));
}
exampleQueue.wait_and_throw();
// usmPtr[0] == 3
free(usmPtr, exampleContext);
Here, the following occurs:
- A USM allocation is created using
malloc_shared
and pointed at byusmPtr
. - A buffer associated with the USM allocation is created with
get_buffer
and used inexample_kernel<1>
. - The buffer associated with the USM allocation is retrieved using
get_buffer
and used inexample_kernel<2>
. - The buffer associated with the USM allocation is retrieved and used for host access.
- The runtime waits for the kernels to run.
Since the same
buffer
object was used for all three kernels, the ComputeCpp runtime manages the data dependencies of the kernels. Consequently, the expected value3
is stored atusmPtr[0]
after all events are complete. - The
usmPtr
isfree
d, and the underlyingbuffer
object is destroyed with it.
Prototype limitations
Host-accessible allocation
The prototype implementation uses the existing buffer implementation,
passing in the USM pointer in place of a host pointer.
This means that the USM allocation must be accessible on the host device.
Consequently, it must be allocated using malloc_shared
or malloc_host
.
The buffer creates device copies
In the prototype implementation,
additional copies of the data are created on the device where the USM pointer is accessed.
This requires the data to be copied back to the USM allocation.
For normal sycl::buffer
s,
the copy back to the USM/host allocation would occur on destruction of the buffer.
However, because the context retains the buffer internally,
this does not occur. This has two consequences:
The USM allocation must be explicitly updated
The USM allocation must be explicitly updated.
This can be done using update_host
as follows:
{
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
exampleQueue.update_host(accessor(buf));
}
It is useful to put this within its own scope to ensure the destruction of the local buf
object.
All inter-op buffers must be destroyed before USM pointer is freed
The following code contains a bug:
float* usmPtr = malloc_shared<float>(count, exampleQueue);
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
// Use the buffer in a queue exampleQueue
exampleQueue.update_host(accessor(buf));
free(usmPtr, exampleContext);
Here, a buffer buf
is created based on usmPtr
.
The usmPtr
is free
d before buf
is destroyed.
When buf
is destroyed, it attempts to copy back data to the original USM allocation which is now invalid.