List of SYCL extensions implemented in ComputeCpp. Some of these extensions have been proposed for future versions of SYCL and some of them have already made it in (though possibly slightly modified).
Some extensions have feature test macros that can be used to check whether the extension is available or not. If there are multiple iterations of an extension, the feature test macro will have a different value for each iteration.
constexpr id and range classes
- Feature test macro: N/A
- First implemented: Version 2.5.0
- SYCL version: 1.2.1 and higher
- Status: Proposed for SYCL post-2020
When compiling SYCL code (1.2.1 or higher) using C++14 or higher,
all operations on id and range class objects are automatically constexpr.
Mark pointers of read-only accessors as const
- Feature test macro:
COMPUTECPP_EXT_READ_ACC_CONST_PTR - Values:
202109 - First implemented: Version 2.7.0
- SYCL version: 1.2.1 and higher
- Status: Experimental
When the macro COMPUTECPP_EXT_READ_ACC_CONST_PTR_ENABLE
is defined for the device compiler,
the internal pointers of all read-only accessors become const.
This can potentially lead to more efficient code generation.
This isn't enabled by default because the constness gets propagated and some existing code might break.
Make SYCL 2020's host_task available in 1.2.1
- Feature test macro:
COMPUTECPP_EXT_2020_HOST_TASK - Values:
202111 - First implemented: Version 2.8.0
- SYCL version: 1.2.1 and higher
- Status: Experimental
When the macro COMPUTECPP_EXT_2020_HOST_TASK_ENABLE is defined,
the SYCL 2020 host_task implementation will also
be available in SYCL 1.2.1 mode.
This may provide some improvements over Codeplay's host_task
extension originally provided for SYCL 1.2.1, namely less overhead and compatibility
with the context_bound property for buffers.
USM-Buffer interop
- Feature test macro:
SYCL_EXT_CODEPLAY_USM_BUFFER_INTEROP - Values:
202207 - First implemented: Version 2.11.0
- SYCL version: 2020 and higher
- Status: Experimental
A buffer associated with a Unified Shared Memory (USM) pointer can be obtained using
template <typename T>
sycl::buffer<T> sycl::ext::codeplay::get_buffer(T* usmPointer, sycl::context& ctx)
where usmPointer is a pointer to a valid USM allocation allocated in the ctx context.
The returned buffer has the same access limitations as usmPointer.
Additional limitations of the prototype are given in the Prototype limitations section.
The resulting buffer is then associated with usmPointer,
persisting for the lifetime of the USM allocation.
Subsequent calls to get_buffer(...) using the same usmPointer argument will return the same buffer object.
The buffer object is freed when sycl::free(usmPointer, ctx) is called.
For the prototype implementation,
the usmPointer must point to the beginning of the USM allocation,
and be host-accessible (allocated with malloc_host or malloc_shared).
Capabilities
The capabilities of the extension are demonstrated in the following example:
// USM memory is allocated.
float* usmPtr = malloc_shared<float>(count, exampleQueue);
{
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
exampleQueue.submit([&](handler& cgh) {
auto acc = buf.get_access(cgh);
cgh.single_task<example_kernel<1>>([=]() {
acc[0] = 1;
});
});
}
{
// This buffer retrieves the same buffer object as
// for the previous kernel.
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
exampleQueue.submit([&](handler& cgh) {
auto acc = buf.get_access(cgh);
cgh.single_task<example_kernel<2>>([=]() {
acc[0] = 2;
});
});
}
{
// Host accessor.
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
auto accHost = buf.get_host_access();
accHost[0] = 3
}
{
// Update_host is a work-around to ensure copy back to USM memory in
// prototype.
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
exampleQueue.update_host(accessor(buf));
}
exampleQueue.wait_and_throw();
// usmPtr[0] == 3
free(usmPtr, exampleContext);
Here, the following occurs:
- A USM allocation is created using
malloc_sharedand pointed at byusmPtr. - A buffer associated with the USM allocation is created with
get_bufferand used inexample_kernel<1>. - The buffer associated with the USM allocation is retrieved using
get_bufferand used inexample_kernel<2>. - The buffer associated with the USM allocation is retrieved and used for host access.
- The runtime waits for the kernels to run.
Since the same
bufferobject was used for all three kernels, the ComputeCpp runtime manages the data dependencies of the kernels. Consequently, the expected value3is stored atusmPtr[0]after all events are complete. - The
usmPtrisfreed, and the underlyingbufferobject is destroyed with it.
Prototype limitations
Host-accessible allocation
The prototype implementation uses the existing buffer implementation,
passing in the USM pointer in place of a host pointer.
This means that the USM allocation must be accessible on the host device.
Consequently, it must be allocated using malloc_shared or malloc_host.
The buffer creates device copies
In the prototype implementation,
additional copies of the data are created on the device where the USM pointer is accessed.
This requires the data to be copied back to the USM allocation.
For normal sycl::buffers,
the copy back to the USM/host allocation would occur on destruction of the buffer.
However, because the context retains the buffer internally,
this does not occur. This has two consequences:
The USM allocation must be explicitly updated
The USM allocation must be explicitly updated.
This can be done using update_host as follows:
{
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
exampleQueue.update_host(accessor(buf));
}
It is useful to put this within its own scope to ensure the destruction of the local buf object.
All inter-op buffers must be destroyed before USM pointer is freed
The following code contains a bug:
float* usmPtr = malloc_shared<float>(count, exampleQueue);
auto buf = ext::codeplay::get_buffer(usmPtr, exampleContext);
// Use the buffer in a queue exampleQueue
exampleQueue.update_host(accessor(buf));
free(usmPtr, exampleContext);
Here, a buffer buf is created based on usmPtr.
The usmPtr is freed before buf is destroyed.
When buf is destroyed, it attempts to copy back data to the original USM allocation which is now invalid.