Release Notes
The ComputeCpp™ Community Edition 1.1.3 targets the SYCL™ 1.2.1 Khronos specification. This release provides a conformant edition for OpenCL™ 1.2 host and devices with SPIR™ 1.2 support. ComputeCpp supports AMD® and Intel® OpenCL 1.2 devices with compatible drivers. It also provides the capability of using the system for targeting the SYCL host device and experimental support for NVIDIA® OpenCL 1.2 devices with compatible drivers.
The ComputeCpp SDK complements the ComputeCpp package with build system integration, sample code and documentation. The ComputeCpp SDK is available at computecpp-sdk on GitHub. A Getting Started Guide is also available (local copy at doc/computecpp_getting_started.pdf).
Further documentation of the system can also be found on the Codeplay developer website (local copy at doc/api_pages/index.html).
This system fully implements the SYCL 1.2.1 standard; however, it has not yet been finely tuned for performance.
Please refer to the Platform Support Notes for details on the supported platforms and limitations. A local copy of the Platform Support Notes is provided alongside this document at doc/computecpp_platform_support_notes.pdf.
Changelog
Community Edition 1.1.6
Improvement
- Enqueue index is now double flipped (see blog post for details)
- SYCL alias to cl::sycl is enabled by default on modern compilers
Bug Fixes
- Transaction order is now defended against accidental reordering, preventing unexpected changes to the scheduler's pending list.
Community Edition 1.1.5
New features
- Profiling of host OpenCL calls via Professional Edition
- Output iterators can now be used for set_final_data
Improvement
- Faster profiling (via PE)
- Improvements to asynchronous execution
- Faster fetch_min and fetch_max for host atomics
- Various compiler and runtime improvements
- Removed volatile qualifiers from atomic
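For readers unfamiliar with these operations, the following is a minimal standalone sketch of what a host-side fetch_min and fetch_max do, written with std::atomic and a compare-exchange loop. The helper names host_fetch_min and host_fetch_max are hypothetical, and this is not ComputeCpp's implementation; it only illustrates the semantics (store the smaller or larger value, return the previous one).

```cpp
#include <atomic>

// Hypothetical helper (not a ComputeCpp API): atomically replaces the stored
// value with min(stored, value) and returns the previously stored value.
template <typename T>
T host_fetch_min(std::atomic<T>& target, T value,
                 std::memory_order order = std::memory_order_relaxed) {
  T observed = target.load(order);
  // On failure compare_exchange_weak refreshes `observed`, so the condition
  // is re-tested against the most recent stored value.
  while (value < observed &&
         !target.compare_exchange_weak(observed, value, order)) {
  }
  return observed;
}

// Hypothetical helper: same idea, keeping the larger of the two values.
template <typename T>
T host_fetch_max(std::atomic<T>& target, T value,
                 std::memory_order order = std::memory_order_relaxed) {
  T observed = target.load(order);
  while (observed < value &&
         !target.compare_exchange_weak(observed, value, order)) {
  }
  return observed;
}
```

Calling host_fetch_min(counter, 7) on a std::atomic<int> holding 10 would leave 7 in the atomic and return 10; a later call with 9 would leave the value unchanged.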
Bug Fixes
- Fixed an address space bug with std::swap inside kernels
- Various fixes for SYCL event
- Fixed ABI issues with property_list in Visual Studio
- Fixed issue with nullary kernels not working with the OpenCL interop
Community Edition 1.1.4
Improvement
- Fixed a potential for a resource deadlock on event wait
Bug Fixes
- Queue constructor and worker thread join mechanisms improved to solve issues with global objects being destroyed after main completes.
- Various minor device compiler and runtime fixes.
Community Edition 1.1.3
New features
- Performance counter data extraction for Intel GPU Gen 7.5 and up via Professional Edition (contact support).
Improvement
- Improvements to rectangular accessor-to-accessor copies.
- The target selector now uses platform vendor when the platform name check fails.
- computecpp_info now reports CL_DEVICE_IL_VERSION in verbose mode.
- Host device now reports local memory type as global.
Bug Fixes
- The size of memory available on the host is not zero anymore.
- Memory checking on submit is now disabled by default.
- get_info strings no longer have an extra null terminator.
- Various minor device compiler and runtime fixes.
Community Edition 1.1.2
New features
- Memory checks now implemented for all memory types.
Improvement
- Store allocated memory for different memory types.
- Insufficient memory errors now throw synchronously.
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 1.1.1
New features
- Experimental support for consuming SPIR-V on OpenCL 2.1 devices
Improvement
- Support for const type in multi_ptr
- Extension to track allocated global memory on the SYCL device
- Exposed via cl::sycl::codeplay::get_allocated_memory_size
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 1.1.0
New features
- MinGW package available on request (contact support)
Improvement
- Reduced overhead of accessors
- Queue no longer waits on instruction
- Multiple placeholder accessors can now be used in the same command group
- Relaxed restrictions on overlapping accessors to improve performance
Bug Fixes
- Fixed issue with sub-buffers on host (reported by users)
- Abacus symbols now present for the Windows debug library
- Build options return empty string when no custom flags are provided
- Various minor device compiler and runtime fixes.
Community Edition 1.0.5
New features
- [DEPRECATION] require() function that takes a buffer and a placeholder accessor will no longer be supported in future releases.
- Added a Codeplay extension handler that features an interop_task() function. This can be used to integrate OpenCL libraries with SYCL code where the SYCL runtime will take care of data availability.
Improvement
- A default constructed placeholder accessor is constructed in a null state.
- Default constructed placeholder accessors can be passed to kernels without being bound to a buffer.
- make_placeholder_accessor() extension member function can be used to create a placeholder accessor from a buffer.
- Placeholder accessor acquired the is_null() function which can be used to query whether a buffer has been bound to it.
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 1.0.4
New features
- Added accelerator_selector.
Improvement
- Default constructed events no longer throw on wait_and_throw.
Bug Fixes
- Rectangular copy now uses correct element sizes.
- Fixes to the reinterpret buffer dependency tracking.
- Various minor device compiler and runtime fixes.
Community Edition 1.0.3
New features
- No-host-access extension added. See CP017 in https://github.com/codeplaysoftware/standards-proposals/blob/master/host_access/index.md
- Builtin-kernel extension added. See CP018 in https://github.com/codeplaysoftware/standards-proposals/blob/master/builtin_kernels/sycl-1.2.1/builtin_kernels.md
- On-chip memory extension added. See CP019 in https://github.com/AerialMantis/standards-proposals/blob/CP019/onchip-memory/index.md
- Host device now supports rectangular copies
Improvement
- Invalid host access will now throw for a subset of functions
- Interface between compiler and runtime is now more robust
- Use of OpenCL flags is now based on host_access property
Bug Fixes
- Support for clCreateProgramWithILKHR for ARM Mali
- Various minor device compiler and runtime fixes.
- Fixed: Event wait_and_throw still forward exceptions to async_handler
- Fixed: Rectangular copies on multi-dimensional accessors
Community Edition 1.0.2
New features
- ComputeCpp runtime makes better usage of clCreateBuffer options and reduces copies whenever possible favouring map/unmap and ALLOC_HOST_PTR | COPY_PTR options.
- Added verbose error build logs
Improvements
- Improved Device selector order, now by device type (accelerator > gpu > cpu > host)
- Allocators are now non-type matching SYCL 1.2.1 rev 3
- computecpp_info tool now displays the available builtin kernels.
- Better runtime detection when the host-device copies can be omitted
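The device selector order listed above (accelerator > gpu > cpu > host) can be pictured as a simple ranking. The sketch below is a standalone model in plain C++: the device_kind enum, preference() and pick_device() are hypothetical names introduced for illustration and are not the cl::sycl types or ComputeCpp's actual selector code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a device type; not the cl::sycl enumeration.
enum class device_kind { host, cpu, gpu, accelerator };

// Higher score means more preferred: accelerator > gpu > cpu > host.
int preference(device_kind kind) {
  switch (kind) {
    case device_kind::accelerator: return 3;
    case device_kind::gpu:         return 2;
    case device_kind::cpu:         return 1;
    case device_kind::host:        return 0;
  }
  return -1;  // Unreachable for valid enumerators.
}

// Picks the most preferred kind from the available ones. Falls back to the
// host device when the list is empty (an assumption of this sketch).
device_kind pick_device(const std::vector<device_kind>& available) {
  if (available.empty()) {
    return device_kind::host;
  }
  return *std::max_element(available.begin(), available.end(),
                           [](device_kind a, device_kind b) {
                             return preference(a) < preference(b);
                           });
}
```

With this ranking, a system exposing a CPU and a GPU would select the GPU, and adding an accelerator would move the choice to the accelerator.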
Bug Fixes
- Fixed issue with incorrectly reported empty command group error when thrown.
- Various minor device compiler and runtime fixes.
Community Edition 1.0.1
Improvements
- Exception types are now properly reported.
- Improved dependency tracking for reinterpreted buffers. It is no longer necessary to force host access to synchronize reinterpreted buffers.
Bug Fixes
- Fixed issue with reinterpreted buffers overwriting original data
- Various minor device compiler and runtime fixes.
Community Edition 1.0.0
New features
- [SYCL 1.2.1] Runtime now conforms to the SYCL 1.2.1 revision 3 spec
Improvements
- Warnings added for locally defined types
- The runtime does not over-allocate data anymore
- Fixes to tracking data dependencies across reinterpreted buffers
Bug Fixes
- Various minor device compiler and runtime fixes
Community Edition 0.9.1
New features
- [SYCL 1.2.1] Runtime now follows the SYCL 1.2.1 revision 3 spec. Please note that you will need to change any instances of the following:
- size_t group::get_linear() replaced with:
  - size_t group::get_linear_id()
- range<dimensions> nd_range::get_local() replaced with:
  - range<dimensions> nd_range::get_local_range()
- size_t nd_range::get_local(int dimension) replaced with:
  - size_t nd_range::get_local_range(int dimension)
- range<dimensions> nd_range::get_group() replaced with:
  - range<dimensions> nd_range::get_group_range()
- size_t nd_range::get_group(int dimension) replaced with:
  - size_t nd_range::get_group_range(int dimension)
- range<dimensions> nd_range::get_global() replaced with:
  - range<dimensions> nd_range::get_global_range()
- size_t nd_range::get_global(int dimension) replaced with:
  - size_t nd_range::get_global_range(int dimension)
- id<dimensions> nd_item::get_num_groups() replaced with:
  - range<dimensions> nd_item::get_group_range()
- size_t nd_item::get_num_groups(int dimension) replaced with:
  - size_t nd_item::get_group_range(int dimension)
- id<dimensions> nd_item::get_local() replaced with:
  - id<dimensions> nd_item::get_local_id()
- size_t nd_item::get_local(int dimension) replaced with:
  - size_t nd_item::get_local_id(int dimension)
- id<dimensions> nd_item::get_global() replaced with:
  - id<dimensions> nd_item::get_global_id()
- size_t nd_item::get_global(int dimension) replaced with:
  - size_t nd_item::get_global_id(int dimension)
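As background for the get_linear_id() rename, the sketch below shows what a linear id denotes: a multi-dimensional index flattened into a single number. It is a standalone illustration in plain C++, assuming the row-major linearization described in the SYCL 1.2.1 specification (last dimension varies fastest). The linear_id helper is hypothetical and is not part of the ComputeCpp API.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: flattens a 3D index `id` inside a 3D `range` into a
// single linear index, with the last dimension varying fastest.
std::size_t linear_id(const std::array<std::size_t, 3>& id,
                      const std::array<std::size_t, 3>& range) {
  return id[2] + id[1] * range[2] + id[0] * range[1] * range[2];
}
```

For a range of 4 x 5 x 6, index (1, 2, 3) maps to 3 + 2 * 6 + 1 * 5 * 6 = 45, and the last index (3, 4, 5) maps to 119, one less than the total of 120 elements.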
Bug Fixes
- Various minor device compiler and runtime fixes
Community Edition 0.9.0
New features
- Candidate version for SYCL 1.2.1 conformance, all features and API implemented
Improvements
- Clang version bumped to 6.0
- Improvements to scheduler thread locking strategy increasing performance in low-power CPUs.
- Improvements to compiler driver to enable cross-compilation support
Bug Fixes
- Various minor device compiler fixes
- Fixed: Event profile information for a kernel can now be queried immediately after a command group submission
- Fixed: Accessor range incorrect when using offsets
Community Edition 0.8.0
New features
- [SYCL 1.2.1] Vector classes fully conformant to the 1.2.1 specification
- [SYCL 1.2.1] Major updates to built-in functions towards conformance 1.2.1 specification
- [SYCL 1.2.1] Stream class changes to match SYCL 1.2.1 specification
- [SYCL 1.2.1] Interop constructor for buffers now match SYCL 1.2.1 specification
- [SYCL 1.2.1] Fully implemented 1.2.1 specification requirements for by-value and by-reference semantics
- [SYCL 1.2.1] Implemented buffer::set_write_back according to SYCL 1.2.1 specification
- [SYCL 1.2.1] Implemented buffer::h_item according to SYCL 1.2.1 specification
Improvements
- Numerous minor runtime fixes
Bug Fixes
- Various minor device compiler fixes
Community Edition 0.7.0
New features
- [SYCL 1.2.1] Stream class changes to match 1.2.1 specification
- [SYCL 1.2.1] is_sub_buffer method added to API
- [SYCL 1.2.1] has_property and get_property support added for public queue interface
- [SYCL 1.2.1] Missing pointer class functions added
- [SYCL 1.2.1] event get_profiling_info implemented
- [SYCL 1.2.1] Swizzled vectors can now be swizzled, this includes:
- Simple swizzles (xyzw(), rgba())
- Index access swizzles from s0() to sF()
- The swizzle template function
- e.g. cl::sycl::float4.xyzw().xxx().template swizzle<cl::sycl::elem::s0, cl::sycl::elem::s1>().y();
- [SYCL 1.2.1] Vector hi lo odd and even functions are now available for all vector sizes
Improvements
- Switched to a new scheduling policy with less overhead
- Optimised swizzle transformation functions for improved compile times
Bug Fixes
- Race condition in program and kernel cache fixed
- Various minor device compiler fixes
Community Edition 0.6.1
New features
- [SYCL 1.2.1 REMOVALS] Get_access host_buffer and target parameters removals
- [SYCL 1.2.1 DEPRECATIONS] Buffer accessor interface - order of offset and range deprecations.
- [SYCL 1.2.1] Atomic class changes to match 1.2.1 specification
- [SYCL 1.2.1] Exception types implemented for logging
- [SYCL 1.2.1] Introduced support for image array accessors
- [SYCL 1.2.1] Missing pointer class functions added
- [SYCL 1.2.1] Not_Null implemented
- [SYCL 1.2.1] Vector Hi, Lo, Odd, and Even methods for three-element vec types
Improvements
- Minor improvements to scheduler
Bug Fixes
- Exception class now uses size_type
- Various minor device compiler fixes
Community Edition 0.6.0
New features
- [SYCL 1.2.1] Implemented buffer reinterpret: creates a new view of the same buffer with a different underlying type
- [SYCL 1.2.1] Program, kernel, context get_info method now return SYCL objects as per specification
- [SYCL 1.2.1] Updated context and kernel interface to match 1.2.1
- [SYCL 1.2.1] Deprecated Kernel interop constructor in favour of new syntax
- [SYCL-1.2.1] Template argument of various classes is now defaulted to 1
Improvements
- Vectors can now reliably take swizzles on the right hand side of an expression and are not restricted to cl_* alias types
- Added remaining cl_half builtins
Bug Fixes
- Various minor device compiler fixes, including 64/32 bit
Community Edition 0.5.1
New feature
- [SYCL 1.2.1] Added functions to group and item classes as per specification
Improvements
- Error message clarification for host out of memory
Bug Fixes
- Various minor device compiler fixes
Community Edition 0.5.0
New feature
- Bumped SYCL interface to SYCL 1.2.1, see the relevant announcement in http://sycl.tech
- Note the interface of most of the classes is now matching the SYCL 1.2.1 specification.
- See the ComputeCpp Implementation Notes for the details on the mismatch between spec and implementation.
- [SYCL 1.2.1] All the copy functions are now SYCL core functionality and have therefore moved from the codeplay namespace to cl::sycl
- [SYCL 1.2.1] Buffer/Image properties for use_host_pointer, context_bound and use_mutex
- Note this replaces the map_allocator from SYCL 1.2
- [SYCL 1.2.1] Complete device and host SYCL vector interface (cl::sycl::vec)
- [SYCL 1.2.1] Implemented simple vector swizzles (e.g. vec.abc())
- [compute++] New -sycl-target option to specify the target of the binary
- [compute++] Added -fsycl-ih-last option which forces inclusion of the integration header at the end of the translation unit.
Improvements
- Performance improvements for applications using large number of command groups
- Performance improvements when dispatching multiple short-lived kernels to a device
Bug Fixes
- Fixed sporadic deadlock that appeared in TensorFlow test suite
- Fixed multiple bugs on the vector interface
- Various minor device compiler fixes
Community Edition 0.4.0
New features
- New ComputeCpp unified driver to simplify build process (using a single compiler call)
- Added experimental support for PTX generation
Bug fixes
- Accelerator devices are now reported also during device selection
- Various minor device compiler fixes
Community Edition 0.3.3
New features
- Added ARM support for OpenCL SPIR or SPIR-V enabled devices
- The SYCL explicit copy operations on command groups now support fall-back queue
Improvements
- Improved error reporting when kernels fail to execute
- Reduced execution time when multiple command groups are enqueued to the same OpenCL queue
- Reduced overall latency of SYCL command group submission
Bug fixes
- Various bug fixes for command groups with fall-back execution
Community Edition 0.3.2
New features
- Added ranged get_access method to buffer
- Added a get_allocator method to retrieve the allocator from the buffer
Improvements
- Reduced the number of OpenCL build program calls
- Improved error message for address-space resolution
- Improved error for placeholder accessor in parameters
- Improved ranged copy functions to and from device
- Improved error reporting when building kernels
Bug fixes
- Various Windows bug fixes
- Added missing const to vstore method
- Size method of range is now marked as const
- Get device method of the queue is now const
- Several bug fixes in the compiler
Community Edition 0.3.1
New features
- N/A
Improvements
- Several improvements on placeholder accessor support
Bug fixes
- Fixed: Copy functions now work with ranged accessors
- Fixed: Copy constructor of placeholder accessor
Community Edition 0.3.0
New features
- compute++ now based on clang/llvm 3.9
- Added support for vload/vstore functions (See CP007 https://github.com/codeplaysoftware/standards-proposals )
- Updated syntax of update_to functions to match latest CP001 status https://github.com/codeplaysoftware/standards-proposals
- ComputeCpp Package supports Visual Studio 2015 and above
- [Experimental] Added SPIR-V generation support
- [Experimental] Added Default placeholder constructor (See https://github.com/codeplaysoftware/standards-proposals/pull/18)
Improvements
- Removed dependencies on C headers
Bug fixes
- Some combinations of OpenCL builtins were not available (select, min/max, vload, ...)
- cl::sycl::kernel::get_work_group_info<param>(cl_device_id device) now returns the correct value for the platform
Community Edition 0.2.1
New features
- Implemented a Placeholder Accessor type extension (See CP004 in https://github.com/codeplaysoftware/standards-proposals)
Improvements
- Converted the stream class delimiters to static const char arrays following SYCL 1.2 specification
- Update_to/From device now support ranged accessors
- Host Accessors use map operations on devices where CL_DEVICE_HOST_UNIFIED_MEMORY is cl_true
Bug fixes
- Removed exception throwing in public header file
- Various compiler fixes
Community Edition 0.2.0
New features
- Added support for Ubuntu 16.04
Improvements
- Various compiler fixes
Bug fixes
- Several host fixes for the program class
- Minor Buffer/Subbuffer error code corrections
Community Edition 0.1.4
New features
- NA
Improvements
- computecpp_info now prints a link to platform support notes: https://computecpp.codeplay.com/releases/latest/platform-support-notes
- Added CL_DEVICE_VERSION to computecpp_info verbose output.
- Performance optimizations relating to long running programs with many buffers.
- Attributes as presented in section 5.7 of the specification are now handled by the compiler
- Various compiler fixes
Bug fixes
- Fixed target parsing for computecpp_info so that now all devices in the system are properly displayed.
- computecpp_info no longer reports Intel CPU devices without SSE4.1 support as "Tested".
Community Edition 0.1.3
New features
- Added Codeplay host task API extension (See CE01 in the standards proposals: https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/index.md)
- Added Codeplay command group API extensions update_from_device and update_to_device (See CE01 in the standards proposals: https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/index.md). Versions of these methods that take an iterator have not yet been implemented.
- As requested by various users, the SPIR is not cyphered anymore. This enables writing extra compiler passes that work on top of the SPIR produced by ComputeCpp.
Improvements
- Improved behaviour when SIGINT is triggered to avoid crashing underlying OpenCL implementation whenever possible
- Variable Length Arrays (VLA) are now reported as errors in SYCL device kernels
- computecpp_info now returns an error code when an error occurs
Bug fixes
- Default constructor for SYCL event created incorrect OpenCL object.
- Several device compiler bug-fixes
Community Edition 0.1.2
New features
- Basic extensions support enabled
Improvements
- Improved device compiler speed
- The device compiler will now warn if an undefined function is used inside a SYCL kernel
- Improved handling of SIGINT
- Device support status now reported as tested/untested/unsupported in computecpp_info
Bug fixes
- Fixed issues related to SPIR module conformance
- Various compiler fixes
- Fixed 2D atomics
- Fixed buffer memory leak
- Resolved an OpenCL related memory leak
Breaking changes
- Added the info::gl_context_interop flag to the context constructors
- Removed the --all flag from computecpp_info
- --sycl-no-diags is now deprecated and will be removed in the next release
Community Edition 0.1.1
- Various compiler bug fixes after feedback from users
- The macro __SYCL_DEVICE_ONLY__ is now defined by the device compiler
- Disallowed call to variadic functions inside SYCL kernels
- The runtime now relies on the OpenCL platform's default local work-group size when the user does not provide a local work-group size on the parallel_for call.
Community Edition 0.1.0
- Initial Release.
ComputeCpp package
The ComputeCpp package includes the components of the ComputeCpp SYCL implementation. ComputeCpp uses the single-source multi-pass technique, in which multiple compilers process a single SYCL source file, providing maximum flexibility in creating host and device binaries. The main components of ComputeCpp are:
- the SYCL compiler: compute++,
- the SYCL C++ header: CL/sycl.hpp,
- the C++ shared library: libComputeCpp.so,
- the C++ library headers included by CL/sycl.hpp.
Note: The version of each of the components matches the version number of the ComputeCpp package. No component can be used in isolation or with a version that does not match the others.
The ComputeCpp package provides the computecpp_info tool for checking the configuration of the target platform.
Compiler
The compute++ compiler takes as input SYCL C++ source code and outputs the ComputeCpp integration header.
compute++ is a clang-based C++ compiler that supports C++11 and partially C++14. The version of compute++ matches the version of the ComputeCpp package.
Documentation of the SYCL compiler, compute++, can be found in the online documentation (local copy at doc/computepp_man.pdf).
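Because the same source file is seen by both the host compiler and the device compiler, code can distinguish the two passes through the __SYCL_DEVICE_ONLY__ macro, which the changelog notes is defined by the device compiler. The following is a minimal sketch of that pattern; the function name is_device_pass is hypothetical and only serves the illustration.

```cpp
// Hypothetical helper: evaluates to true only while the device compiler is
// processing this translation unit, and to false in the host pass.
constexpr bool is_device_pass() {
#ifdef __SYCL_DEVICE_ONLY__
  return true;   // Device compiler pass: the macro is defined.
#else
  return false;  // Regular host compiler pass: the macro is absent.
#endif
}
```

When the file is built with an ordinary host compiler the macro is not defined, so is_device_pass() yields false. The same guard can wrap host-only includes or device-specific code paths.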
Library and headers
The C++ headers and shared library are a work-in-progress implementation of the SYCL runtime, kernel library and SYCL host device implementation. The headers have a single entry point, the CL/sycl.hpp header. The shared library that any application needs to link against is lib/libComputeCpp.so.
The macro _COMPUTECPP_ is defined to have the version of the ComputeCpp shared library and headers, which matches the ComputeCpp package versioning.
The documentation of the ComputeCpp API can be found on the Codeplay developer website (local copy at doc/api_pages/index.html).
computecpp_info tool
The computecpp_info tool makes it possible to check the target platform configuration. It provides information about the OpenCL devices and drivers, and whether they are supported by the ComputeCpp package.
The tool can be found in bin/computecpp_info.
For further documentation, see the Platform Support Notes (local copy at doc/computecpp_info_man.pdf).
ComputeCpp implementation notes
All SYCL 1.2.1 features are implemented, although they may not have an optimal implementation for all supported platforms.
Unsupported features
The current version does not support any of the non-core features of the SYCL 1.2 specification.
Unsupported SYCL features:
- Enabling OpenCL extensions
- Writing to 3D image memory objects
- Depth and depth-stencil images
- No support of SYCL_EXTERNAL. All command-group kernels need to be defined in headers in order to be processed in one translation unit.
ComputeCpp extensions
ComputeCpp supports various extensions through the codeplay::handler class. See https://github.com/codeplaysoftware/standards-proposals for details on the different extensions available.
AMD is a registered trademark of Advanced Micro Devices, Inc. Intel is a trademark of Intel Corporation or its subsidiaries in the U.S. and/or other countries. NVIDIA and CUDA are registered trademarks of NVIDIA Corporation.