Release Notes

The ComputeCpp™ Community Edition 1.1.3 targets the SYCL™ 1.2.1 Khronos specification. This release provides a conformant edition for OpenCL™ 1.2 host and devices with SPIR™ 1.2 support. ComputeCpp supports AMD® and Intel® OpenCL 1.2 devices with compatible drivers. It can also target the SYCL host device and provides experimental support for NVIDIA® OpenCL 1.2 devices with compatible drivers.

The ComputeCpp SDK complements the ComputeCpp package with build system integration, sample code and documentation. The ComputeCpp SDK is available on GitHub (computecpp-sdk). A Getting Started Guide is also available (local copy at doc/computecpp_getting_started.pdf).

Further documentation of the system can also be found on the Codeplay developer website (local copy at doc/api_pages/index.html).

This system fully implements the SYCL 1.2.1 standard; however, its performance has not yet been finely tuned.

Please refer to the Platform Support Notes for details on the supported platforms and limitations. A local copy of the Platform Support Notes is provided alongside this document at doc/computecpp_platform_support_notes.pdf.

Changelog

Community Edition 1.1.6

Improvement

  • Enqueue index is now double flipped (see blog post for details)
  • SYCL alias to cl::sycl is enabled by default on modern compilers

Bug Fixes

  • Transaction order is now defended against accidental reordering preventing unexpected scheduler pending list changes.

Community Edition 1.1.5

New features

  • Profiling of host OpenCL calls via Professional Edition
  • Output iterators can now be used for set_final_data

Improvement

  • Faster profiling (via PE)
  • Improvements to asynchronous execution
  • Faster fetch_min and fetch_max for host atomics
  • Various compiler and runtime improvements
  • Removed volatile qualifiers from atomic

Bug Fixes

  • Fixed an address space bug with std::swap inside kernels
  • Various fixes for SYCL event
  • Fixed ABI issues with property_list in Visual Studio
  • Fixed issue with nullary kernels not working with the OpenCL interop

Community Edition 1.1.4

Improvement

  • Fixed a potential for a resource deadlock on event wait

Bug Fixes

  • Queue constructor and worker thread join mechanisms improved to solve issues with global objects being destroyed after main completes.
  • Various minor device compiler and runtime fixes.

Community Edition 1.1.3

New features

  • Performance counter data extraction for Intel GPU Gen 7.5 and up via Professional Edition (contact support).

Improvement

  • Improvements to rectangular accessor-to-accessor copies.
  • The target selector now uses platform vendor when the platform name check fails.
  • computecpp_info now reports CL_DEVICE_IL_VERSION in verbose mode.
  • Host device now reports local memory type as global.

Bug Fixes

  • The size of memory available on the host is not zero anymore.
  • Memory checking on submit is now disabled by default.
  • get_info strings no longer have an extra null terminator.
  • Various minor device compiler and runtime fixes.

Community Edition 1.1.2

New features

  • Memory checks now implemented for all memory types.

Improvement

  • Store allocated memory for different memory types.
  • Insufficient memory errors now throw synchronously.

Bug Fixes

  • Various minor device compiler and runtime fixes.

Community Edition 1.1.1

New features

  • Experimental support for consuming SPIR-V on OpenCL 2.1 devices

Improvement

  • Support for const type in multi_ptr
  • Extension to track allocated global memory on the SYCL device, exposed via cl::sycl::codeplay::get_allocated_memory_size

Bug Fixes

  • Various minor device compiler and runtime fixes.

Community Edition 1.1.0

New features

  • MinGW package available on request (contact support)

Improvement

  • Reduced overhead of accessors
  • Queue no longer waits on instruction
  • Multiple placeholder accessors can now be used in the same command group
  • Relaxed restrictions on overlapping accessors to improve performance

Bug Fixes

  • Fixed issue with sub-buffers on host (reported by users)
  • Abacus symbols now present for the Windows debug library
  • Build options return empty string when no custom flags are provided
  • Various minor device compiler and runtime fixes.

Community Edition 1.0.5

New features

  • [DEPRECATION] require() function that takes a buffer and a placeholder accessor will no longer be supported in future releases.
  • Added a Codeplay extension handler that features an interop_task() function. This can be used to integrate OpenCL libraries with SYCL code where the SYCL runtime will take care of data availability.

Improvement

  • A default constructed placeholder accessor is constructed in a null state.
  • Default constructed placeholder accessors can be passed to kernels without being bound to a buffer.
  • make_placeholder_accessor() extension member function can be used to create a placeholder accessor from a buffer.
  • Placeholder accessor acquired the is_null() function which can be used to query whether a buffer has been bound to it.

Bug Fixes

  • Various minor device compiler and runtime fixes.

Community Edition 1.0.4

New features

  • Added accelerator_selector.

Improvement

  • Default constructed events no longer throw on wait_and_throw.

Bug Fixes

  • Rectangular copy now uses correct element sizes.
  • Fixes to the reinterpret buffer dependency tracking.
  • Various minor device compiler and runtime fixes.

Community Edition 1.0.3

New features

  • No-host-access extension added, See CP017 in https://github.com/codeplaysoftware/standards-proposals/blob/master/host_access/index.md
  • Builtin-kernel extension added, See CP018 https://github.com/codeplaysoftware/standards-proposals/blob/master/builtin_kernels/sycl-1.2.1/builtin_kernels.md
  • On-chip memory extension added, See CP019 in https://github.com/AerialMantis/standards-proposals/blob/CP019/onchip-memory/index.md
  • Host-device now supports rectangular copies

Improvement

  • Invalid host access will now throw for a subset of functions
  • Interface between compiler and runtime is now more robust
  • Use of OpenCL flags is now based on host_access property

Bug Fixes

  • Support for clCreateProgramWithILKHR for ARM Mali
  • Various minor device compiler and runtime fixes.
  • Fixed: Event wait_and_throw still forward exceptions to async_handler
  • Fixed: Rectangular copies on multi-dimensional accessors

Community Edition 1.0.2

New features

  • ComputeCpp runtime makes better usage of clCreateBuffer options and reduces copies whenever possible favouring map/unmap and ALLOC_HOST_PTR | COPY_PTR options.
  • Added verbose error build logs

Improvements

  • Improved Device selector order, now by device type (accelerator > gpu > cpu > host)
  • Allocators are now non-type matching SYCL 1.2.1 rev 3
  • computecpp_info tool now displays the available builtin kernels.
  • Better runtime detection when the host-device copies can be omitted
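
The device-type ordering mentioned above (accelerator > gpu > cpu > host) can be pictured with a small sketch. This is only an illustration of the priority rule, not the ComputeCpp selector itself: the DeviceKind enum and the preference_rank and pick_preferred helpers are hypothetical names introduced here and are not part of ComputeCpp or SYCL.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a device category; not a ComputeCpp or SYCL type.
enum class DeviceKind { host, cpu, gpu, accelerator };

// Higher rank means more preferred: accelerator > gpu > cpu > host.
inline int preference_rank(DeviceKind kind) {
  switch (kind) {
    case DeviceKind::accelerator: return 3;
    case DeviceKind::gpu:         return 2;
    case DeviceKind::cpu:         return 1;
    case DeviceKind::host:        return 0;
  }
  return 0;  // Unreachable for valid enumerators; keeps compilers quiet.
}

// Picks the most preferred kind from a non-empty list of available kinds.
inline DeviceKind pick_preferred(const std::vector<DeviceKind>& available) {
  assert(!available.empty());
  return *std::max_element(
      available.begin(), available.end(),
      [](DeviceKind a, DeviceKind b) {
        return preference_rank(a) < preference_rank(b);
      });
}
```

For example, under these assumptions pick_preferred({DeviceKind::cpu, DeviceKind::gpu, DeviceKind::host}) selects DeviceKind::gpu.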

Bug Fixes

  • Fixed issue with incorrectly reported empty command group error when thrown.
  • Various minor device compiler and runtime fixes.

Community Edition 1.0.1

Improvements

  • Exception types are now properly reported.
  • Improved dependency tracking for reinterpreted buffers. It is no longer necessary to force host access to synchronize reinterpreted buffers.

Bug Fixes

  • Fixed issue with reinterpreted buffers overwriting original data
  • Various minor device compiler and runtime fixes.

Community Edition 1.0.0

New features

  • [SYCL 1.2.1] Runtime now conforms to the SYCL 1.2.1 revision 3 spec

Improvements

  • Warnings added for locally defined types
  • The runtime does not over-allocate data anymore
  • Fixes to tracking data dependencies across reinterpreted buffers

Bug Fixes

  • Various minor device compiler and runtime fixes

Community Edition 0.9.1

New features

  • [SYCL 1.2.1] Runtime now follows the SYCL 1.2.1 revision 3 spec. Please note that you will need to change any instances of the following:
  • size_t group::get_linear() replaced with:
    • size_t group::get_linear_id()
  • range nd_range::get_local() replaced with:
    • range nd_range::get_local_range()
  • size_t nd_range::get_local(int dimension) replaced with:
    • size_t nd_range::get_local_range(int dimension)
  • range nd_range::get_group() replaced with:
    • range nd_range::get_group_range()
  • size_t nd_range::get_group(int dimension) replaced with:
    • size_t nd_range::get_group_range(int dimension)
  • range nd_range::get_global() replaced with:
    • range nd_range::get_global_range()
  • size_t nd_range::get_global(int dimension) replaced with:
    • size_t nd_range::get_global_range(int dimension)
  • id nd_item::get_num_groups() replaced with:
    • range nd_item::get_group_range()
  • size_t nd_item::get_num_groups(int dimension) replaced with:
    • size_t nd_item::get_group_range(int dimension)
  • id nd_item::get_local() replaced with:
    • id nd_item::get_local_id()
  • size_t nd_item::get_local(int dimension) replaced with:
    • size_t nd_item::get_local_id(int dimension)
  • id nd_item::get_global() replaced with:
    • id nd_item::get_global_id()
  • size_t nd_item::get_global(int dimension) replaced with:
    • size_t nd_item::get_global_id(int dimension)
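
As a rough illustration of the renames listed above, the sketch below writes a few helpers against the new spellings get_global_id, get_local_id and get_group_range. To keep it compilable without a SYCL installation, it uses a hypothetical MockNdItem stand-in rather than cl::sycl::nd_item; the mock and the helper names (is_group_leader, global_index, group_count) are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in that mimics the renamed per-dimension queries.
// It is NOT cl::sycl::nd_item; it only mirrors the new member names.
struct MockNdItem {
  std::size_t global_id[3];
  std::size_t local_id[3];
  std::size_t group_range[3];

  std::size_t get_global_id(int dimension) const {
    assert(dimension >= 0 && dimension < 3);
    return global_id[dimension];
  }
  std::size_t get_local_id(int dimension) const {
    assert(dimension >= 0 && dimension < 3);
    return local_id[dimension];
  }
  std::size_t get_group_range(int dimension) const {
    assert(dimension >= 0 && dimension < 3);
    return group_range[dimension];
  }
};

// Old spelling: item.get_local(dimension)
template <typename NdItem>
bool is_group_leader(const NdItem& item, int dimension) {
  return item.get_local_id(dimension) == 0;
}

// Old spelling: item.get_global(dimension)
template <typename NdItem>
std::size_t global_index(const NdItem& item, int dimension) {
  return item.get_global_id(dimension);
}

// Old spelling: item.get_num_groups(dimension)
template <typename NdItem>
std::size_t group_count(const NdItem& item, int dimension) {
  return item.get_group_range(dimension);
}
```

Because the helpers are templates, the same bodies should in principle accept any type that exposes these member names, which is why the mock suffices for illustration.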

Bug Fixes

  • Various minor device compiler and runtime fixes

Community Edition 0.9.0

New features

  • Candidate version for SYCL 1.2.1 conformance, all features and API implemented

Improvements

  • Clang version bumped to 6.0
  • Improvements to scheduler thread locking strategy increasing performance in low-power CPUs.
  • Improvements to compiler driver to enable cross-compilation support

Bug Fixes

  • Various minor device compiler fixes
  • Fixed: Event profile information for kernel can now be queried immediately after a command group submission
  • Fixed: Accessor range incorrect when using offsets

Community Edition 0.8.0

New features

  • [SYCL 1.2.1] Vector classes fully conformant to the 1.2.1 specification
  • [SYCL 1.2.1] Major updates to built-in functions towards conformance with the 1.2.1 specification
  • [SYCL 1.2.1] Stream class changes to match SYCL 1.2.1 specification
  • [SYCL 1.2.1] Interop constructor for buffers now matches SYCL 1.2.1 specification
  • [SYCL 1.2.1] Fully implemented 1.2.1 specification requirements for by-value and by-reference semantics
  • [SYCL 1.2.1] Implemented buffer::set_write_back according to SYCL 1.2.1 specification
  • [SYCL 1.2.1] Implemented buffer::h_item according to SYCL 1.2.1 specification

Improvements

  • Numerous minor runtime fixes

Bug Fixes

  • Various minor device compiler fixes

Community Edition 0.7.0

New features

  • [SYCL 1.2.1] Stream class changes to match 1.2.1 specification
  • [SYCL 1.2.1] is_sub_buffer method added to API
  • [SYCL 1.2.1] has_property and get_property support added for public queue interface
  • [SYCL 1.2.1] Missing pointer class functions added
  • [SYCL 1.2.1] event get_profiling_info implemented
  • [SYCL 1.2.1] Swizzled vectors can now be swizzled, this includes:
  • Simple swizzles (xyzw(), rgba())
  • Index access swizzles from s0() to sF()
  • The swizzle template function
    • e.g. cl::sycl::float4.xyzw().xxx().template swizzle().y();
  • [SYCL 1.2.1] Vector hi lo odd and even functions are now available for all vector sizes

Improvements

  • Switched to a new scheduling policy with less overhead
  • Optimised swizzle transformation functions for improved compile times

Bug Fixes

  • Race condition in program and kernel cache fixed
  • Various minor device compiler fixes

Community Edition 0.6.1

New features

  • [SYCL 1.2.1 REMOVALS] Get_access host_buffer and target parameters removals
  • [SYCL 1.2.1 DEPRECATIONS] Buffer accessor interface - order of offset and range deprecations.
  • [SYCL 1.2.1] Atomic class changes to match 1.2.1 specification
  • [SYCL 1.2.1] Exception types implemented for logging
  • [SYCL 1.2.1] Introduced support for image array accessors
  • [SYCL 1.2.1] Missing pointer class functions added
  • [SYCL 1.2.1] Not_Null implemented
  • [SYCL 1.2.1] Vector Hi, Lo, Odd, and Even methods for vec

Improvements

  • Minor improvements to scheduler

Bug Fixes

  • Exception class now uses size_type
  • Various minor device compiler fixes

Community Edition 0.6.0

New features

  • [SYCL 1.2.1] Implemented buffer reinterpret: creates a new view of the same buffer with a different underlying type
  • [SYCL 1.2.1] Program, kernel, context get_info method now return SYCL objects as per specification
  • [SYCL 1.2.1] Updated context and kernel interface to match 1.2.1
  • [SYCL 1.2.1] Deprecated Kernel interop constructor in favour of new syntax
  • [SYCL-1.2.1] Template argument of various classes is now defaulted to 1

Improvements

  • Vectors can now reliably take swizzles on the right hand side of an expression and are not restricted to cl_* alias types
  • Added remaining cl_half builtins

Bug Fixes

  • Various minor device compiler fixes, including 64/32 bit

Community Edition 0.5.1

New feature

  • [SYCL 1.2.1] Added functions to group and item classes as per specification

Improvements

  • Error message clarification for host out of memory

Bug Fixes

  • Various minor device compiler fixes

Community Edition 0.5.0

New feature

  • Bumped SYCL interface to SYCL 1.2.1, see the relevant announcement in http://sycl.tech
  • Note the interface of most of the classes is now matching the SYCL 1.2.1 specification.
  • See the ComputeCpp Implementation Notes for the details on the mismatch between spec and implementation.
  • [SYCL 1.2.1] All the copy functions are now SYCL core functionality, hence moved from the codeplay namespace to cl::sycl
  • [SYCL 1.2.1] Buffer/Image properties for use_host_pointer, context_bound and use_mutex
  • Note this replaces the map_allocator from SYCL 1.2
  • [SYCL 1.2.1] Complete device and host SYCL vector interface (cl::sycl::vec)
  • [SYCL 1.2.1] Implemented simple vector swizzles (e.g. vec.abc())
  • [compute++] New -sycl-target option to specify the target of the binary
  • [compute++] Added -fsycl-ih-last option which forces inclusion of the integration header at the end of the translation unit.

Improvements

  • Performance improvements for applications using large number of command groups
  • Performance improvements when dispatching multiple short-lived kernels to a device

Bug Fixes

  • Fixed sporadic deadlock that appeared in TensorFlow test suite
  • Fixed multiple bugs on the vector interface
  • Various minor device compiler fixes

Community Edition 0.4.0

New features

  • New ComputeCpp unified driver to simplify build process (using a single compiler call)
  • Added experimental support for PTX generation

Bug fixes

  • Accelerator devices are now also reported during device selection
  • Various minor device compiler fixes

Community Edition 0.3.3

New features

  • Added ARM support for OpenCL SPIR or SPIR-V enabled devices
  • The SYCL explicit copy operations on command groups now support fall-back queue

Improvements

  • Improved error reporting when kernels fail to execute
  • Reduced execution time when multiple command groups are enqueued to the same OpenCL queue
  • Reduced overall latency of SYCL command group submission

Bug fixes

  • Various bug fixes for command groups with fall-back execution

Community Edition 0.3.2

New features

  • Added ranged get_access method to buffer
  • Added a get_allocator method to retrieve the allocator from the buffer

Improvements

  • Reduced the number of OpenCL build program calls
  • Improved error message for address-space resolution
  • Improved error for placeholder accessor in parameters
  • Improved ranged copy functions to and from device
  • Improved error reporting when building kernels

Bug fixes

  • Various Windows bug fixes
  • Added missing const to vstore method
  • Size method of range is now marked as const
  • Get device method of the queue is now const
  • Several bug fixes in the compiler

Community Edition 0.3.1

New features

  • N/A

Improvements

  • Several improvements on placeholder accessor support

Bug fixes

  • Fixed: Copy functions now work with ranged accessors
  • Fixed: Copy constructor of placeholder accessor

Community Edition 0.3.0

New features

  • compute++ now based on clang/llvm 3.9
  • Added support for vload/vstore functions (See CP007 https://github.com/codeplaysoftware/standards-proposals )
  • Updated syntax of update_to functions to match latest CP001 status https://github.com/codeplaysoftware/standards-proposals
  • ComputeCpp Package supports Visual Studio 2015 and above
  • [Experimental] Added SPIR-V generation support
  • [Experimental] Added Default placeholder constructor (See https://github.com/codeplaysoftware/standards-proposals/pull/18)

Improvements

  • Removed dependencies on C headers

Bug fixes

  • Some combinations of OpenCL builtins were not available (select, min/max, vload, ...)
  • cl::sycl::kernel::get_work_group_info<param>(cl_device_id device) now returns the correct value for the platform

Community Edition 0.2.1

New features

  • Implemented a Placeholder Accessor type extension (See CP004 in https://github.com/codeplaysoftware/standards-proposals)

Improvements

  • Converted the stream class delimiters to static const char arrays following SYCL 1.2 specification
  • Update_to/From device now support ranged accessors
  • Host Accessors use map operations on devices where CL_DEVICE_HOST_UNIFIED_MEMORY is cl_true

Bug fixes

  • Removed exception throwing in public header file
  • Various compiler fixes

Community Edition 0.2.0

New features

  • Added support for 16.04 Ubuntu

Improvements

  • Various compiler fixes

Bug fixes

  • Several host fixes for the program class
  • Minor Buffer/Subbuffer error code corrections

Community Edition 0.1.4

New features

  • N/A

Improvements

  • computecpp_info now prints a link to platform support notes: https://computecpp.codeplay.com/releases/latest/platform-support-notes.
  • Added CL_DEVICE_VERSION to computecpp_info verbose output.
  • Performance optimizations relating to long running programs with many buffers.
  • Attributes as presented in section 5.7 of the specification are now handled by the compiler
  • Various compiler fixes

Bug fixes

  • Fixed target parsing for computecpp_info so that now all devices in the system are properly displayed.
  • computecpp_info no longer reports Intel CPU devices without SSE4.1 support as "Tested".

Community Edition 0.1.3

New features

  • Added Codeplay host task API extension (See Standards proposal CE01: https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/index.md)
  • Added Codeplay command group API extensions update_from_device and update_to_device (See Standards proposal CE01: https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/index.md). Versions of these methods that take an iterator have not yet been implemented.
  • As requested by various users, the SPIR output is no longer encrypted. This enables writing extra compiler passes that work on top of the SPIR produced by ComputeCpp.

Improvements

  • Improved behaviour when SIGINT is triggered to avoid crashing underlying OpenCL implementation whenever possible
  • Variable Length Arrays (VLA) are now reported as errors in SYCL device kernels
  • computecpp_info now returns error code when error occurs

Bug fixes

  • Default constructor for SYCL event created incorrect OpenCL object.
  • Several device compiler bug-fixes

Community Edition 0.1.2

New features

  • Basic extensions support enabled

Improvements

  • Improved device compiler speed
  • The device compiler will now warn if an undefined function is used inside a SYCL kernel
  • Improved handling of SIGINT
  • Device support status now reported as tested/untested/unsupported in computecpp_info

Bug fixes

  • Fixed issues related to SPIR module conformance
  • Various compiler fixes
  • Fixed 2D atomics
  • Fixed buffer memory leak
  • Resolved an OpenCL related memory leak

Breaking changes

  • Added the info::gl_context_interop flag to the context constructors
  • Removed the --all flag from computecpp_info
  • --sycl-no-diags is now deprecated and will be removed in the next release

Community Edition 0.1.1

  • Various compiler bug fixes after feedback from users
  • The macro __SYCL_DEVICE_ONLY__ is now defined by the device compiler
  • Disallowed call to variadic functions inside SYCL kernels
  • The runtime now relies on the OpenCL platform's default local workgroup size when the user does not provide a local workgroup size on the parallel_for call.
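
Since the device compiler defines the __SYCL_DEVICE_ONLY__ macro noted above, a single source file can branch on it to tell the device pass from the host pass. The following is a minimal sketch of that pattern; the function names compilation_pass and check_host_build are hypothetical and chosen for this example, and the snippet deliberately avoids any SYCL header so that it builds with an ordinary host compiler.

```cpp
#include <cassert>
#include <cstring>

// Reports which compilation pass produced this code.
// __SYCL_DEVICE_ONLY__ is only defined while the device compiler runs.
inline const char* compilation_pass() {
#ifdef __SYCL_DEVICE_ONLY__
  return "device";
#else
  return "host";
#endif
}

// Host-side sanity check: when the macro is absent, the host branch is taken.
inline void check_host_build() {
#ifndef __SYCL_DEVICE_ONLY__
  assert(std::strcmp(compilation_pass(), "host") == 0);
#endif
}
```

When built with a regular host compiler, the macro is not defined and the host branch is selected.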

Community Edition 0.1.0

  • Initial Release.

ComputeCpp package

The ComputeCpp package includes the components of the ComputeCpp SYCL implementation. ComputeCpp uses the single-source multi-pass technique, in which multiple compilers process a single SYCL source file to create the host and device binaries, providing maximum flexibility. The main components of ComputeCpp are:

  • the SYCL compiler: compute++,
  • the SYCL C++ header: CL/sycl.hpp,
  • the C++ shared library: libComputeCpp.so,
  • the C++ library headers included by CL/sycl.hpp.

Note: The version of each of the components matches the version number of the ComputeCpp package. Components cannot be used in isolation or combined with components from a different package version.

The ComputeCpp package provides the computecpp_info tool for checking the configuration of the target platform.

Compiler

The compute++ compiler takes as input SYCL C++ source code and outputs the ComputeCpp integration header.

compute++ is a clang-based C++ compiler that supports C++11 and partially C++14. The version of compute++ matches the version of the ComputeCpp package.

Documentation of the SYCL compiler, compute++, is available online (local copy at doc/computepp_man.pdf).

Library and headers

The C++ headers and shared library are a work-in-progress implementation of the SYCL runtime, kernel library and SYCL host device implementation. The headers have a single entry point, the CL/sycl.hpp header. The shared library that any application needs to link against is lib/libComputeCpp.so.

The macro _COMPUTECPP_ is defined to have the version of the ComputeCpp shared library and headers, which matches the ComputeCpp package versioning.

The documentation of the ComputeCpp API can be found on the Codeplay developer website (local copy at doc/api_pages/index.html).

computecpp_info tool

The computecpp_info tool makes it possible to check the target platform configuration. It provides information about the OpenCL devices and drivers, and whether they are supported by the ComputeCpp package.

The tool can be found in bin/computecpp_info.

For further documentation, please see the Platform Support Notes (local copy at doc/computecpp_info_man.pdf).

ComputeCpp implementation notes

All SYCL 1.2.1 features are implemented, although they may not have an optimal implementation for all supported platforms.

Unsupported features

The current version does not support any of the non-core features of the SYCL 1.2 specification.

Unsupported SYCL features:

  • Enabling OpenCL extensions
  • Writing to 3D image memory objects
  • Depth and depth-stencil images
  • No support of SYCL_EXTERNAL. All command-group kernels need to be defined in headers in order to be processed in one translation unit.

ComputeCpp extensions

ComputeCpp supports various extensions through the codeplay::handler class. See https://github.com/codeplaysoftware/standards-proposals for details on the different extensions available.

AMD is a registered trademark of Advanced Micro Devices, Inc. Intel is a trademark of Intel Corporation or its subsidiaries in the U.S. and/or other countries. NVIDIA and CUDA are registered trademarks of NVIDIA Corporation
