Release Notes
The ComputeCpp™ Community Edition 2.0.0 targets the SYCL™ 1.2.1 Khronos specification. This release provides a conformant edition for OpenCL™ 1.2 host and devices with SPIR™ 1.2 support. ComputeCpp supports AMD® and Intel® OpenCL 1.2 devices with compatible drivers. It can also target the SYCL host device, and it offers experimental support for NVIDIA® OpenCL 1.2 devices with compatible drivers.
The ComputeCpp SDK complements the ComputeCpp package with build system integration, sample code and documentation. The ComputeCpp SDK is available at computecpp-sdk @ GitHub. A Getting Started document can be found in Getting Started Guide (local copy at doc/computecpp_getting_started.pdf).
Further documentation of the system can also be found on the Codeplay developer website (local copy at doc/api_pages/index.html).
ComputeCpp is SYCL 1.2.1 conformant.
Please refer to the Platform Support Notes for details on the supported platforms and limitations. A local copy of the supported platforms is provided alongside this document at doc/computecpp_platform_support_notes.pdf.
Changelog
Community Edition 2.10.0
New features
- Experimental compiler: emit 2D async copies in SPIR-V when the supported builtins are used
- Experimental compiler: extended -femit-dimension-metadata and renamed it to -fsycl-emit-kernel-attributes. The flag emits the KernelAttributesINTEL SPIR-V capability. In addition to the number of kernel dimensions, it now propagates to SPIR-V whether the SYCL kernel has an offset or not.
Improvements
- Various diagnostic improvements for common errors
- Print out kernel name on mismatch when compiling SYCL 2020
- Report when no devices found with COMPUTECPP_TARGET
- Package now contains symlinks for clang and clang++ (in addition to existing compute, compute++, and compute-cl) to the main device compiler executable.
- Experimental compiler: Updated to latest LLVM version
- Experimental compiler: Various improvements to the address space inference engine
- Experimental compiler: Faster compilation
SYCL 2020 progress
- Complete support for subgroup algorithms:
- Added shift_left/right, any_of, all_of, none_of, shuffles, broadcast, scan operations, permute operations, select_from_group
- Buffer accessors can be placeholders without an extra template parameter
- Reverse ranged accessor iterators
- Minimal support for kernel bundles
- Defined SYCL implementation macros SYCL_IMPLEMENTATION_CODEPLAY, SYCL_IMPLEMENTATION_COMPUTECPP, SYCL_FEATURE_SET_REDUCED, SYCL_FEATURE_SET_FULL (not enabled yet)
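As a rough illustration of how the implementation macros listed above might be used, the sketch below reports which implementation a translation unit is built with. It only tests whether each macro is defined, since these notes do not state the macro values. The helper names sycl_implementation_name and sycl_full_feature_set are hypothetical, and the macros are assumed to become visible only after the SYCL header has been included.

```cpp
#include <cassert>
#include <string>

// Assumption: in real code, include the SYCL header before this point so
// that the implementation macros are visible to the preprocessor.

// Hypothetical helper: names the SYCL implementation based on which
// implementation macro is defined (presence only, values are not inspected).
inline std::string sycl_implementation_name() {
#if defined(SYCL_IMPLEMENTATION_COMPUTECPP)
  return "ComputeCpp";
#elif defined(SYCL_IMPLEMENTATION_CODEPLAY)
  return "Codeplay";
#else
  return "unknown";
#endif
}

// Hypothetical helper: true only when SYCL_FEATURE_SET_FULL is defined.
// The notes above say this macro is not enabled yet.
inline bool sycl_full_feature_set() {
#if defined(SYCL_FEATURE_SET_FULL)
  return true;
#else
  return false;
#endif
}
```

With a compiler that defines none of these macros, the first helper falls through to "unknown", which keeps the code portable across implementations.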
Bug Fixes
- Fixed an issue when detecting a subdevice on OpenCLOn12
- Fixed subgroup barrier builtin
- Fix for vec::operator! for boolean values
- Queue constructor works with child classes of sycl::device
- Fixed subgroup kernel information queries
- Experimental compiler: Fixed a symbol mangling issue when linking with code compiled with GCC
- Various minor device compiler and runtime fixes
Community Edition 2.9.0
New features
- Support for Codeplay Performance Counters as a performance counter backend (PE feature)
- Experimental compiler: New flag -femit-dimension-metadata that propagates the number of SYCL kernel dimensions to the generated SPIR-V
Improvements
- FindComputeCpp.cmake from the ComputeCpp SDK was moved into the package as ComputeCppConfig.cmake
- Allows CMake use without having to copy the file from the SDK
- Support for many subgroup builtins in device code
- reduce, load, store, broadcast, all, any, scan_inclusive/exclusive
- Experimental compiler: Default sycl-target is now SPIR-V instead of SPIR
- Experimental compiler: When invoking compute++ it is no longer necessary to specify the include path or the library path of the ComputeCpp package (though -lComputeCpp is still required)
- Experimental compiler: Use LLVM version 14
- Experimental compiler: Various improvements to the address space inference engine
- Better handling of bitcasts, kernel parameters, function remapping, global variables
- Experimental compiler: Use Itanium mangling for kernel names
- Fixes some issues with duplicated names
- Experimental compiler: -no-serial-memop enabled by default
SYCL 2020 progress
- Initial support for subgroup algorithms
- joint_reduce, reduce_over_group, shift_group_left, shift_group_right
- Default constructed device and platform use default selector instead of host
- queue::copy shortcuts
- is_device_copyable type trait
- Not used yet by device compiler
- Improvements to local_accessor
- It's now default constructable
- Support for unnamed types in kernel names
- In addition to unnamed kernel lambdas
- Allow empty range on kernel submission
- Satisfies dependencies, but doesn't execute kernel
- no_init accessor property
- handler::require can be called on any accessor
- Deduction guides for host_task accessors
- vec::size and vec::byte_size
- clz and ctz builtins
- Added buffer constructors from shared_ptr<T[]>
- Default to read_write/read mode on buffer::get_access
- Variadic buffer::get_access
- Added atomic_fence
Bug Fixes
- Fixed an issue when releasing unused host_task accessors
- Fixed signed behavior in atomic_ref
- Print integral postfix in the integration header
- Various minor device compiler and runtime fixes
Community Edition 2.8.0
New features
- Support for unnamed kernel lambdas (omitting the kernel name)
- Requires the experimental compiler and the -sycl-driver switch
- Reductions can now use multiple reducers
- Added sycl::span
- Please note that using sycl::span in reductions is experimental at the moment and will change in a future release
- Proper iterator support for ranged accessors
- SYCL 2020 access targets
- Support for reversible accessor iterator (but only for contiguous accessors)
- Subscript operator, unary plus and minus for vec class
- operator+ for id and range classes
- Minor improvements to the USM interface
- Various other SYCL 2020 features
Improvements
- SYCL 2020 host_task available as a Codeplay extension in SYCL 1.2.1 mode
- Enabled by COMPUTECPP_EXT_2020_HOST_TASK_ENABLE macro
- When an exception is thrown, the verbose output is now flushed
- Allow implicit conversions to void multi_ptr
- Deprecated some usm_allocator type aliases
- Emit line information with compiler errors CC0002 and CC0003
- Experimental compiler: Improved address space inference to handle more open-source SYCL projects
- Experimental compiler: Significant performance and memory optimizations
- Experimental compiler: libtinfo and libgcc are no longer dynamic link requirements
- Experimental compiler: Add proper suffix to unsigned integers in generated integration header
Bug Fixes
- Fixed ambiguous queue constructor with lambda async handler
- USM memset now internally uses the OpenCL fill function instead of the deprecated OpenCL memset function
- Improvements to integer legalization to prevent emission of types like i48
- Experimental compiler: Don't emit invalid intrinsics
- Various minor device compiler and runtime fixes
Community Edition 2.7.0
New features
- More SYCL 2020 features
- Reductions
- Supports buffers and USM
- Limited to single reducer for now
- Requires support for atomic integers on device
- Ranged accessor subscript operator takes into account the access offset (only works for contiguous ranges for now)
- atomic_ref support on the host device
- handler::memset and queue::memset
- Support for generic lambdas as parallel_for kernels (i.e. auto index parameters)
- Ability to pass size_t instead of range<1> to parallel_for
- Deny list for aspect_selector (specify device aspects that should be avoided)
- Added aspects emulated, host_debuggable, atomic64, usm_atomic_host_allocations, usm_atomic_shared_allocations, usm_system_allocations
- Deprecated some other aspects from the SYCL 2020 provisional
- sycl::bit_cast
- Throw error codes from existing runtime errors (but not to all places as required by the spec)
- Experimental support for USM/buffer interop
- Requires context_bound and use_host_ptr buffer properties
- Configuration option for setting the workgroup size of a reduction: reduction_workgroup_size
- ComputeCpp SYCL extension that marks internal pointers of read-only accessors as const
- Enabled by defining COMPUTECPP_EXT_READ_ACC_CONST_PTR_ENABLE for the device compiler
- Experimental support for a new address space inference system
- Separate package that ships compute++ based on LLVM 13
- Enables the use of USM without the usm_wrapper class
Improvements
- Support async_work_group_copy for boolean data
- Better support for different data types used in kernel names
- E.g. pointer types can now be used as kernel name template parameters
- Header files have been cleaned up to not raise warnings under the following warning flags: -Wunused-parameter, -Wshadow, -Wreserved-id-macro, -Wexit-time-destructors, -Wshadow-field-in-constructor
- Added buffer::dimensions member variable
Bug Fixes
- Fixed an issue with the MDd library in the Windows Debug package
- Fixed an issue with device compiler generating invalid integer sizes
- USM malloc calls no longer throw when size 0 is passed, just return nullptr
- get_size on an empty accessor does not segfault anymore
- Various minor device compiler and runtime fixes
Community Edition 2.6.0
New features
- Implemented more SYCL 2020 features:
- New device selector API
- host_task using interop_handle (without image support, without host task accessors)
- backend_traits (without image support)
- Container interface for host and device accessors
- Initial support for the new error code mechanism - users can construct and retrieve error codes, but the runtime does not throw them yet
- Property traits
Improvements
- Header files have been cleaned up to not raise warnings under the following warning flags:
-Wdouble-promotion, -Wdeprecated, -Wmaybe-uninitialized, -Wold-style-cast
- std::numeric_limits and std::hash support for the half data type
Bug Fixes
- async_work_group_copy now supports more types than before
- Various minor device compiler and runtime fixes
Community Edition 2.5.0
New features
- Implemented more SYCL 2020 features:
- Support for Class Template Argument Deduction (CTAD) for accessors.
- Support for the aspect_selector factory function.
- byte_size and size functions replace get_size and get_count, respectively.
- Added the sycl/sycl.hpp header.
- id and range classes can now be used in a constexpr context when compiling with C++14 or higher. Both classes also support more than 3 dimensions. Please note that these improvements are a Codeplay extension to the SYCL specification.
Improvements
- Reduced size of device builtins header.
- Refactored id and range classes provide better memory usage guarantees.
- Header files have been cleaned up for certain warning flags: -Wall, -Wextra, -Wundef, -Wextra-semi, -Wnewline-eof
Bug Fixes
- Fixes to the Tracy Profiler.
- Device compiler would in some circumstances emit invalid scalar types; this is no longer the case.
- Various minor device compiler and runtime fixes.
- Fixed a potential assert when using the on_wait flushing policy.
Community Edition 2.4.0
New features
- Improved kernel binary selection for the Professional Edition. When multiple device binaries are available, the runtime ranks each of them to find the most suitable one.
- half class has been made constexpr when compiling in C++14 or higher mode. Please note that this is a Codeplay extension to the SYCL specification.
- Implemented buffer constructors from contiguous containers as a SYCL 2020 feature
- Added stream_manipulator::flush
Bug Fixes
- SYCL_LANGUAGE_VERSION updated to include the revision number: SYCL 2020 provisional uses version 202001. Setting SYCL_LANGUAGE_VERSION to 2020 still works; ComputeCpp will internally set it to the latest revision it supports.
- Fixed an issue when using builtins that take a pointer as an output argument
- Changes to SYCL events fixed an issue where the incorrect event status is reported after a wait
- Fixed a segfault when using the on_wait flushing policy
- Support abgr image channel
- Various minor device compiler and runtime fixes.
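The SYCL_LANGUAGE_VERSION handling described above can be pictured with the sketch below. It is an illustration only, not ComputeCpp's internal logic: the helper names are hypothetical, and it assumes 202001 is the latest supported revision, as stated for this release.

```cpp
#include <cassert>

// Assumption taken from the notes above: SYCL 2020 provisional is 202001.
constexpr long kSycl2020Provisional = 202001;

// Hypothetical helper: maps the short form 2020 to the full revision
// number and leaves any other value untouched.
constexpr long normalize_sycl_language_version(long requested) {
  return requested == 2020 ? kSycl2020Provisional : requested;
}

// Hypothetical helper: true when the (normalized) version is at least the
// SYCL 2020 provisional revision.
constexpr bool is_sycl_2020(long requested) {
  return normalize_sycl_language_version(requested) >= kSycl2020Provisional;
}

// In user code the same comparison could guard SYCL 2020 only features:
//   #if SYCL_LANGUAGE_VERSION >= 202001
//   ...
//   #endif
```

Comparing against the full revision number, rather than against 2020, keeps the check valid whichever form was passed on the command line.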
Known Issues
- compute++ will fail to compile on Windows with Visual Studio 16.8.2 when using the toolset MSVC 14.28. To work around this you must use MSVC 14.26. Please see Side-by-side Minor Version MSVC Toolsets in Visual Studio 2019 for information on how to change the toolset for your project. Note that this is not limited to ComputeCpp 2.4.
- The Visual Studio template project does not allow changing the toolset for compute++ targets, thus you must switch to a CMake based solution.
Community Edition 2.3.0
New features
- Experimental support for USM without address spaces allows explicit USM on some devices with raw pointers, without having to use usm_wrapper. Enabled by the -fno-sycl-address-space device compiler flag.
Breaking changes
- The PTX backend has been marked as unsupported instead of untested. Please see Platform Support Notes for more info.
Bug Fixes
- Removed incorrect check for libstdc++ in computecpp_info.
- Fixed multiple definition issue when linking multiple sources into same program.
- Fixed segfault in spirv-ll-tool when copying elements of certain types.
- Fixed issue with spirv-ll-tool requiring a newer GLIBCXX version than other compiler tools.
- Various minor device compiler and runtime fixes.
Community Edition 2.2.1
New features
- Support for custom profiling zones (Codeplay extension, PE feature)
- Time based sampling of performance counters (PE feature)
Bug Fixes
- Scheduler improvements on program termination
- Various minor device compiler and runtime fixes.
Community Edition 2.2.0
New features
- Several SYCL 2020 features have been implemented (please set SYCL_LANGUAGE_VERSION to 2020)
- Support for querying the backend used
- host and opencl supported at the moment
- USM and subgroups no longer require the experimental namespace
- Device selector global variables
- multi_ptr can act as an iterator
- group_barrier functions
- queue shortcuts interface
- host_accessor class
- range, id, vec, and buffer deduction guides
- In-order queues
- Querying device and platform aspects
- Terminating async handler
- memcpy and fill overloads that take a dependency
- Tracy profiling support for PE users
- Controlled by enable_tracy_profiling configuration option
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 2.1.0
New features
- Added version information and other metadata to ComputeCpp.dll
- Our USM implementation now aligns with Intel's USM specification proposal
- Still requires wrapping pointers into experimental::usm_wrapper.
Bug Fixes
- Fixed bug where kernel validation failed when using a custom workgroup size on a device where the maximum workgroup sizes are not uniform across all dimensions.
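For context, a workgroup-size validation of this kind can be pictured as the check below. It is a minimal sketch, not ComputeCpp's actual validation code: it assumes the usual OpenCL-style limits (a per-dimension maximum work-item size plus a total maximum work-group size), and the function name local_size_fits is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true when a 3D local size respects both the
// per-dimension limits and the total work-group size limit of a device.
// Each dimension is compared against its own maximum, so devices whose
// limits differ between dimensions are handled correctly.
inline bool local_size_fits(const std::array<std::size_t, 3>& local,
                            const std::array<std::size_t, 3>& max_item_sizes,
                            std::size_t max_group_size) {
  std::size_t total = 1;
  for (std::size_t d = 0; d < 3; ++d) {
    if (local[d] == 0 || local[d] > max_item_sizes[d]) {
      return false;  // This dimension alone exceeds its limit.
    }
    total *= local[d];
  }
  return total <= max_group_size;
}
```

For example, with per-dimension limits of {1024, 1024, 64} and a total limit of 1024, a local size of {16, 16, 4} is accepted, while {1, 1, 128} is rejected because only the third dimension is over its limit.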
Community Edition 2.0.0
New features
- Experimental support for Unified Shared Memory (USM)
- Added support for passing property_list to the following class constructors: accessor, context, program, sampler
- See https://github.com/KhronosGroup/SYCL-Docs/pull/73
- Improvements to build systems below to enable support for future platforms in preparation for updates to the SYCL specification
- Upgraded supported version of Microsoft Visual Studio to 2019
- Toolset version v142
- New templates and updated installer files
- Unified Linux X86 release package:
- Package naming follows industry standard conventions name-version-triple
- Package base version minimum supported libstdc++ 3.4.19 and glibc 2.17
Breaking changes
- Deprecated codeplay::sub_group, use experimental::sub_group instead.
- Deprecated nd_item::sub_group_barrier, use sub_group::barrier instead.
- Deprecated info::queue::queue_profiling, use info::device::queue_profiling instead.
- Mark all cl::sycl::queue constructors explicit aside from the copy and move constructors.
- accessor::operator[] returns a const reference to the underlying element when in read access mode.
- Post version 2.0.0, AMD, NVIDIA, CentOS, and Ubuntu16 will become unsupported platforms. Please get in touch with us at https://support.codeplay.com/ for future support requests.
- Removed support for Visual Studio 2015 and 2017. Only Visual Studio 2019 is now officially supported.
- Name of the ComputeCpp runtime library on Windows changed to ComputeCpp.dll
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 1.3.0
New features
- ARM Mali public driver support for HiKey 960 devices. User experience may vary depending on the SYCL ecosystem application.
- Configuration option scheduler_sleep_time_ns for setting scheduler sleep time (Professional Edition only)
- LLVM version updated to 8
Improvement
- Improvements to configuration options
- computecpp_info reports target bitcodes supported by each device
- Minor scheduler improvements
Bug Fixes
- Fixed cl::sycl::memory_order not being compatible with C++20
- Various minor device compiler and runtime fixes
Community Edition 1.2.0
Improvement
- LLVM version updated to 7
- Improvements to how the scheduler starts and stops worker threads
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 1.1.6
Improvement
- Enqueue index is now double flipped (see blog post for details)
- SYCL alias to cl::sycl is enabled by default on modern compilers
Bug Fixes
- Transaction order is now defended against accidental reordering preventing unexpected scheduler pending list changes.
Community Edition 1.1.5
New features
- Profiling of host OpenCL calls via Professional Edition
- Output iterators can now be used for set_final_data
Improvement
- Faster profiling (via PE)
- Improvements to asynchronous execution
- Faster fetch_min and fetch_max for host atomics
- Various compiler and runtime improvements
- Removed volatile qualifiers from atomic
Bug Fixes
- Fixed an address space bug with std::swap inside kernels
- Various fixes for SYCL event
- Fixed ABI issues with property_list in Visual Studio
- Fixed issue with nullary kernels not working with the OpenCL interop
Community Edition 1.1.4
Improvement
- Fixed a potential resource deadlock on event wait
Bug Fixes
- Queue constructor and worker thread join mechanisms improved to solve issues with global objects being destroyed after main completes.
- Various minor device compiler and runtime fixes.
Community Edition 1.1.3
New features
- Performance counter data extraction for Intel GPU Gen 7.5 and up via Professional Edition (contact support).
Improvement
- Improvements to rectangular accessor-to-accessor copies.
- The target selector now uses platform vendor when the platform name check fails.
- computecpp_info now reports CL_DEVICE_IL_VERSION in verbose mode.
- Host device now reports local memory type as global.
Bug Fixes
- The size of memory available on the host is not zero anymore.
- Memory checking on submit is now disabled by default.
- get_info strings no longer have an extra null terminator.
- Various minor device compiler and runtime fixes.
Community Edition 1.1.2
New features
- Memory checks now implemented for all memory types.
Improvement
- Store allocated memory for different memory types.
- Insufficient memory errors now throw synchronously.
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 1.1.1
New features
- Experimental support for consuming SPIR-V on OpenCL 2.1 devices
Improvement
- Support for const type in multi_ptr
- Extension to track allocated global memory on the SYCL device
- Exposed via cl::sycl::codeplay::get_allocated_memory_size
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 1.1.0
Improvement
- Reduced overhead of accessors
- Queue no longer waits on instruction
- Multiple placeholder accessors can now be used in the same command group
- Relaxed restrictions on overlapping accessors to improve performance
Bug Fixes
- Fixed issue with sub-buffers on host (reported by users)
- Abacus symbols now present for the Windows debug library
- Build options return empty string when no custom flags are provided
- Various minor device compiler and runtime fixes.
Community Edition 1.0.5
New features
- [DEPRECATION] require() function that takes a buffer and a placeholder accessor will no longer be supported in future releases.
- Added a Codeplay extension handler that features an interop_task() function. This can be used to integrate OpenCL libraries with SYCL code where the SYCL runtime will take care of data availability.
Improvement
- A default constructed placeholder accessor is constructed in a null state.
- Default constructed placeholder accessors can be passed to kernels without being bound to a buffer.
- make_placeholder_accessor() extension member function can be used to create a placeholder accessor from a buffer.
- Placeholder accessor acquired the is_null() function which can be used to query whether a buffer has been bound to it.
Bug Fixes
- Various minor device compiler and runtime fixes.
Community Edition 1.0.4
New features
- Added accelerator_selector.
Improvement
- Default constructed events no longer throw on wait_and_throw.
Bug Fixes
- Rectangular copy now uses correct element sizes.
- Fixes to the reinterpret buffer dependency tracking.
- Various minor device compiler and runtime fixes.
Community Edition 1.0.3
New features
- No-host-access extension added, See CP017 in https://github.com/codeplaysoftware/standards-proposals/blob/master/host_access/index.md
- Builtin-kernel extension added, See CP018 https://github.com/codeplaysoftware/standards-proposals/blob/master/builtin_kernels/sycl-1.2.1/builtin_kernels.md
- On-chip memory extension added, See CP019 in https://github.com/AerialMantis/standards-proposals/blob/CP019/onchip-memory/index.md
- Host-device now supports rectangular copies
Improvement
- Invalid host access will now throw for a subset of functions
- Interface between compiler and runtime is now more robust
- Use of OpenCL flags is now based on host_access property
Bug Fixes
- Support for clCreateProgramWithILKHR for ARM Mali
- Various minor device compiler and runtime fixes.
- Fixed: Event wait_and_throw still forward exceptions to async_handler
- Fixed: Rectangular copies on multi-dimensional accessors
Community Edition 1.0.2
New features
- ComputeCpp runtime makes better usage of clCreateBuffer options and reduces copies whenever possible favouring map/unmap and ALLOC_HOST_PTR | COPY_PTR options.
- Added verbose error build logs
Improvements
- Improved Device selector order, now by device type (accelerator > gpu > cpu > host)
- Allocators are now non-type matching SYCL 1.2.1 rev 3
- computecpp_info tool now displays the available builtin kernels.
- Better runtime detection when the host-device copies can be omitted
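The device selector order mentioned above (accelerator > gpu > cpu > host) can be pictured as a ranking over device types. The sketch below is only an illustration of that ordering, not the ComputeCpp selector itself: device_kind, device_entry, preference_rank and order_by_preference are hypothetical names, and a real application would rank SYCL devices rather than plain structs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical device categories, named after the order given in the notes.
enum class device_kind { accelerator, gpu, cpu, host };

// Hypothetical record standing in for a real SYCL device.
struct device_entry {
  std::string name;
  device_kind kind;
};

// Lower rank means higher preference: accelerator > gpu > cpu > host.
inline int preference_rank(device_kind kind) {
  switch (kind) {
    case device_kind::accelerator: return 0;
    case device_kind::gpu: return 1;
    case device_kind::cpu: return 2;
    case device_kind::host: return 3;
  }
  return 4;  // Not reached for valid enumerators.
}

// Returns the devices sorted by preference, keeping the original relative
// order among devices of the same kind.
inline std::vector<device_entry> order_by_preference(
    std::vector<device_entry> devices) {
  std::stable_sort(devices.begin(), devices.end(),
                   [](const device_entry& a, const device_entry& b) {
                     return preference_rank(a.kind) < preference_rank(b.kind);
                   });
  return devices;
}
```

A stable sort is used so that two devices of the same type keep the order in which the platform reported them.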
Bug Fixes
- Fixed issue with incorrectly reported empty command group error when thrown.
- Various minor device compiler and runtime fixes.
Community Edition 1.0.1
Improvements
- Exception types are now properly reported.
- Improved dependency tracking for reinterpreted buffers. It is no longer necessary to force host access to synchronize reinterpreted buffers.
Bug Fixes
- Fixed issue with reinterpreted buffers overwriting original data
- Various minor device compiler and runtime fixes.
Community Edition 1.0.0
New features
- [SYCL 1.2.1] Runtime now conforms to the SYCL 1.2.1 revision 3 spec
Improvements
- Warnings added for locally defined types
- The runtime does not over-allocate data anymore
- Fixes to tracking data dependencies across reinterpreted buffers
Bug Fixes
- Various minor device compiler and runtime fixes
Community Edition 0.9.1
New features
- [SYCL 1.2.1] Runtime now follows SYCL 1.2.1 revision 3 spec, please note that you will need to change any instances of the following:
- size_t group::get_linear() replaced with size_t group::get_linear_id()
- range nd_range::get_local() replaced with range nd_range::get_local_range()
- size_t nd_range::get_local(int dimension) replaced with size_t nd_range::get_local_range(int dimension)
- range nd_range::get_group() replaced with range nd_range::get_group_range()
- size_t nd_range::get_group(int dimension) replaced with size_t nd_range::get_group_range(int dimension)
- range nd_range::get_global() replaced with range nd_range::get_global_range()
- size_t nd_range::get_global(int dimension) replaced with size_t nd_range::get_global_range(int dimension)
- id nd_item::get_num_groups() replaced with range nd_item::get_group_range()
- size_t nd_item::get_num_groups(int dimension) replaced with size_t nd_item::get_group_range(int dimension)
- id nd_item::get_local() replaced with id nd_item::get_local_id()
- size_t nd_item::get_local(int dimension) replaced with size_t nd_item::get_local_id(int dimension)
- id nd_item::get_global() replaced with id nd_item::get_global_id()
- size_t nd_item::get_global(int dimension) replaced with size_t nd_item::get_global_id(int dimension)
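When migrating a code base, the renames listed above can be captured in a small lookup table. The sketch below is one possible way to do that and is not part of ComputeCpp: renamed_member is a hypothetical helper, and keys are written as class::method because the same old method name maps to different new names depending on the class (for example get_local on nd_range and on nd_item).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: given an old "class::method" name from the list
// above, returns the SYCL 1.2.1 revision 3 name. Unknown names are
// returned unchanged.
inline std::string renamed_member(const std::string& qualified_name) {
  static const std::map<std::string, std::string> renames = {
      {"group::get_linear", "group::get_linear_id"},
      {"nd_range::get_local", "nd_range::get_local_range"},
      {"nd_range::get_group", "nd_range::get_group_range"},
      {"nd_range::get_global", "nd_range::get_global_range"},
      {"nd_item::get_num_groups", "nd_item::get_group_range"},
      {"nd_item::get_local", "nd_item::get_local_id"},
      {"nd_item::get_global", "nd_item::get_global_id"},
  };
  const auto it = renames.find(qualified_name);
  return it == renames.end() ? qualified_name : it->second;
}
```

Keying on the qualified name avoids a plain text search-and-replace, which could not tell nd_range::get_local apart from nd_item::get_local.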
Bug Fixes
- Various minor device compiler and runtime fixes
Community Edition 0.9.0
New features
- Candidate version for SYCL 1.2.1 conformance, all features and API implemented
Improvements
- Clang version bumped to 6.0
- Improvements to scheduler thread locking strategy increasing performance in low-power CPUs.
- Improvements to compiler driver to enable cross-compilation support
Bug Fixes
- Various minor device compiler fixes
- Fixed: Event profile information for kernel can now be queried immediately after a command group submission
- Fixed: Accessor range incorrect when using offsets
Community Edition 0.8.0
New features
- [SYCL 1.2.1] Vector classes fully conformant to the 1.2.1 specification
- [SYCL 1.2.1] Major updates to built-in functions towards conformance 1.2.1 specification
- [SYCL 1.2.1] Stream class changes to match SYCL 1.2.1 specification
- [SYCL 1.2.1] Interop constructor for buffers now match SYCL 1.2.1 specification
- [SYCL 1.2.1] Fully implemented 1.2.1 specification requirements for by-value and by-reference semantics
- [SYCL 1.2.1] Implemented buffer::set_write_back according to SYCL 1.2.1 specification
- [SYCL 1.2.1] Implemented buffer::h_item according to SYCL 1.2.1 specification
Improvements
- Numerous minor runtime fixes
Bug Fixes
- Various minor device compiler fixes
Community Edition 0.7.0
New features
- [SYCL 1.2.1] Stream class changes to match 1.2.1 specification
- [SYCL 1.2.1] is_sub_buffer method added to API
- [SYCL 1.2.1] has_property and get_property support added for public queue interface
- [SYCL 1.2.1] Missing pointer class functions added
- [SYCL 1.2.1] event get_profiling_info implemented
- [SYCL 1.2.1] Swizzled vectors can now be swizzled, this includes:
- Simple swizzles (xyzw(), rgba())
- Index access swizzles from s0() to sF()
- The swizzle template function
- e.g. cl::sycl::float4.xyzw().xxx().template swizzle<cl::sycl::elem::s0, cl::sycl::elem::s1>().y();
- [SYCL 1.2.1] Vector hi lo odd and even functions are now available for all vector sizes
Improvements
- Switched to a new scheduling policy with less overhead
- Optimised swizzle transformation functions for improved compile times
Bug Fixes
- Race condition in program and kernel cache fixed
- Various minor device compiler fixes
Community Edition 0.6.1
New features
- [SYCL 1.2.1 REMOVALS] Get_access host_buffer and target parameters removals
- [SYCL 1.2.1 DEPRECATIONS] Buffer accessor interface - order of offset and range deprecations.
- [SYCL 1.2.1] Atomic class changes to match 1.2.1 specification
- [SYCL 1.2.1] Exception types implemented for logging
- [SYCL 1.2.1] Introduced support for image array accessors
- [SYCL 1.2.1] Missing pointer class functions added
- [SYCL 1.2.1] Not_Null implemented
- [SYCL 1.2.1] Vector Hi, Lo, Odd, and Even methods for 3-element vec
Improvements
- Minor improvements to scheduler
Bug Fixes
- Exception class now uses size_type
- Various minor device compiler fixes
Community Edition 0.6.0
New features
- [SYCL 1.2.1] Implemented buffer reinterpret: creates a new view of the same buffer with a different underlying type
- [SYCL 1.2.1] Program, kernel, context get_info method now return SYCL objects as per specification
- [SYCL 1.2.1] Updated context and kernel interface to match 1.2.1
- [SYCL 1.2.1] Deprecated Kernel interop constructor in favour of new syntax
- [SYCL 1.2.1] Template argument of various classes is now defaulted to 1
Improvements
- Vectors can now reliably take swizzles on the right hand side of an expression and are not restricted to cl_* alias types
- Added remaining cl_half builtins
Bug Fixes
- Various minor device compiler fixes, including 64/32 bit
Community Edition 0.5.1
New feature
- [SYCL 1.2.1] Added functions to group and item classes as per specification
Improvements
- Error message clarification for host out of memory
Bug Fixes
- Various minor device compiler fixes
Community Edition 0.5.0
New feature
- Bumped SYCL interface to SYCL 1.2.1, see the relevant announcement in https://sycl.tech
- Note the interface of most of the classes is now matching the SYCL 1.2.1 specification.
- See the ComputeCpp Implementation Notes for the details on the mismatch between spec and implementation.
- [SYCL 1.2.1] All the copy functions are now SYCL core functionality, hence moved from the codeplay namespace to cl::sycl
- [SYCL 1.2.1] Buffer/Image properties for use_host_pointer, context_bound and use_mutex
- Note this replaces the map_allocator from SYCL 1.2
- [SYCL 1.2.1] Complete device and host SYCL vector interface (cl::sycl::vec)
- [SYCL 1.2.1] Implemented simple vector swizzles (e.g. vec.abc())
- [compute++] New -sycl-target option to specify the target of the binary
- [compute++] Added -fsycl-ih-last option which forces inclusion of the integration header at the end of the translation unit.
Improvements
- Performance improvements for applications using large number of command groups
- Performance improvements when dispatching multiple short-lived kernels to a device
Bug Fixes
- Fixed sporadic deadlock that appeared in TensorFlow test suite
- Fixed multiple bugs on the vector interface
- Various minor device compiler fixes
Community Edition 0.4.0
New features
- New ComputeCpp unified driver to simplify build process (using a single compiler call)
- Added experimental support for PTX generation
Bug fixes
- Accelerator devices are now reported also during device selection
- Various minor device compiler fixes
Community Edition 0.3.3
New features
- Added ARM support for OpenCL SPIR or SPIR-V enabled devices
- The SYCL explicit copy operations on command groups now support fall-back queue
Improvements
- Improved error reporting when kernels fail to execute
- Reduced execution time when multiple command groups are enqueued to the same OpenCL queue
- Reduced overall latency of SYCL command group submission
Bug fixes
- Various bug fixes for command groups with fall-back execution
Community Edition 0.3.2
New features
- Added ranged get_access method to buffer
- Added a get_allocator method to retrieve the allocator from the buffer
Improvements
- Reduced the number of OpenCL build program calls
- Improved error message for address-space resolution
- Improved error for placeholder accessor in parameters
- Improved ranged copy functions to and from device
- Improved error reporting when building kernels
Bug fixes
- Various Windows bug fixes
- Added missing const to vstore method
- Size method of range is now marked as const
- Get device method of the queue is now const
- Several bug fixes in the compiler
Community Edition 0.3.1
New features
- N/A
Improvements
- Several improvements on placeholder accessor support
Bug fixes
- Fixed: Copy functions now work with ranged accessors
- Fixed: Copy constructor of placeholder accessor
Community Edition 0.3.0
New features
- compute++ now based on clang/llvm 3.9
- Added support for vload/vstore functions (See CP007 https://github.com/codeplaysoftware/standards-proposals )
- Updated syntax of update_to functions to match latest CP001 status https://github.com/codeplaysoftware/standards-proposals
- ComputeCpp Package supports Visual Studio 2015 and above
- [Experimental] Added SPIR-V generation support
- [Experimental] Added Default placeholder constructor (See https://github.com/codeplaysoftware/standards-proposals/pull/18)
Improvements
- Removed dependencies on C headers
Bug fixes
- Some combinations of OpenCL builtins were not available (select, min/max, vload, ...)
- cl::sycl::kernel::get_work_group_info<param>(cl_device_id device) now returns the correct value for the platform
Community Edition 0.2.1
New features
- Implemented a Placeholder Accessor type extension (See CP004 in https://github.com/codeplaysoftware/standards-proposals)
Improvements
- Converted the stream class delimiters to static const char arrays following SYCL 1.2 specification
- Update to/from device operations now support ranged accessors
- Host Accessors use map operations on devices where CL_DEVICE_HOST_UNIFIED_MEMORY is cl_true
Bug fixes
- Removed exception throwing in public header file
- Various compiler fixes
Community Edition 0.2.0
New features
- Added support for Ubuntu 16.04
Improvements
- Various compiler fixes
Bug fixes
- Several host fixes for the program class
- Minor Buffer/Subbuffer error code corrections
Community Edition 0.1.4
New features
- N/A
Improvements
- computecpp_info now prints a link to platform support notes: https://computecpp.codeplay.com/releases/latest/platform-support-notes.
- Added CL_DEVICE_VERSION to computecpp_info verbose output.
- Performance optimizations relating to long running programs with many buffers.
- Attributes as presented in section 5.7 of the specification are now handled by the compiler
- Various compiler fixes
Bug fixes
- Fixed target parsing for computecpp_info so that now all devices in the system are properly displayed.
- computecpp_info no longer reports Intel CPU devices without SSE4.1 support as "Tested".
Community Edition 0.1.3
New features
- Added Codeplay host task API extension (See Standards proposal CE01: https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/index.md)
- Added Codeplay command group API extensions update_from_device and update_to_device (See Standards proposal CE01: https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/index.md). Versions of these methods that take an iterator have not yet been implemented.
- As requested by various users, the generated SPIR is no longer encrypted. This enables writing extra compiler passes that work on top of the SPIR produced by ComputeCpp.
Improvements
- Improved behaviour when SIGINT is triggered to avoid crashing underlying OpenCL implementation whenever possible
- Variable Length Arrays (VLA) are now reported as errors in SYCL device kernels
- computecpp_info now returns an error code when an error occurs
Bug fixes
- Default constructor for SYCL event created incorrect OpenCL object.
- Several device compiler bug fixes
Community Edition 0.1.2
New features
- Basic extensions support enabled
Improvements
- Improved device compiler speed
- The device compiler will now warn if an undefined function is used inside a SYCL kernel
- Improved handling of SIGINT
- Device support status now reported as tested/untested/unsupported in computecpp_info
Bug fixes
- Fixed issues related to SPIR module conformance
- Various compiler fixes
- Fixed 2D atomics
- Fixed buffer memory leak
- Resolved an OpenCL related memory leak
Breaking changes
- Added the info::gl_context_interop flag to the context constructors
- Removed the --all flag from computecpp_info
- --sycl-no-diags is now deprecated and will be removed in the next release
Community Edition 0.1.1
- Various compiler bug fixes after feedback from users
- The macro __SYCL_DEVICE_ONLY__ is now defined by the device compiler
- Disallowed calls to variadic functions inside SYCL kernels
- The OpenCL platform default local workgroup size is now used when the user does not provide a local workgroup size in the parallel_for call.
Community Edition 0.1.0
- Initial Release.
ComputeCpp package
The ComputeCpp package includes the components of the ComputeCpp SYCL implementation. ComputeCpp uses the single-source multi-pass technique, in which multiple compilers process a single SYCL source file, providing maximum flexibility in creating host and device binaries. The main components of ComputeCpp are:
- the SYCL compiler: compute++,
- the SYCL C++ header: CL/sycl.hpp,
- the C++ shared library: libComputeCpp.so,
- the C++ library headers included by CL/sycl.hpp.
Note: The version of each of the components matches the version number of the ComputeCpp package. No component can be used in isolation or with components from a different version.
The ComputeCpp package provides the computecpp_info tool for checking the configuration of the target platform.
Compiler
The compute++ compiler takes as input SYCL C++ source code and outputs the ComputeCpp integration header.
compute++ is a clang-based C++ compiler that supports C++11 and partially C++14. The version of compute++ matches the version of the ComputeCpp package.
Documentation of the SYCL compiler, compute++, can be found in the compute++ documentation (local copy at doc/computepp_man.pdf).
Library and headers
The C++ headers and shared library are a work-in-progress implementation of the SYCL runtime, kernel library and SYCL host device implementation. The headers have a single entry point, the CL/sycl.hpp header. The shared library that any application needs to link against is lib/libComputeCpp.so.
The macro _COMPUTECPP_ is defined to have the version of the ComputeCpp shared library and headers, which matches the ComputeCpp package versioning.
The documentation of the ComputeCpp API can be found on the Codeplay developer website (local copy at doc/api_pages/index.html).
computecpp_info tool
The computecpp_info tool makes it possible to check the target platform configuration. It provides information about the OpenCL devices and drivers, and whether they are supported by the ComputeCpp package.
The tool can be found in bin/computecpp_info.
For further documentation, please refer to the Platform Support Notes (local copy at doc/computecpp_info_man.pdf).
ComputeCpp implementation notes
All SYCL 1.2.1 features are implemented, although they may not have an optimal implementation for all supported platforms.
Unsupported features
The current version does not support any of the non-core features of the SYCL 1.2 specification.
Unsupported SYCL features:
- Enabling OpenCL extensions
- Writing to 3D image memory objects
- Depth and depth-stencil images
- No support for SYCL_EXTERNAL. All command-group kernels need to be defined in headers in order to be processed in one translation unit.
ComputeCpp extensions
ComputeCpp supports various extensions through the codeplay::handler class. See https://github.com/codeplaysoftware/standards-proposals for details on the different extensions available.
AMD is a registered trademark of Advanced Micro Devices, Inc. Intel is a trademark of Intel Corporation or its subsidiaries in the U.S. and/or other countries. NVIDIA and CUDA are registered trademarks of NVIDIA Corporation.