The change log contains, with most recent items at the beginning, changes made to the ComputeMux API. Versioning follows the semantic versioning scheme; major version increments signify incompatible API changes, minor version bumps denote the addition of new functionality which is backward compatible, and patch version increases mean backward compatible bug fixes have been applied.
Versions prior to 1.0.0 may contain breaking changes in minor versions as the API is still under development.
0.80.0
Added atomic capabilities.
0.79.0
Added sub-group shuffle builtins.
0.78.0
Added sub-group, work-group, and vector-group operation builtins.
0.77.0
The DMA builtins now permit any event type chosen by the target, as long as they’re consistent across the module.
0.76.0
Added
num_sub_group_sizes
andsub_group_sizes
tomux_device_info_s
.
0.75.0
Snapshots have been removed.
0.74.0
Several
compiler::BaseModule
methods and fields have been reworked to make compilation of OpenCL less stateful. * The class-member lists of macro defs and OpenCL options have been removed,and are set up and torn down on the fly when compiling OpenCL C.
Several methods such as
compiler::BaseModule::populatePPOpts
are nowconst
.compiler::BaseModule::executeOpenCLAction
has been removed.
0.73.0
Added the
muxQueryMaxNumSubGroups
entry point.Removed
mux_kernel_s::sub_group_size
, which is now an implementation detail.
0.72.0
Added the
__mux_mem_barrier
,__mux_work_group_barrier
, and__mux_sub_group_barrier
builtins. They replace the older__mux_global_barrier
,__mux_shared_local_barrier
, and__mux_full_barrier
builtins, which have been removed.
0.71.0
Added the
mux_device_s::supports_generic_address_space
field.
0.70.0
compiler::BaseModule::runBackendPasses
andcompiler::BaseModule::addLatePasses
have been removed. Their functionality are covered by both the newcompiler::BaseModule::getLateTargetPasses
method and the pre-existingcompiler::Module::createBinary
.
0.69.0
Remove
muxWait
and move its functionality inmuxTryWait
whentimeout
isUINT64_MAX
.
0.68.1
Require implementations of
muxCloneCommandBuffer
to deep copy ndrange kernel commands.
0.68.0
Add
timeout
parameter tomuxTryWait
.
0.67.0
Replace
mux_error_command_buffer_not_ready
withmux_fence_not_ready
.Replace
mux_error_command_buffer_failure
withmux_error_fence_failure
.Remove
mux_error_command_buffer_wait_semaphore_failure
.
0.66.0
Rename all builtin functions from
__core
to__mux
.
0.65.0
compiler::BaseTarget
now owns the builtins module optionally created incompiler::BaseTarget::init
and initialized with the target as part ofcompiler::BaseTarget::initWithBuiltins
.
0.64.0
Update
muxDispatch
to accept an optionalmux_fence_t
parameter.Update
muxTryWait
to wait on amux_fence_t
rather than amux_command_buffer_t
.Update
muxWait
to wait on amux_fence_t
rather than amux_command_buffer_t
.
0.63.0
Add
muxCreateFence
,muxDestroyFence
andmuxResetFence
entry points.
0.62.0
Add
mux_sync_point_s
type, representing intra command-buffer synchronization points for ordering commands inside a command-buffer.MuxCommand*
entry-points have been updated to return a sync-point, as well as taking a list of sync-points to wait on.
0.61.0
A new method
getBuiltinCapabilities
has been added tocompiler::Info
. Calling this function will return a bitfield of the builtin capabilities of the device, based on the mux device info.
0.60.0
Add the
muxQueryWFVInfoForLocalSize
entry point.
0.59.0
Add
mux_fence_s
type. There are currently no Mux entry points to create, wait on, query, reset or destorymux_fence_s
objects, these will be added in a future spec version.
0.58.0
BaseModule
has an additional virtual methodcreatePassMachinery()
. This will provide aPassMachinery
which can be used throughout the pipeline to handle state needed for the new pass manager interface.
0.57.0
Added the
mux_device_s::supports_work_group_collectives
field.
0.56.1
Extend valid usage description of
muxUpdateDescriptors
to include text on changing the size of POD descriptors being undefined behaviour.
0.56.0
Add the following entry points: *
compiler::Kernel::querySubGroupSizeForLocalSize
*compiler::Kernel::queryLocalSizeForSubGroupCount
*compiler::kernel::queryMaxSubGroupCount
*muxQuerySubGroupSizeForLocalSize
*muxQueryLocalSizeForSubGroupCount
Remove the following entry point: *
compiler::Kernel::getSubGroupSize()
Add the following fields: *
mux_device_info_s::max_sub_group_count
*mux_kernel_s::max_sub_group_count
Remove the following field: *
mux_device_info_s::max_num_sub_groups
*mux_kernel_s::sub_group_size
0.55.0
Add the
__core_dma_read_3D
builtin.Add the
__core_dma_write_3D
builtin.Modify
__core_dma_read_2D
and__core_dma_write_2D
to handle source and destination strides.
0.54.0
cargo::optional<mux_device_t> device
andmux_allocator_info_t allocator_info
has been removed fromcompiler::Info::createTarget
.compiler::BaseKernel::createSpecializedKernel
has been moved tocompiler::Kernel::createSpecializedKernel
.compiler::Kernel::createMuxSpecializedKernel
was an implementation detail ofcompiler::BaseKernel
which has now been removed.compiler::SpecializedKernel
has been removed.compiler::BaseTarget
now loads the builtins module for the given builtin capabilities as part ofcompiler::BaseTarget::init
. Compiler targets should implementcompiler::BaseTarget::initWithBuiltins
instead. Unlikeinit
,initWithBuiltins
does not need to delegate tocompiler::BaseTarget
first, as it’s a pure virtual function.The notification callback passed to
compiler::Target::init
is now passed tocompiler::Info::createTarget
and is now of typecompiler::NotifyCallbackFn
. This should be passed along tocompiler::BaseTarget
’s constructor.
0.53.2
Change the
user_function
argument ofmuxCommandUserCallback
to use themux_command_user_callback_t
type, rather than the function pointer type explicitly.
0.53.1
Remove note mandating that targets do their own validation of
data
andstride
muxGetQueryPoolResults
parameters.
0.53.0
Add the
uint32_t mux_query_counter_s::hardware_counters
field.Add the
uint32_t mux_device_info_s::max_hardware_counters
field.
0.52.0
Rename member
max_subgroup_size
inmux_device_info_t
tomax_work_width
.Rename member function
getDynamicSubgroupSize
incompiler::Kernel
togetDynamicWorkWidth
.
0.51.0
Added the
__core_get_max_sub_group_size()
builtin.
0.50.0
Version bump to maintain parity with Core which has had the
__core_get_num_sub_groups
builtin added.
0.49.0
Version bump to maintain parity with Core which has had the
__core_get_sub_group_id
builtin added.
0.48.0
Add the
size_t mux_kernel_s::sub_group_size
field.Add the
cargo::expected<uint32_t, Result> compiler::Kernel::getSubGroupSize()
method.
0.47.0
Add the
uint32_t mux_device_info_s::max_num_sub_groups
field.Add the
bool mux_device_info_s::sub_groups_support_ifp
field.
0.46.0
Add member
scalable_vector_support
tocompiler::Info
to represent that the compiler supports generating scalable vector code.Add member
scalable_vectors
tocompiler::Options
to indicate that the executable should be finalized with scalable vectors.
0.45.0
Version bump to maintain parity with Core which has had the
__core_dma_write_2D
and__core_dma_write_2D
builtins added.
0.44.0
Initial release of the ComputeMux specification. The changelog for the Core specification has been duplicated here to preserve history.
Remove the
corePushBarrier
entry point, which was rendered obsolete when command groups were guaranteed to execute in order.
0.43.1
Add
core_source_type_llvm_140
andcore_source_capabilities_llvm_140
for supporting LLVM 14
0.43.0
Add the
coreCloneCommandGroup
entry point.Add the
bool core_device_info_s::can_clone_command_groups
field.
0.42.1
Relax thread-safety requirements of implementing
coreFinalizeCommandGroup()
, so that the entry-point is only thread-safe with respect to the same command-group handle rather than across all invocations.
0.42.0
Add the
coreUpdateDescriptors
entry point.Add the
bool core_device_info_s::descriptors_updatable
field.
0.41.0
Add the
coreFinalizeCommandGroup
entry point.
0.40.3
Add
core_source_type_llvm_130
andcore_source_capabilities_llvm_130
for supporting LLVM version 13.0.0.
0.40.2
Add
core_source_type_llvm_120
andcore_source_capabilities_llvm_120
for supporting LLVM version 12.0.0.
0.40.1
Add the
size_t __core_get_global_linear_id()
builtin.Add the
size_t __core_get_local_linear_id()
builtin.Add the
size_t __core_get_enqueued_local_size(uint)
builtin.
0.40.0
Remove
host_pointer
argument fromcoreAllocateMemory
.Remove
core_allocation_type_use_host
fromcore_allocation_type_e
.Rename
core_allocation_capabilities_e
enumscore_allocation_capabilities_alloc_host
tocore_allocation_capabilities_coherent_host
andcore_allocation_capabilities_use_host
tocore_allocation_capabilities_cached_host
.
0.39.3
Require stricter device capability
core_allocation_capabilities_alloc_host
to support entry pointcoreCreateMemoryFromHost
, as this implies the device architecture has cache coherent memory with host.
0.39.2
Forbid mapping already mapped memory objects with
coreMapMemory
.Specify flushing cache coherent memory as a nop.
Require
core_memory_property_host_visible
as a property of memory objects mapped withcoreMapMemory
.
0.39.1
Add a valid use clarification for
coreCreateSpecializedKernel
.
0.39.0
Add
alignment
argument tocoreAllocateMemory
to specify the minimum alignment for the allocated memory.Add
handle
member tocore_memory_s
to allow the host runtime a way to represent the underlying memory address.Add entry point
coreCreateMemoryFromHost
to allow APIs to create acore_memory_t
device visible object from pre-allocated host memory.
0.38.7
Rename the
core_vectorization_order_e
enum tocore_work_item_order_e
, and the enum values to match thework_item
naming.Rename the
vec_order
field ofcore_executable_options_t
towork_item_order
, to match the rename of-cl-wfv-order
to-cl-wi-order
.Upgrade Guidance:
utils::createHandleBarriersPass()
must now be passed a parameter of typeenum core_work_item_order_e
to specify the work item dimension priority.
0.38.6
Add
core_vectorization_order_e
enum type to represent vectorization priority order.Add
vec_order
field tocore_executable_options_t
struct for supporting the-cl-wfv-order
extension.
0.38.5
Add
core_source_type_llvm_110
andcore_source_capabilities_llvm_110
for supporting LLVM version 11.0.0.
0.38.4
Add documentation for maximum built-in kernel name length.
0.38.3
Add
core_source_type_llvm_100
andcore_source_capabilities_llvm_100
for supporting LLVM version 10.0.0.
0.38.2
Add
__core_usefast()
and__core_isembeddedprofile()
functions as required builtins that core targets must replace.Added
core_floating_point_capabilities_full
flag tocore_floating_point_capabilities_e
for IEEE-754 compliant representations.
0.38.1
Add flags to
core_executable_flags_e
to represent the various OpenCL math optimization build options, namely:core_executable_flags_mad_enable
core_executable_flags_no_signed_zeroes
core_executable_flags_unsafe_math_optimizations
core_executable_flags_finite_math_only
0.38.0
Add
compilation_options
C string tocore_device_info_s
to hold custom build options provided by the device.Add
core_executable_options_t
struct which encapsulates thecore_executable_flags_e
bitfield and a C string for the name and value of any device specific build options passed by the user.Redefine
core_executable_s
struct to have acore_executable_options_t
member rather than thecore_executable_flags_e
bitfield.Redefine
coreCreateBinaryFromSource()
andcoreCreateExecutable()
to take acore_executable_options_t
argument rather than acore_executable_flags_e
bitfield.
0.37.1
Add
core_executable_flags_prevec_loop
andcore_executable_flags_prevec_slp
enum values tocore_executable_flags_e
for activation of “early vectorization” passes:Loop Vectorization
SLP Vectorization
Load/Store Vectorization
0.37.0
Core now accepts 3D descriptions of memory in the
corePush*Region
entry points, these layouts are passed down to the implementation.Reduce the overhead significantly.
Redefine
core_buffer_region_info_s
to describe a buffer in 1D, 2D or 3D. This design is based on OpenCL’sclEnqueue*BufferRect
entry points.
0.36.0
Add support for query counters, extending the mechanism for reporting performance statistics to the application by providing a configurable method for enabling a set of hardware counters alongside metadata which can be used by a profiling visualisation tool to describe the queried data.
Extend
core_query_type_e
to includecore_query_type_counter
.Add
coreGetSupportedQueryCounters()
to enable applications to discover the full list of supported query counters.Add
core_query_counter_t
used to describe how to enable and interpret a query counter.Add
core_query_counter_description_t
used to provide human readable metadata about a query counter.Extend
coreCreateQueryPool
to accept an array ofcore_query_counter_config_t
s to select which query counters to enable and pass through additional target specific counter configuration if necessary.Extend
corePushBeginQuery
/corePushEndQuery
to accept aquery_count
in addition to aquery_index
, this allows multiple queries to be enabled at once.Add
core_query_counter_result_t
used to return the result of a single query counter to the application usingcoreGetQueryPoolResults()
.
0.35.0
Add support for queries, a mechanism for targets to report performance statistics to the application.
The
core_query_pool_t
object is used to store the query results,coreCreateQueryPool()
andcoreDestroyQueryPool()
define the objects lifecycle,coreGetQueryPoolResults()
is used to provide the results to the application.The
core_query_type_e
enumeration defines a set of possible queries, currently onlycore_query_type_duration
is supported and is intended to report the start and end timestamps of a command, results are reported using thecore_query_duration_result_t
object.The
corePushBeginQuery()
andcorePushEndQuery()
entry points define the range of commands for which acore_query_pool_t
is to be used in acore_command_group_t
,corePushResetQueryPool()
is used to zero all query results in the spcified range within thecore_query_pool_t
.
0.34.3
Remove unnecessary member
vectorize
fromcore_kernel_t
.
0.34.2
Fix
core.xml
comment to state thatCL_DEVICE_NAME
is matched withcore_device_info_s::device_name
.
0.34.1
Added
core_source_capabilities_e::core_source_capabilities_llvm_any
bit mask to match any of the LLVM source capability bits.
0.34.0
Add support for custom buffer descriptors, this allows passing through arbitrary data from the user to the Core target in addition to the address space provided by the compiler frontend. This includes:
The
custom_buffer_capabilities
data member ofcore_device_info_s
describing which custom buffer capabilities the Core target supports.The
core_custom_capabilities_e
enumeration of custom buffer capabilities.The
core_descriptor_info_custom_buffer_s
structure to describe the custom buffer to the Core target.The
core_descriptor_info_type_custom_buffer
enumeration value to specify that a descriptor is a custom buffer.
0.33.1
Clarify that whitespace characters other than `` `` are not supported in built-in kernel declarations.
0.33.0
Unify snapshot descriptions to favor snapshot “stages” over snapshot “points”. Rename:
coreListSnapshotPoints
tocoreListSnapshotStages
coreSetSnapshotPoint
tocoreSetSnapshotStage
Specify that passing an invalid snapshot stage name to
coreSetSnapshotStage
must returncore_error_malformed_parameter
.Remove
core_snapshot_type_none
to make it harder to set an invalid format.Rename
core_snapshot_type_e
tocore_snapshot_format_e
to unify how the format information is called and used.Introduce
core_snapshot_format_default
to unify how the format information is used.Re-order the parameters of
coreSetSnapshotStage
, i.e., move thesnapshot_format
parameter before thesnapshot_callback
parameter.
0.32.3
Added built-in kernel usage section to the Core
spec.md
document.
0.32.2
Clarify syntax for built-in kernel declarations.
Clarify that
build_flags
have no effect oncoreCreateExecutable
when the source type iscore_source_type_builtin_kernel
.
0.32.1
Clarify that Core implementations of command groups must not access signal semaphores of completed command groups they depend on.
0.32.0
Add
core_callback_info_t
to support implementations providing detailed messages to users about API usage.Change
<client>CreateFinalizer
to take acore_callback_info_t
parameter to support provision of detailed messages about compilation.Change
<client>CreateCommandGroup
to take acore_callback_info_t
parameter to support provision of detailed messages about command execution.
0.31.4
Clarify the error return codes of
coreCreateExecutable
andcoreCreateBinaryFromSource
for unknown or invalidsource_type
arguments.
0.31.3
Clarify the valid usage of permitted actions in the
user_function
callback ofcoreDispatch
.Clarify when a command group passed to
coreDispatch
is considered complete.
0.31.2
Add allocator validity check to
id.h
and rename it toutils.h
.
0.31.1
Weaken requirement that host-side allocations must use the user provided allocator to that they should use it. This enables use of third-party libraries, like LLVM or the C standard library, which do not support user provided allocators and should not affect existing target implementations.
0.31.0
Supersede
generate_core_header
withadd_core_target
, this also simplifies the mechanism by which targets register themselves and how they specify their capabilities in addition to creating a CMake target to generate the core target header.Add
add_core_cross_compilers
which simplifies the mechanism for registering a targets cross-compilers with thecross
target.
0.30.0
Add requirement that commands in a command group must be executed in the order they were pushed onto the command group, making command groups in-order.
Add addition valid usage requirements for the usage
core_semaphore_t
defining when it can be reset and destroyed relating to the lifetime of acoreDispatch()
.
0.29.2
Changed
builtin_kernel_names
tobuiltin_kernel_declarations
to better represent what information is contained.
0.29.1
Numerous clarifications and inconsistencies corrected in the specification and Doxygen comments of
core.h
.
0.29.0
Add
core_device_type_compiler
tocore_device_type_e
to represent a target which only implements the compilation entry points for use in compiling offline and cross-compiled kernels.Change
core_device_type_e
enumerations to make them usable in a bitfield and addcore_device_type_all
for selecting all device types.Change
coreGetDeviceInfos
to take a bitfield ofcore_device_type_e
in order to selectively initialize only desired devices.
0.28.4
Changed type of
device
member variable incore_finalizer_s
fromcore_device_t
tocore_device_info_t
.
0.28.3
Add
core_source_type_llvm_80
andcore_source_capabilities_llvm_80
for supporting LLVM version 8.0.0.
0.28.2
Add back in the removed
id
member from thecore_device_s
struct to fix compilation failures incoreSelect.h
when multiple targets are registered.
0.28.1
Add support for builtin kernels to core.
Added
core_source_type_unknown
,core_source_type_builtin_kernel
andcore_source_capabilities_builtin_kernel
tocore_source_type_e
andcore_source_capabilities_e
.Added
core_source_type_builtin_kernel
as one of the supported types tocoreCreateExecutable
for creation of acore_executable
with builtin kernels.Reordered values in
core_source_type_e
andcore_source_capabilities_e
.
0.28.0
Changed
coreCreateFinalizer
andcoreDestroyFinalizer
entrypoints to takecore_device_info_t
s instead ofcore_device_t
s.Added a new type
core_binary_t
.Removed
coreGetBinary
and replaced it with a newcoreCreateBinaryFromExecutable
entrypoint.Added
coreCreateBinaryFromSource
entrypoint for offline/cross-compilation support.Added a matching
coreDestroyBinary
to destroy binaries created by the above two functions.
0.27.0
Separate device enumeration from initialization by adding a new structure:
core_device_info_t
, and a new function:coreGetDeviceInfos
.coreCreateDevices
hook API has changed - a new hook forcoreGetDeviceInfos
was added, which has an almost identical interface to the existingcoreCreateDevices
hook.
0.26.1
Add
core_executable_flags_dma_never
andcore_executable_flags_vectorize_never
enum values tocore_executable_flags_e
, so that the core implementations are informed of whether the user chose explicitly to enable/disable these optimizations, or if the default behavior is to be used when neither thenever
noralways
flags are present.
0.26.0
Add member
endianness
tocore_device_t
to represent whether the device is big- or little-endian.
0.25.0
Change to CMake to build only the required builtins based on target capabilities. Capabilities must be reported in a
<target_name>_CAPABILITIES
variable.
0.24.2
Change the CMake mechanism to generate
<client>
API headers, it is now possible to override theclang-format
executable used during header generation.
0.24.1
Change references to
command_buffer
in Doxygen documentation and parameter variable names tocommand_group
.
0.24.0
Add member
dma_optimizable
tocore_device_t
to represent that DMA optimizations can be performed for this device.Add
core_executable_flags_dma_always
tocore_executable_flags_e
to represent that DMA optimizations must be performed.
0.23.0
Add a new command
<client>ResetSemaphore()
to reset a semaphore such that it has no previous signalled state.
0.22.5
Add member
image2d_array_writes
tocore_device_t
.
0.22.4
Add member
integer_capabilities
tocore_device_t
.Add enum
core_integer_capabilities_e
.
0.22.3
Add member
vectorizable
tocore_device_t
to represent that vectorization can be performed for this device.Add member
vectorize
tocore_kernel_t
.Add
core_executable_flags_vectorize_always
tocore_executable_flags_e
to represent that vectorization must be performed.
0.22.2
Add
core_executable_flags_denorms_may_be_zero
tocore_executable_flags_e
to represent that denormal floats may be flushed to zero.
0.22.1
Added member
local_memory_size
tocore_kernel_t
.
0.22.0
Add a new command
<client>PushBarrier()
to enforce the execution order of commands within a command group.
0.21.0
Add a
core_finalizer_t
argument to<client>DestroyExecutable()
,<client>DestroyKernel()
and<client>DestroyScheduledKernel()
. Note that<client>DestroySpecializedKernel()
does not take acore_finalizer_t
.
0.20.5
Add
core_source_type_llvm_70
andcore_source_capabilities_llvm_70
for supporting LLVM version 7.0.0.
0.20.4
Remove dead symbol references in Doxygen documentation.
0.20.3
Add
allocation_size
tocore_device_s
to represent the maximum size of a single memory allocation.
0.20.2
Add
__core_get_work_dim()
,__core_get_group_id()
,__core_get_global_id()
,__core_get_local_id()
,__core_get_num_groups()
,__core_get_global_size()
,__core_get_local_size()
,__core_get_global_offset()
,__core_full_barrier()
,__core_shared_local_barrier()
, and__core_global_barrier()
, required builtins that core targets must replace.
0.20.1
Add
core_source_type_llvm_60
andcore_source_capabilities_llvm_60
for supporting the latest version of LLVM.
0.20.0
Add
<client>PushReadBufferRegions()
to allow for multiple regions within a source buffer to be copied to a destination host pointer.Add
<client>WriteCopyBufferRegions()
to allow for multiple regions within a host pointer to be copied to a destination buffer.Add
<client>PushCopyBufferRegions()
to allow for multiple regions within a source buffer to be copied to a destination buffer.Add
core_buffer_regions_info_s
as a helper struct to specify to the new entry points above what source offset, destination offset, and size to use for each region.
0.19.2
Add
max_subgroup_size
tocore_device_s
to represent the maximum subgroup size for kernels on a device, anddynamic_subgroup_size
tocore_scheduled_kernel_s
to represent the actual subgroup size for that scheduled kernel.
0.19.1
Add
core_source_type_llvm_50
tocore_source_flags_e
to allow input binaries to be from LLVM 5.0.Add
core_source_capabilities_llvm_50
tocore_source_capabilities_e
to allow input binaries to be from LLVM 5.0.
0.19.0
Add
core_device_t
argument to create entry points which were not already passed a device making the API consistent across all create and destroy functions.
0.18.1
Add
__core_dma_read_1d()
,__core_dma_read_2d()
, and__core_dma_wait()
functions as builtins that core targets must replace if they use the automatic DMA.
0.18.0
Add
core_allocator_info
argument to all entry points which perform host allocations to support Vulkan style user allocator override.Change order of entry points so that
<client>Create<Object>
is directly before<client>Destroy<Object>
.
0.17.3
Add
compute_units
tocore_device_s
to let implementations pass information on how many compute units their device has.
0.17.2
Add
device_priority
tocore_device_s
. This is used to keep track of device priorities when returning default devices.
0.17.1
Add
__core_isftz()
function as a required builtin that core targets must replace.
0.17.0
Add support for multiple memory heaps.
Add
supported_heaps
bitfield tocore_memory_requirements_s
allowing the client target to state which heaps are supported for a specific buffer or image.Change
core_buffer_t
to have amemory_requirements
data member, replacingsize
and adding support for specifyingalignment
andsupported_heaps
.Add
heap
argument to<client>AllocateMemory
to specify the heap to allocate memory from.
0.16.0
Added
native_vector_width
andpreferred_vector_width
tocore_device_t
to let devices expose what vector width (in bytes) their hardware is, and what size of vectors they would prefer implementations give them.
0.15.0
Added
preferred_local_size_x
,preferred_local_size_y
, andpreferred_local_size_z
tocore_kernel_t
to let implementations pass information on what would be a suitable local work group size to use for a given kernel.
0.14.0
Removed
<client>PushTerminate()
as it put a higher burden on client targets than was necessary.
0.13.0
Add
<client>GetBinary()
to retrieve the binary representation of acore_executable_t
.Add
core_source_type_binary
tocore_source_flags_e
to allow the input to be a binary for the given core target.Add
core_source_capabilities_binary
tocore_source_capabilities_e
to allow a core target to advertise it can support creating executables from binaries.Rename
<client>CreateQueue()
to<client>GetQueue()
and change the function signature to take two extra parameters for the queue type and index.core_queue_t
‘s now belong to the device, and are queried from the device, rather than an arbitrary number of them being created (which simplifies the engineering effort required by our customers).Add new enum
core_queue_type_e
to denote all possible types of queue we can support - at present this only containscore_queue_type_compute
, but is available for extension later.Add new field to
core_device_t
to query the number of queues of eachcore_queue_type_e
a device supports.Remove
<client>DestroyQueue()
, as queues are now implicitly destroyed when the device they were retrieved from is destroyed.
0.12.4
Fix bug in
core::util::allocator::create
where references were not correctly passed through to the constructor of the object being created.
0.12.3
Add
core_source_type_llvm_40
tocore_source_flags_e
to allow input binaries to be from LLVM 4.0.Add
core_source_capabilities_llvm_40
tocore_source_capabilities_e
to allow input binaries to be from LLVM 4.0.
0.12.2
Add
core_executable_flags_no_opt
tocore_executable_flags_e
.Change semantics of
core_executable_flags_debug
to mean built with debug info.
0.12.1
Add
core_executable_flags_soft_math
tocore_executable_flags_e
to force finalization to occur using software math builtins.
0.12.0
Add
max_work_group_size_x
,max_work_group_size_y
andmax_work_group_size_z
tocore_device_t
.
0.11.1
Add
CORE_NULL_ID
preprocessor definition to be used by clients when initializingcore_<object>_s::id
.
0.11.0
Add ID types
core_id_t
,core_object_id_t
,core_target_id_t
.Generate
core_target_id_e
enum incore/coreConfig.h
from list of registered targets.Add
core_id_t id
member to all objects created by clients.Add missing
core_device_t
parameter to<client>ListSnapshotPoints
.Add
core/util/id.h
utility header for working with object ID’s.
0.10.0
Added
builtins_type
,builtins
, andbuiltins_length
parameters to<client>CreateFinalizer()
to pass the compute APIs standard library to the core client target for linking. Client targets must now link in the builtin function definitions themselves to use our provided implementations. By moving the responsibility for linking to the client target, clients now have a mechanism to intercept any of the builtin functions with target specific optimizations, before linking in any remaining builtins that the client does not have optimized support for.
0.9.0
Remove no longer required
page_size
fromcore_device_t
.Renamed
core_descriptor_info_shared_scratch_s
tocore_descriptor_info_shared_local_buffer_s
to be more consistent with our naming.Renamed
core_descriptor_info_type_shared_scratch
tocore_descriptor_info_type_shared_local_buffer
to be more consistent with our naming.
0.8.1
Add overload to
core::allocator::alloc()
which takes a non-template alignment parameter.
0.8.0
Add
image3d_writes
flag tocore_device_s
to signify support for writing to 3D images.
0.7.0
Add
<client>FlushMappedMemoryToDevice()
to synchronize device memory with data currently residing in host memory.Add
<client>FlushMappedMemoryFromDevice()
to synchronize host memory with data currently residing in device memory.Remove
flags
parameter tocoreMapMemory()
, use<client>FlushMappedMemoryToDevice()
and<client>FlushMappedMemoryFromDevice()
to perform flushing instead.Remove
core_mapping_type_e
,coreMapMemory()
andcoreUnmapMemory()
are no longer required to synchronize memory.
0.6.2
Remove
max_instructions_issued_per_cycle
fromcore_device_s
as it is no longer a required (or useful) piece of functionality to require our customers to guestimate.
0.6.1
Change
core_source_type_e
andcore_source_capabilities_e
to be the LLVM version of the bitcode module being passed in (which more correctly fits our usage).LLVM bitcode modules being passed in with
core_source_type_llvm_38
andcore_source_type_llvm_39
must have the “unknown-unknown-unknown” target triple now.
0.6.0
Add function
<client>ListSnapshotPoints
to retrieve the list of compilation stages snapshots can be taken at in partner code.Add function
<client>SetSnapshotPoint
to set a snapshot point in partner code.Add enum
core_snapshot_type_e
to describe snapshot formats.Add typedef
core_snapshot_callback_t
to describe the function prototype for the callback invoked when a snapshot point is hit.
0.5.0
Add struct
core_semaphore_s
representing a device semaphore object.Add function
<client>CreateSemaphore
to create device semaphore objects.Add function
<client>DestroySemaphore
to destroy device semaphore objects.Add function
<client>TryWait
to try and wait on command groups.Change
<client>Dispatch
to include two arrays of semaphores, one to wait on before beginning execution of the command group, and one to signal when the command group has completed executing.Change
<client>Dispatch
to include a command group complete callback and user data.Add
core_error_command_group_failure
tocore_error_e
enum to signal that a command group that was waited on failed.Add
core_error_command_group_wait_semaphore_failure
tocore_error_e
enum to signal that a command group that was waiting on another command group via a semaphore failed because the other command group failed.Add
core_error_command_group_not_ready
tocore_error_e
enum to signal that a command group that was waited on was not yet complete.Add extra parameter to
<client>PushFillImage
to specify the size of the user memory being passed in as the color parameter.Add function
<client>PushTerminate
to signal that a command group should terminate, and any semaphore in the chain of waits on it, should not execute.Add function
<client>ResetCommandGroup
to reset a command group such that it has no previous commands enqueued within it.
0.4.0
Add struct
core_image_s
representing a device image object.And struct
core_sampler_s
representing a device sampler object.Update struct
core_device_s
to contain the devices image capabilities.Change enum
core_memory_type_e
intocore_memory_property_e
to describe the desired memory properties for an allocation,core_memory_type_e
was too restrictive and did not allow implementation ofCL_MEM_OBJECT_IMAGE1D_BUFFER
.Add struct
core_memory_requirements_s
to describe the device memory allocation requirements of acore_buffer_t
or acore_image_t
.Add struct
core_offset_3d_t
to describe the offset into an image.Add struct
core_extent_3d_t
to describe the region of an image.Add enum
core_image_type_e
to describe the type of an image.Add enum
core_image_format_e
to describe the format on an image.Add enum values
core_descriptor_info_type_image
andcore_descriptor_info_type_sampler
tocore_descriptor_info_type_e
.Add enum
core_address_mode_e
to describe sampler addressing modes.Add enum
core_filter_mode_e
to describe sampler filter modes.Change
<client>AllocateMemory
to accept a bitfield ofcore_memory_property_e
Add function
<client>CreateImage
to create device image objects.Add function
<client>DestroyImage
to destroy device image objects.Add function
<client>BindImageMemory
to bind device memory to an image object.Add function
<client>GetSupportedImageFormats
to query the device for supported image formats.Add function
<client>PushReadImage
to read an image in a command group.Add function
<client>PushWriteImage
to write an image in a command group.Add function
<client>PushFillImage
to fill an image in a command group.Add function
<client>PushCopyImage
to copy and image to another in a command group.Add function
<client>PushCopyImageToBuffer
to copy an image to a buffer in a command group.Add function
<client>PushCopyBufferToImage
to copy a buffer to an image in a command group.
0.3.1
Fixed
core_memory_type_e
- it should have been a bitfield.Fixed core.h C compilation issue (enum types are called
enum <type>
).
0.3.0
Added enum
core_executable_flags_e
for build flags.Added
build_flags
field to executable representing compilation/linking options set for the module.Added
build_flags
parameter to function<client>CreateExecutable
.
0.2.0
Add handle
core_memory_t
to take sole ownership of device memory allocations in preparation for image support.Add struct
core_memory_s
.Add functions
<client>AllocateMemory
and<client>FreeMemory
to handle device memory allocations.Add function
<client>BindBufferMemory
to associate a device memory allocation with a buffer object. This also adds first class support to the API forclCreateSubBuffer
.Add enum
core_memory_type_e
used to specify if an allocation should support buffers, images, or both buffers and images. Add typedef to the definition to allow passing as a function parameter.Combine
core_buffer_mapping_type_e
andcore_buffer_unmapping_type_e
and rename the enum tocore_mapping_type_e
. Add typedef to definition to allow passing as a function parameter.Simplify function
<client>CreateBuffer
to remove allocation specific parameters.Add
core_device_t
parameter to function<client>DestroyBuffer
.Remove functions
<client>MapBuffer
and<client>UnmapBuffer
, this functionality now applies tocore_memory_t
allocations.Add functions
<client>MapMemory
and<client>UnmapMemory
replacing the buffer specific variety.Remove member
device
from structcore_buffer_s
,device
is now passed to API functions instead.
0.1.3
Fix documentation for API function
<client>CreateSpecializedKernel
.
0.1.2
Removed
CORE_DEVICE_KHRONOS_CODEPLAY_ID
andCORE_DEVICE_KHRONOS_CODEPLAY_NAME
as they are specific to the Codeplay backends.Added enum
core_floating_point_capabilities_e
for floating point support.Added
half_capabilities
to device for what half floating point mode is supported.Added
float_capabilities
to device for what floating point mode is supported.Added
double_capabilities
to device for what double floating point mode is supported.Added enum
core_shared_local_memory_type_e
for local memory types.Added
shared_local_memory_type
to device for the type of shared local memory the device supports.Added
shared_local_memory_size
to device for the size of the shared local memory the device has.
0.1.1
Added enum
core_cache_capabilities_e
for read/write caching.Added
cache_capabilities
field to device for what caching is supported.Added
cache_size
field to device for the size of the cache supported.Added
cacheline_size
field to device for the length of a line within the cache.
0.1.0
Replace
<client>_hook
with<client>CreateDevices
, adding support for multiple devices per target.
0.0.0
Add version to XML schema and generated headers.
Add compile time check for matching versions of all registered targets.