This section describes the floating-point requirements for supporting OpenCL through the oneAPI Construction Kit on a new device.
Floating-point support is not required to be implemented natively in hardware; it may be emulated in software to meet conformance. Fast-math mechanisms exposed to developers by the oneAPI Construction Kit allow high-performance code to be written using the precision that exists in hardware.
OpenCL C mandates IEEE 754-2008 as the floating-point format for an implementation to be Numerically Compliant, and therefore a device should support IEEE-754. However, floating-point exceptions are disabled in OpenCL, so hardware is not expected to handle them. Denormal numbers and fused multiply-add are optional parts of the IEEE-754 standard.
If hardware makes use of any alternative floating-point formats to IEEE-754, such as bfloat16 or posit, then the ComputeMux Compiler interface can be extended to support these as part of the work done by Codeplay. The oneAPI Construction Kit would then expose these types to the user through language extensions, e.g. an OpenCL C extension.
Floating Point Types
OpenCL defines three different floating-point precision formats which can optionally be exposed to developers through the ComputeMux Compiler. These are not required internally, but if present in hardware the oneAPI Construction Kit will make them available.
- Half Precision
16-bit floats, a format sometimes known as fp16. Not required for a device to support OpenCL; this format is optionally made available to the user via the cl_khr_fp16 extension.
- Single Precision
32-bit floats, required for devices to support OpenCL.
- Double Precision
64-bit floats, not required for a device to support OpenCL as it is an optional feature.
Hardware only needs to handle the scalar datatypes for these formats. ComputeMux Compiler will handle vector datatypes in the way most applicable to the hardware.
32-bit single-precision floating point is mandatory across all OpenCL profiles unless the device is of type CL_DEVICE_TYPE_CUSTOM. Custom OpenCL devices are not conformant, as they do not fully support OpenCL C. Integer-only hardware may either report as a custom device or use a software floating-point library in the ComputeMux Compiler.
The oneAPI Construction Kit tests all three of these IEEE-754 floating-point precisions for OpenCL. As cl_khr_fp16 is an OpenCL extension, the OpenCL Conformance Test Suite doesn't provide any conformance testing for it; instead, the oneAPI Construction Kit contains its own comprehensive unit testing for the extension.
OpenCL compliance defines precision for the add (+), subtract (-), multiply (*), and divide (/) operations on floating-point types. A software implementation of these is not a feature of the Abacus library, but a ComputeMux Compiler target may still implement them in software if the hardware is not precise enough. Division is the most likely candidate for this, as it has stricter precision requirements for conformance than the other operations.
See Floating-point Precision Requirements for the precision requirements of these operations.
Abacus provides a software implementation of all the conversion operations defined in OpenCL, including those taking floating-point types as input or output. If hardware doesn't provide native support for conversions, then the ComputeMux Compiler implementation may use the Abacus software conversions instead.
Developers can pass options to the kernel code compiler when writing high-performance programs to enable floating-point optimizations. In OpenCL, the Optimization Options relating to floating-point behaviour are:

- -cl-mad-enable
  Allow a * b + c to be replaced by a mad instruction. The mad instruction may compute a * b + c with reduced accuracy in the embedded profile. On some hardware the mad instruction may provide better performance than the expanded computation.
- -cl-no-signed-zeros
  Allow optimizations for floating-point arithmetic that ignore the signedness of zero. IEEE-754 arithmetic specifies the distinct behavior of +0.0 and -0.0 values, which then prohibits simplification of expressions such as x + 0.0 or 0.0 * x (even with -cl-finite-math-only). This option implies that the sign of a zero result isn't significant.
- -cl-unsafe-math-optimizations
  Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid, (b) may violate the IEEE-754 standard, (c) assume relaxed OpenCL numerical compliance requirements as defined in the unsafe math optimization section of the OpenCL C or OpenCL SPIR-V Environment specifications, and (d) may violate edge case behavior in the OpenCL C or OpenCL SPIR-V Environment specifications. This option includes the -cl-no-signed-zeros and -cl-mad-enable options.
- -cl-finite-math-only
  Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs, +Inf, or -Inf. This option may violate the OpenCL numerical compliance requirements for single precision and double precision floating-point, as well as edge case behavior.
- -cl-fast-relaxed-math
  Sets the optimization options -cl-finite-math-only and -cl-unsafe-math-optimizations. This option causes the preprocessor macro __FAST_RELAXED_MATH__ to be defined in the OpenCL program.
All these options are passed through the ComputeMux Compiler interface in the oneAPI Construction Kit for a ComputeMux target to optimize as appropriate. It is also possible for the oneAPI Construction Kit to provide new, non-standard optimization options to the developer for enabling hardware-specific optimizations.
Builtin Maths Functions
Compute languages often define builtin functions for use in kernel code. Of particular relevance to floating-point is the domain of builtins relating to mathematical operations on scalar and vector types. These maths builtins have associated precision requirements which must be met for an implementation to be conformant, but this level of precision may not be needed in high-performance code.
Faster but less accurate maths builtins are also available to the user in OpenCL for writing high-performance code. The oneAPI Construction Kit uses these to expose the true hardware capabilities without any overhead for extra precision. A developer can therefore choose the level of maths precision they need for their application: faster native precision or conformant high precision.
The OpenCL single precision math functions include a set of native_ functions, of implementation-defined accuracy, which can be used by the ComputeMux Compiler to expose high-performance device instructions.
The oneAPI Construction Kit provides the Abacus maths library, which implements the OpenCL math functions to the precision required by the specification. This can be used as a software implementation of builtins where hardware support isn't available or does not meet precision requirements.
In SPIR-V the math functions are defined in SPIR-V Extended Maths Instructions as part of the OpenCL extended instruction set.
OpenCL Full Profile ULP
High-level compute languages use profiles to mandate different sets of minimum capabilities that a device must support to be conformant. This allows the compute language to be applicable across a range of domains which each have separate concerns.
The default profile of OpenCL is Full Profile, intended for less constrained domains; as a result, the precision requirements of math functions are fairly strict so that OpenCL is applicable to the scientific computing domain.
OpenCL half, single, and double precision math functions all have separate ULP requirements defined in the OpenCL specifications. Single and Double precision errors are defined as separate tables in the main OpenCL C specification under Relative Errors As ULPs. The table for 32-bit single precision is labelled ULP values for single precision built-in math functions, and for 64-bit double labelled ULP values for double precision built-in math functions.
Precision is measured in ULP (Units in Last Place), defined as:
If \(x\) is a real number that lies between two finite consecutive floating-point numbers \(a\) and \(b\), without being equal to one of them, then \(ulp(x) = |b - a|\), otherwise \(ulp(x)\) is the distance between the two non-equal finite floating-point numbers nearest \(x\). Moreover, \(ulp(NaN)\) is \(NaN\).
OpenCL Embedded Profile ULP
OpenCL Embedded Profile targets low-power devices unlikely to be used in the scientific computing domain. It therefore defines weaker precision requirements than Full Profile for 32-bit floats, allowing devices to implement faster maths builtins.
These ULP error requirements are also defined in a table under Relative Errors As ULPs, labelled ULP values for the embedded profile.
A ComputeMux device reports the level of support provided for each individual floating-point format relating to rounding, denormal numbers, and availability of optimization operations. The following capabilities are reported by the ComputeMux Runtime for 16-bit, 32-bit, and 64-bit floats:

- Binary format conforms to the IEEE-754 specification.
- IEEE 754-2008 fused multiply-add is supported.
- Basic floating-point operations (such as addition, subtraction, multiplication) are implemented in software.
- Round To Nearest Even supported.
- Round To Nearest Even is the default rounding mode in kernel code.
- Round to Zero supported.
- Round to Positive Infinity supported.
- Round to Negative Infinity supported.
- INF and NaNs are supported. Support for signalling NaNs is not required.
- Support for denormal (aka subnormal) floating-point numbers.
The Abacus maths library in the oneAPI Construction Kit supports denormal numbers.
oneAPI Construction Kit primarily uses these values to respond to user queries made in high-level languages, but the capabilities are also used to determine whether the device meets any criteria imposed by the high-level language.
OpenCL devices of type CL_DEVICE_TYPE_CUSTOM have no floating-point requirements; the requirements for all other OpenCL devices are documented in the table below, using the OpenCL capability equivalent to each one reported by ComputeMux. The table shows that Embedded Profile devices have a reduced set of requirements for single precision floating-point compared to the default Full Profile.