Intermediate Representation

ComputeMux assumes LLVM IR as its intermediate representation. The precise specification of the intermediate representation is dependent on the version of LLVM ComputeMux has been built against.

While ComputeMux itself does not mandate a specific producer of the IR it consumes, that IR is assumed to meet certain requirements, e.g., that it comes from a supported higher-level form, including – but not limited to – a tool converting SPIR-V to LLVM IR or a clang compiler compiling OpenCL C or SYCL for the SPIR “triple”.

This section describes the various conventions on IR that a ComputeMux compiler may encounter and must support during code generation. For a list of language features a compiler is not required to support, see ComputeMux Compiler Non-Requirements.

Types

The following tables describe the ComputeMux “built-in” scalar and vector types and how they are mapped to the LLVM IR type system. They stem from the higher-level languages supported by ComputeMux, including OpenCL C and SPIR-V.

Built-in Scalar Types:

OpenCL C types	LLVM IR type
`bool`	`i1`
(`unsigned`) `char`	`i8`
(`unsigned`) `short`	`i16`
(`unsigned`) `int`	`i32`
(`unsigned`) `long` 1	`i64`
`float`	`float`
`double` 2	`double`
`half` 3	`half`

Built-in Vector Types:

OpenCL C type	LLVM IR type
`bool`n 4	`<n x i1>`
[`u`]`char`n	`<n x i8>`
[`u`]`short`n	`<n x i16>`
[`u`]`int`n	`<n x i32>`
[`u`]`long`n 1	`<n x i64>`
`float`n	`<n x float>`
`double`n 2	`<n x double>`
`half`n 3	`<n x half>`

Footnotes

1: An optional type. See Optional 64-bit integer support.
2: An optional type. See Optional 64-bit double support.
3: An optional type. See Optional 16-bit half support.
4: A reserved data type in OpenCL C, but one that is common in LLVM IR.

Aside from optional types, which have specific conditions detailed below, the aforementioned built-in types must be supported by a conforming ComputeMux compiler implementation. Note that “support” in this context does not mean requiring hardware for all built-in types; exactly how types are supported is ultimately left to the compiler implementation but it is assumed that a ComputeMux compiler can legalize any unsupported types.
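
As an illustration of the two tables above, the mapping can be written as a small lookup. This is a minimal Python sketch, not part of ComputeMux: the helper name `opencl_to_llvm_type` is hypothetical, and it does not check that a vector width is one OpenCL C actually allows.

```python
# Signedness is not encoded in LLVM IR integer types, so signed and
# unsigned OpenCL C integer types share a single IR type.
INTEGER_TYPES = {"char": "i8", "short": "i16", "int": "i32", "long": "i64"}
OTHER_TYPES = {"bool": "i1", "float": "float", "double": "double", "half": "half"}


def opencl_to_llvm_type(name: str) -> str:
    """Map an OpenCL C built-in type name to its LLVM IR spelling."""
    base = name.strip()

    # Peel off a trailing vector width, e.g. "float16" -> ("float", "16").
    width = ""
    while base and base[-1].isdigit():
        width = base[-1] + width
        base = base[:-1]

    # Strip an unsigned marker: "unsigned int" or the "u" prefix in "uint4".
    if base.startswith("unsigned "):
        base = base[len("unsigned "):].strip()
        if base not in INTEGER_TYPES:
            raise ValueError(f"not a built-in type: {name!r}")
    elif base.startswith("u") and base[1:] in INTEGER_TYPES:
        base = base[1:]

    ir_type = INTEGER_TYPES.get(base) or OTHER_TYPES.get(base)
    if ir_type is None:
        raise ValueError(f"not a built-in type: {name!r}")
    return f"<{width} x {ir_type}>" if width else ir_type
```

For example, `opencl_to_llvm_type("uchar4")` yields `<4 x i8>`, the same IR type that `char4` maps to.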

Tip

Compilers targeting architectures which lack hardware support for certain built-in types may typically rely on LLVM’s built-in SelectionDAG-based legalization framework to handle illegal types and operations during instruction selection.

Illegal vector types are usually scalarized or “unrolled”. Illegal integer types are usually either extended to larger integers or split into multiple operations over smaller ones. Illegal floating-point types typically rely on the external LLVM compiler support library compiler-rt.

Compilers may customize this legalization process if there is a better approach for their target architecture.

A brief overview of the instruction selection process is described here.
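
The "split into multiple operations over smaller ones" strategy from the tip above can be sketched for a 64-bit addition on a target that only has 32-bit adds. This is an illustrative Python model under that assumption, not LLVM's legalizer; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_i64_via_i32(a: int, b: int) -> int:
    """Add two 64-bit values using 32-bit halves and an explicit carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Low halves first. Python integers do not wrap, so the carry is read
    # from bit 32; hardware would use a carry flag or an unsigned compare.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # High halves plus the carry, wrapped to 32 bits like an i32 add.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo
```

The result wraps modulo 2^64, matching the semantics of an `add` on `i64`.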

Other types

The built-in types have been detailed above only to describe their relative importance and are not the only types a ComputeMux compiler may encounter as part of a conformant implementation. Pointer types, the void type, array types, structure types and union types must all be correctly handled.

Aside from both scalar and vector double and half types, ComputeMux makes no guarantees about the presence or absence of operations on non-built-in integer, floating-point or vector types in the IR.

For example, if whole-function vectorization is enabled, operations on vector types wider than 16 may be introduced into the IR. LLVM itself has been known to turn double2 (<2 x double>) vectors into scalar 128-bit integers (i128), and may reduce switch statement values into arithmetic over 4-bit integers (i4).

Note

Both ComputeMux and LLVM’s middle-end optimizations may introduce “illegal” types into the module. The concept of “legal types” is not generally observed at that level. For instance, some LLVM intrinsics only take i64 parameters. The reason for this, as mentioned above, is that the contract on an LLVM backend is that it should always be able to legalize the standard operations and intrinsics given any type.

Optional 64-bit integer support

For targets implementing the embedded profile without supporting the optional 64-bit integer types (cles_khr_int64), operations on 64-bit integers may still be present in the IR.

Optional 16-bit half support

Targets without support for the optional half type (cl_khr_fp16) shall not see arithmetic, conversion or relational operations on either scalar or vector half types anywhere in the IR module.

Note

Since half can always be used as a storage format, other operations and built-in functions may operate on half values, such as loads and stores.

Optional 64-bit double support

Targets without support for the optional double type shall not see either scalar or vector double types anywhere in the IR module.

Vector Types and Whole-Function Vectorization

If WFV is enabled, the vectorizer may emit vectors wider than the requested vectorization factor, if vectors already exist in the incoming IR. The vectorizer may emit vectors wider than 16, if the target reports that it has vector registers wide enough to contain such a wider type.

The vectorizer can produce scalable vectors when a scalable vectorization factor is requested. A scalable vector has the form <vscale x n x Ty>, where n is a constant factor known at compile time and vscale is a hardware-dependent factor that can be queried at runtime. Support for scalable vectors is not a requirement. Scalable vectors are available starting from LLVM 12, but LLVM 13 is recommended since support in LLVM 12 is very limited.

Integer and Floating-point Operations

The LLVM IR module shall contain standard built-in IR instructions which may form some proportion of the “compute” operations in the kernel being compiled. That is to say the kernel may not be entirely comprised of library functions and compiler intrinsics. The specific language features which produce these built-in instructions (e.g., operators: see below) depend on a variety of factors: the compute standard being implemented; the compile options; the compiler frontend; the version of LLVM being used; any higher-level intermediate representation being lowered to LLVM IR.

These standard LLVM instructions each have their own semantics as described in the LLVM language reference manual. ComputeMux shall not infer any contradictory semantics about these instructions based upon the compute standard it is compiling for. ComputeMux shall fundamentally act as a well-behaving LLVM compiler. Additional requirements may be placed on top of these instructions depending on the compute standard being implemented, such as Floating-point Precision Requirements.

Tip

When implementing the OpenCL compute standard in conjunction with the clang frontend, all built-in operators shall be emitted into the IR as standard LLVM IR instructions. This includes add (+), subtract (-), multiply (*), divide (/), unary operators (+, -), relational operators (<, >=, etc.), equality operators (==, !=), logical operators (&&, ||), ternary selection (?:), and unary logical not (!) for all built-in scalar and vector types. Additional operators – for integer types only – include remainder (%), shift (<<, >>), pre- and post-increment and decrement (++, --), bitwise and (&), bitwise or (|), bitwise exclusive or (^), and bitwise not (~).

There is no guarantee that a given operator is mapped to a specific IR instruction. While the naive lowering of subtraction (-) to LLVM’s sub is likely, it is also possible for transformations and optimizations to change the sub of a constant to an add of the negative constant, for example. A ternary expression may be identified as a min/max operation and represented by an intrinsic, e.g., llvm.umin.*.

See OpenCL C Operators for a full list of the operators described above.
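
The two rewrites mentioned in this tip preserve semantics, which is why a backend cannot assume a one-to-one mapping from operator to instruction. The Python sketch below models 32-bit wrapping arithmetic to show the equivalences; all function names are hypothetical stand-ins for the corresponding IR operations.

```python
MASK32 = 0xFFFFFFFF


def sub_i32(x: int, c: int) -> int:
    """Model of `sub i32 x, c` with two's-complement wrap-around."""
    return (x - c) & MASK32


def add_i32(x: int, c: int) -> int:
    """Model of `add i32 x, c` with two's-complement wrap-around."""
    return (x + c) & MASK32


def neg_i32(c: int) -> int:
    """The 32-bit two's-complement encoding of -c."""
    return (-c) & MASK32


def select_min(a: int, b: int) -> int:
    """Source-level ternary: a < b ? a : b, on unsigned operands."""
    return a if a < b else b


def umin_i32(a: int, b: int) -> int:
    """Model of the llvm.umin intrinsic on i32 operands."""
    return min(a & MASK32, b & MASK32)
```

Subtracting a constant and adding its negation produce the same bit pattern for every input, so an optimizer is free to pick either form.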

Presented here is an incomplete list of instructions which often arise from the compute standards supported by ComputeMux and which targets shall be expected to handle for all supported scalar and vector built-in Types: the binary operations including integer add, sub, mul, udiv, sdiv, urem, srem; floating-point fadd, fsub, fmul, fdiv, and frem; the bitwise binary operations and, or, xor, shl, lshr, ashr. Relational and equality operations are commonly lowered as icmp or fcmp instructions.

Important

The four basic floating-point operators – commonly rendered into LLVM IR as fadd, fsub, fmul, and fdiv instructions – are special in that ComputeMux does not provide software implementations to help achieve precision requirements. See here for more information. An example of the precision requirements that OpenCL conformance places upon these four operators is given below.

Tip

Targets without hardware support for the fifth operator frem may wish to replace all such instructions with calls to the OpenCL built-in fmod function or the Abacus equivalent, which both have overloads for all OpenCL built-in vector Types. This may be preferable to LLVM’s standard expansion of the instruction which may scalarize vector types.
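
The replacement suggested in this tip works because frem and fmod agree on the result: both return a remainder that takes the sign of the dividend. The sketch below models this with Python's `math.fmod`, which follows the same convention; the function names are hypothetical, and the list version only stands in for a vector overload.

```python
import math


def frem_model(x: float, y: float) -> float:
    """Remainder of x / y that takes the sign of the dividend x."""
    return math.fmod(x, y)


def frem_vector_model(xs, ys):
    """Element-wise remainder, standing in for a vector fmod overload."""
    return [frem_model(x, y) for x, y in zip(xs, ys)]
```

Note that Python's `%` operator on floats uses a different convention (the result takes the sign of the divisor), so `-7.5 % 2.0` is `0.5` while `frem_model(-7.5, 2.0)` is `-1.5`.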

Floating-point Precision Requirements

The LLVM IR instructions will have precision requirements that vary according to the higher-level programming language being implemented. These precision requirements must be adhered to for conformance, but note that there may be profiles, compiler options, or specific maths library functions (e.g., native functions in OpenCL) to relax these requirements when used in performance-sensitive code.

OpenCL Conformance

The OpenCL full-profile precision requirements for floating-point arithmetic operations (for both scalar and vector) are summarized in the table below.

Note

This table is a summary of requirements for the full profile. The embedded profile and/or presence of certain compilation options may loosen these requirements. See Relative Error as ULPs for the definition of ULP and precision requirements for other profiles, compilation flags, and other built-in floating-point operations. See the same section in the cl_khr_fp16 documentation for the precision requirements on half data types.

Minimum accuracy, in ULP values:
Operator	Single-precision	Double-precision 5	Half-precision 5
`fadd`	Correctly rounded	Correctly rounded	Correctly rounded
`fsub`	Correctly rounded	Correctly rounded	Correctly rounded
`fmul`	Correctly rounded	Correctly rounded	Correctly rounded
`fdiv`	<= 2.5 ulp	Correctly rounded	<= 1 ulp

Footnotes

5: Support for both half and double is optional.
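
To make the bounds in the table concrete, the following Python sketch estimates the ULP error of a single-precision result against a higher-precision reference. It is a simplification that assumes a finite, non-zero, normal reference value and does not reproduce the full ULP definition from the OpenCL specification; all function names are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ulp_f32(x: float) -> float:
    """Gap between |x| rounded to f32 and the next larger f32 value."""
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_f32(abs(x))


def ulp_error(result_f32: float, reference: float) -> float:
    """Distance between result and reference, in single-precision ULPs."""
    return abs(result_f32 - reference) / ulp_f32(reference)
```

Under this measure a correctly rounded result is at most 0.5 ULP from the reference, while a single-precision fdiv result may be up to 2.5 ULP away in the full profile.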

Intrinsics

LLVM includes the notion of intrinsic functions which serve to extend the capabilities of stock LLVM IR. ComputeMux provides a number of points at which intrinsics may be emitted into the IR.

Some intrinsics may be immediately present in the IR consumed by ComputeMux, either generated by the compiler frontend or created during conversion from a higher-level intermediate representation such as SPIR-V.

The standard ComputeMux pass pipeline includes several standard LLVM optimizations such as InstCombine which are known to introduce intrinsics into the IR module. This pass pipeline is ultimately configurable and so passes can be added, removed or reordered as required. Doing so may introduce or remove unspecified intrinsics into the IR module.

Tip

A compiler backend can usually rely on LLVM’s standard legalization framework to expand any unsupported intrinsics into a set of supported operations. If an architecture has native support for any intrinsics then custom code will be required to ‘lower’ them to target-specific instruction sequences.

If ComputeMux’s whole-function vectorizer (WFV) pass is enabled, the vectorizer may emit the vector reduction intrinsics @llvm.vector.reduce.and.* or @llvm.vector.reduce.or.*. Up to and including LLVM 11, if these are not supported by the target, alternative code will be generated, so supporting these is not a requirement. As of LLVM 12, these intrinsics may be emitted regardless of the target capabilities.

If WFV is enabled, load instructions may be vectorized to the intrinsics @llvm.masked.load.* or @llvm.masked.gather.*. Store instructions may be vectorized to the intrinsics @llvm.masked.store.* or @llvm.masked.scatter.*. These are emitted through an externally-exposed interface, so the default behaviour may be overridden by a target-specific implementation, if required.
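
For targets that need to lower these intrinsics themselves, the per-lane behaviour can be summarized as follows. This Python sketch models memory as a list and addresses as indices, which is a simplification; the function names are hypothetical. The key property is that lanes whose mask bit is clear never touch memory.

```python
def masked_load(memory, base, mask, passthru):
    """Contiguous load: lane i reads memory[base + i] only if mask[i] is set."""
    return [memory[base + i] if m else p
            for i, (m, p) in enumerate(zip(mask, passthru))]


def masked_store(memory, base, values, mask):
    """Contiguous store: lane i writes memory[base + i] only if mask[i] is set."""
    for i, (v, m) in enumerate(zip(values, mask)):
        if m:
            memory[base + i] = v


def masked_gather(memory, addresses, mask, passthru):
    """Gather: each enabled lane reads from its own address."""
    return [memory[a] if m else p
            for a, m, p in zip(addresses, mask, passthru)]


def masked_scatter(memory, addresses, values, mask):
    """Scatter: each enabled lane writes to its own address."""
    for a, v, m in zip(addresses, values, mask):
        if m:
            memory[a] = v
```

Disabled lanes of a load or gather take their value from the pass-through operand, so an out-of-bounds address in a disabled lane is harmless.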

If WFV is enabled, certain operations on scalable vectors other than loads and stores shall use @llvm.masked.scatter.* and @llvm.masked.gather.* intrinsics.

If WFV is enabled, the vectorizer may emit a number of other target-independent intrinsics commonly generated by LLVM’s middle-end optimizations (e.g., @llvm.maxnum, @llvm.fshr, etc.), but only if they exist in the incoming IR.

Address Spaces

In LLVM IR, pointer types are considered as pointing to a particular “address space” denoted by an integral value, e.g., i32 addrspace(2)*. Address spaces conceptually describe different regions of memory which may not necessarily be uniformly addressable. The default address space is the number zero and the semantics of non-zero address spaces are target-specific. In LLVM’s type system, two otherwise equivalent pointer types are unequal if they point to different address spaces and may not be used interchangeably.

See Address Spaces for a conceptual overview of the address spaces ComputeMux recognizes and how these address spaces may be mapped to hardware.

In LLVM IR, ComputeMux maps these address spaces to the numbers 0-4:

Address Space	OpenCL	SPIR-V
0	private	Function, Private, AtomicCounter, Input, Output
1	global	Uniform, CrossWorkgroup, Image, StorageBuffer
2	constant	UniformConstant, PushConstant
3	local	Workgroup
4	generic	Generic

Targets shall not use these address space numbers for any other purpose.

Note

The conventions in LLVM surrounding address spaces 0-3 stem from the SPIR 1.2 specification.

https://www.khronos.org/registry/SPIR/specs/spir_spec-1.2.pdf

Targets may use any of the unused address space numbers for their own purposes. It is recommended that address space numbers above 100 are used to better accommodate future specifications.
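
The numbering above can be captured in a small lookup, shown here as a Python sketch. The names `AddressSpace`, `SPIRV_STORAGE_CLASS`, `is_reserved` and `pointer_type` are hypothetical and chosen for illustration; the pointer spelling follows the typed-pointer syntax used earlier in this section.

```python
from enum import IntEnum


class AddressSpace(IntEnum):
    """Address space numbers reserved by ComputeMux (OpenCL names)."""
    PRIVATE = 0
    GLOBAL = 1
    CONSTANT = 2
    LOCAL = 3
    GENERIC = 4


# SPIR-V storage classes and the address space each one maps to.
SPIRV_STORAGE_CLASS = {
    "Function": AddressSpace.PRIVATE,
    "Private": AddressSpace.PRIVATE,
    "AtomicCounter": AddressSpace.PRIVATE,
    "Input": AddressSpace.PRIVATE,
    "Output": AddressSpace.PRIVATE,
    "Uniform": AddressSpace.GLOBAL,
    "CrossWorkgroup": AddressSpace.GLOBAL,
    "Image": AddressSpace.GLOBAL,
    "StorageBuffer": AddressSpace.GLOBAL,
    "UniformConstant": AddressSpace.CONSTANT,
    "PushConstant": AddressSpace.CONSTANT,
    "Workgroup": AddressSpace.LOCAL,
    "Generic": AddressSpace.GENERIC,
}


def is_reserved(number: int) -> bool:
    """True if a target must not repurpose this address space number."""
    return number in {space.value for space in AddressSpace}


def pointer_type(pointee: str, space: int) -> str:
    """Spell a pointer type; address space 0 is the default and is omitted."""
    number = int(space)
    return f"{pointee}*" if number == 0 else f"{pointee} addrspace({number})*"
```

A target-specific address space would then pick a number for which `is_reserved` is false, preferably above 100 as recommended.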

ComputeMux shall not make assumptions about how address spaces map to the target architecture and how conversions between them behave. Conversions between address spaces in the IR shall be preserved. Targets shall expect that addrspacecast instructions may occur in programs in all supported versions of LLVM. It is up to the target to lower these instructions accordingly.

Tip

It is common that a ComputeMux compiler backend is targeting an architecture where not all address spaces are distinct memory regions, e.g., one with a unified address space, or more generally one for which a given pair of address spaces 0 to 4 are known to share the same addressable region.

In this instance, it is recommended that the distinct address spaces are maintained on pointer types. An LLVM compiler backend does not care for particular address spaces and they should not interfere with any optimizations.

A common sticking point is in fact the addrspacecast instructions which hit the instruction selector and expect a lowering to target-specific instructions. The target may override TargetMachine::isNoopAddrSpaceCast. This method allows LLVM to automatically elide addrspacecast instructions during instruction selection.
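
The decision such an override has to make can be modelled as a predicate over a pair of address space numbers. The Python sketch below is only a model of that logic, not LLVM's C++ interface, and it assumes a hypothetical target where the private, global, local and generic spaces share one flat memory while the constant space is separate.

```python
# Assumption for illustration: spaces 0, 1, 3 and 4 alias one flat memory;
# space 2 (constant) lives elsewhere and needs a real conversion.
UNIFIED_SPACES = frozenset({0, 1, 3, 4})


def is_noop_addrspace_cast(src_as: int, dst_as: int) -> bool:
    """True if casting a pointer from src_as to dst_as leaves its bits unchanged."""
    return src_as == dst_as or (src_as in UNIFIED_SPACES and dst_as in UNIFIED_SPACES)


def casts_needing_lowering(casts):
    """Filter (src, dst) pairs down to those that still need target code."""
    return [(s, d) for s, d in casts if not is_noop_addrspace_cast(s, d)]
```

With this predicate, only casts into or out of the constant space would reach the target's custom lowering.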

Alignment

As is required by the semantics of LLVM IR, a target must adhere to the alignments specified on operations. These are most commonly found as either the align argument (or !align metadata) on instructions including alloca, load, store, etc., or the alignment parameter on intrinsics such as @llvm.masked.gather.* and @llvm.masked.scatter.*.

ComputeMux

ComputeMux shall preserve any existing alignment when mutating existing memory accesses.

Important

If whole-function vectorization is enabled, this means that vectorized access shall maintain the alignment of the original accesses. Therefore this pass may introduce vector accesses whose alignment is smaller than their size in bytes.

For example, when vectorizing by a factor of 8, i8 align 1 may be vectorized to <8 x i8> align 1 and <2 x i16> align 4 may be vectorized to <16 x i16> align 4.
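
The arithmetic behind this example can be checked with a short Python sketch. The helper `vectorize_access` is hypothetical; it widens the lane count by the vectorization factor, keeps the original alignment, and reports the access size in bytes so the two can be compared.

```python
def vectorize_access(elem_bits: int, lanes: int, align: int, factor: int):
    """Return (vector type, alignment, size in bytes) after vectorization.

    A scalar access is described with lanes=1. The alignment is carried
    over unchanged from the original access.
    """
    new_lanes = lanes * factor
    size_bytes = new_lanes * elem_bits // 8
    return f"<{new_lanes} x i{elem_bits}>", align, size_bytes
```

For the second case above, `vectorize_access(16, 2, 4, 8)` gives a 32-byte `<16 x i16>` access that is still only 4-byte aligned, so a target cannot assume vector accesses are naturally aligned.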

All new memory accesses created by ComputeMux compiler passes shall use the alignment specified by the target’s data layout string contained in the LLVM IR module and so shall be correctly aligned for the target architecture.

User control over alignment

Users of OpenCL can additionally specify a minimum alignment on enum, struct and union types using the aligned attribute. This attribute is optional and may be supported in a target-specific way as part of a conformant ComputeMux implementation.

Debug Info

ComputeMux Compiler expects standard LLVM IR debug metadata to be used as the format for source debug information. The reusable passes provided by oneAPI Construction Kit to ComputeMux Compiler targets make a best-effort attempt to preserve debug info, but no guarantees are provided.

Tip

In OpenCL, the -cl-opt-disable Compilation Option can be used by developers to disable optimizations for a better debugging experience. Optimizations are enabled by default. oneAPI Construction Kit uses this flag to skip front-end compiler transformations used for performance, but places no requirements on the ComputeMux Compiler implementation to act on the flag.

LLVM debug information is designed to be agnostic regarding the final format and target debugger. oneAPI Construction Kit does nothing to compromise this, and it is at the discretion of the ComputeMux Compiler back-end to choose the most suitable output format for the target, e.g. DWARF, Stabs, etc.

Debug information metadata is not used for any other purpose in the oneAPI Construction Kit and may be discarded by a ComputeMux Compiler target without sacrificing either correctness or performance.

In the future, the ComputeMux Compiler specification may define functions the target can optionally implement for the purposes of debugging, for example target-specific debugger hooks.

Note

OpenCL 1.2 does not provide a Compilation Option to developers to enable debug information in the kernel. Instead, oneAPI Construction Kit provides the cl_codeplay_extra_build_options OpenCL extension which introduces the following options (amongst others) to aid debugging:

-g: Build program with debug info.
-S <path/to/source/file>: Point debug information to a source file on disk. If this does not exist, the runtime creates the file with cached source.

These options make use of existing LLVM debug info metadata, and place no additional responsibilities on the ComputeMux Compiler target.

DMA

The ComputeMux compiler specification defines several DMA builtins that a compiler implementation should provide in the form of an IR pass or library. A target that does not provide definitions of the DMA builtins cannot take advantage of the optimizations described below.

Defining these builtins using platform specific DMA features enables optimized memory operations in any frameworks built on top of Mux.

For targets unable to support hardware DMA, oneAPI Construction Kit provides software implementations of the DMA builtins in the form of compiler passes that any target may use. Software implementations of the DMA builtins may have a performance overhead, and any target that can provide platform-optimized implementations of the builtins should do so.

A full list of the DMA builtins along with their signatures and semantics can be found in the Builtins section of the ComputeMux compiler specification.

Atomics and Fences

Atomic and fence instructions shall be emitted into the IR consumed by a ComputeMux compiler implementation. As outlined in the Atomics and Fences section, if a target has hardware support for atomic operations it should map them to these instructions. As a fallback, if a target does not support hardware atomics or fences it may implement these instructions in software using synchronization primitives such as mutexes.
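
The software fallback can be pictured as a lock guarding every read-modify-write. The Python sketch below models a 32-bit atomic cell protected by a mutex; the class and helper names are hypothetical, and a real target would implement this in its runtime or backend rather than in Python.

```python
import threading

MASK32 = 0xFFFFFFFF


class SoftwareAtomicI32:
    """A 32-bit value whose updates are serialized by a mutex."""

    def __init__(self, value: int = 0):
        self._value = value & MASK32
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def fetch_add(self, operand: int) -> int:
        """Add with wrap-around and return the previous value."""
        with self._lock:
            old = self._value
            self._value = (old + operand) & MASK32
            return old

    def compare_exchange(self, expected: int, desired: int):
        """Store desired if the value equals expected; return (old, success)."""
        with self._lock:
            old = self._value
            success = old == expected
            if success:
                self._value = desired & MASK32
            return old, success


def hammer(counter: SoftwareAtomicI32, n_threads: int, n_increments: int) -> int:
    """Increment the counter from several threads and return the final value."""
    def worker():
        for _ in range(n_increments):
            counter.fetch_add(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.load()
```

Because every operation holds the lock for its whole read-modify-write, no increment is lost even when threads interleave, at the cost of serializing all accesses.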

For a full list of the atomic and fence instructions a ComputeMux compiler implementation must handle, see the Atomics and Fences section of the ComputeMux specification.

The required set of instructions allows the oneAPI Construction Kit to support the OpenCL C atomic and OpenCL C fence operations and the SPIR-V atomic and SPIR-V barrier operations. Synchronization on non-atomic memory access is defined by a memory consistency model. The memory consistency requirements placed on the instructions listed in the ComputeMux compiler specification enable the oneAPI Construction Kit to support the higher-level OpenCL memory consistency model and the Vulkan memory model.

Barriers

The ComputeMux compiler specification defines a set of barrier builtins which provide a limited ability to synchronize between kernel execution threads. These are designed to support the barrier functions defined in the OpenCL C specification. A full list of these builtins as we define them, and a brief description of their semantics can be found in the Builtins section of the ComputeMux compiler specification.

ComputeMux provides a compiler pass that transforms kernels containing barriers such that execution and memory dependencies created by them can be satisfied without the need for synchronization primitives on the device. This pass makes use of the fence instructions described in the section above, so barrier support can benefit from hardware support for such operations. Instead of relying on the compiler pass, a ComputeMux implementation may choose to implement these builtins with supporting hardware features, as mentioned in the Synchronization Requirements section.
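
One common way to satisfy barrier semantics without device-side synchronization is to split the kernel at each barrier and run every region for all work-items in the group before starting the next region. The Python sketch below illustrates that general idea only; it is not a description of how the ComputeMux pass is implemented, and all names are hypothetical.

```python
def run_work_group(regions, local_size, local_mem):
    """Run each barrier-delimited region for all work-items in turn."""
    for region in regions:
        for local_id in range(local_size):
            region(local_id, local_mem)
        # Reaching this point plays the role of the barrier: every
        # work-item has finished the region before the next one starts.


def reverse_kernel_regions(src, dst):
    """A kernel that reverses src through local memory, split at its barrier."""
    def before_barrier(local_id, local_mem):
        local_mem[local_id] = src[local_id]

    def after_barrier(local_id, local_mem):
        dst[local_id] = local_mem[len(local_mem) - 1 - local_id]

    return [before_barrier, after_barrier]


def reverse_with_barrier(src):
    """Run the example kernel over one work-group the size of src."""
    size = len(src)
    dst = [None] * size
    local_mem = [None] * size
    run_work_group(reverse_kernel_regions(src, dst), size, local_mem)
    return dst
```

In a real compiler pass, any values that are live across a barrier would also have to be saved per work-item between regions; this example avoids that by keeping all shared state in local memory.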

Builtins

The ComputeMux compiler has the notion of builtin functions. These are functions that are known by the compiler to exhibit certain semantics and properties which are useful or essential for the purposes of compilation. These always include the functions defined by Abacus. In addition – depending on the higher-level language being compiled for – other builtin functions are recognized; for example, for OpenCL and SYCL, the OpenCL Builtin Functions are considered by ComputeMux to be builtin.

The definitions of builtins shall be assumed to be provided by ComputeMux. ComputeMux may modify the implementation of a builtin function at any point in the compilation pipeline according to its own needs. It may do so regardless of whether that builtin is already defined in the module being compiled.

Tip

If users wish to provide their own implementations of builtin functions, they should do so using a new function definition which is not recognized by ComputeMux as a builtin. For example, an optimized fma may safely be implemented as my_fma, provided users replace all calls to fma with calls to the new function.
