This is Codeplay’s reference target for RISC-V implementations. The intention of this target is to provide a flexible way of communicating with a variety of customer RISC-V targets that have different RISC-V configurations. The target supports multiple variants through an abstract class (the HAL), which is used to configure the target and to act on commands such as enqueuing kernels, allocating memory, and reading from or writing to memory. The current version has only been tested with an x86_64 host CPU.
The current in-tree targets are variants of Codeplay’s reference architecture (RefSi), which comes in two variants: G and M1. The riscv target matches G and needs nothing architecture-specific beyond what is required to support RISC-V. M1 has additional hardware features, such as DMA.
The riscv target uses a common utility riscv compiler library, which can be used directly or derived from for different targets.
The RISC-V target can also be built with just the compiler aspect changed. This is shown with M1 under examples/refsi/refsi_m1 of the oneAPI Construction Kit.
The HAL is an abstract class which is required to be used with this target. This abstract class is accessed through a shared library which is opened at runtime. This is done in riscv::hal_get(), which uses dynamic loading and provides a hal_t class. From this hal_t class, we can get information about the general HAL target (hal_info_t), get more detailed information about the devices (hal_device_info_t), and create or free a HAL device.
For more detailed information on the HAL, see the specification and :doc:`dynamic loading </modules/mux/hal/dynamic_loading>`.
From the target’s viewpoint we mostly interact with hal_device_t for actions. hal_device_t gives us the following:
A method to allocate memory and to read and write memory.
A method to load a kernel (as an ELF file).
A method to enqueue a kernel across a range with a set of arguments.
All methods are currently seen as blocking (or effectively blocking). The HAL itself does not specify anything about the contents of the ELF file, but the current compilation assumes that the ELF file exposes a certain interface to the arguments of each kernel function; see RISC-V standard function arguments.
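The memory side of this interface can be sketched with a toy device backed by host memory. This is only an illustration of the blocking allocate/write/read pattern described above: `example_device`, `host_backed_device`, `device_addr_t` and the method names are hypothetical and are not the real HAL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a device address; not the real HAL type.
using device_addr_t = std::uint64_t;

// Hypothetical abstract interface mirroring the memory operations above.
class example_device {
 public:
  virtual ~example_device() = default;
  virtual device_addr_t mem_alloc(std::size_t size) = 0;
  virtual bool mem_write(device_addr_t dst, const void *src,
                         std::size_t size) = 0;
  virtual bool mem_read(void *dst, device_addr_t src, std::size_t size) = 0;
};

// Toy implementation: "device memory" is a host vector, and every call has
// finished its work by the time it returns (that is, it is blocking).
class host_backed_device final : public example_device {
 public:
  device_addr_t mem_alloc(std::size_t size) override {
    assert(size > 0);
    const device_addr_t addr = memory_.size();
    memory_.resize(memory_.size() + size);
    return addr;
  }
  bool mem_write(device_addr_t dst, const void *src,
                 std::size_t size) override {
    if (!in_bounds(dst, size)) return false;
    if (size != 0) std::memcpy(memory_.data() + dst, src, size);
    return true;
  }
  bool mem_read(void *dst, device_addr_t src, std::size_t size) override {
    if (!in_bounds(src, size)) return false;
    if (size != 0) std::memcpy(dst, memory_.data() + src, size);
    return true;
  }

 private:
  // True when [addr, addr + size) lies inside the allocated region.
  bool in_bounds(device_addr_t addr, std::size_t size) const {
    return addr <= memory_.size() && size <= memory_.size() - addr;
  }
  std::vector<std::uint8_t> memory_;
};
```

A caller would allocate a buffer, write host data into it, and read it back through the base-class interface; out-of-range accesses are rejected rather than performed.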
hal_device_info_t is a base class which is not RISC-V specific, and there is a RISC-V specific class which is derived from it. The HAL is used in the following way:
hal_device_info_riscv_t provides information about the type of RISC-V processor. This includes extensions and ABI information. This information is used in modules/mux/targets/riscv/source/kernel.cpp to help it build a linked ELF file for this particular processor configuration. No other part of the target references anything specific to RISC-V.
hal_device_info_t is expected to give a lot of information which can be
used to populate oneAPI Construction Kit’s device information. This is not specific to
RISC-V. This includes information such as global memory size, address size etc.
A specialized kernel is used to process the incoming arguments and create a list of HAL arguments for enqueuing a kernel.
In queue.cpp, enqueuing of a kernel across a range is done by using the previously created list of HAL arguments, loading the kernel and calling the HAL enqueue of NDRange.
In queue.cpp, reading, writing and filling of buffers happen by calling the equivalent method on the HAL.
In memory.cpp, we support reading and writing of memory by calling the equivalent function on the HAL.
The HAL is versioned with respect to any API changes, so if something changes in the interface the version must change too. The version in expected_hal_version must match that of the HAL device. Depending on the change, updating may be as simple as a recompile.
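A minimal sketch of such a version check follows. The numeric value, the `example_hal_info` struct and its `api_version` field are assumptions made for illustration; only the name expected_hal_version comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical value; the real expected_hal_version is defined by the target.
constexpr std::uint32_t expected_hal_version = 6;

// Hypothetical subset of the information a HAL reports about itself.
struct example_hal_info {
  std::uint32_t api_version;
};

// Returns true only when the HAL reports exactly the API version that the
// target was built against; any mismatch means the HAL should be rejected.
bool hal_version_matches(const example_hal_info &info) {
  return info.api_version == expected_hal_version;
}
```

An exact-match comparison is the simplest policy consistent with the text: any interface change bumps the version, so both sides have to be rebuilt together.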
RISC-V standard function arguments
The generated linked ELF file is expected to contain functions that have a defined format. For each kernel we have:
<function_name>(void *argsStruct, WorkGroupInfo *)
argsStruct is actually a block of memory which represents all of the arguments. Each argument is placed into the memory in order and is aligned to the smallest power of 2 greater than or equal to the size of the argument. For example, if we have a short followed by a uint, the uint would be 4-byte aligned and start at a 4-byte aligned offset, and the short would be aligned to 2 bytes. A short8 would be aligned to 16 bytes.
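The alignment rule can be illustrated with a small offset calculation. This is a sketch under the assumption that each argument is placed at the next offset satisfying its alignment; `pow2_alignment` and `arg_offsets` are hypothetical helper names, not part of the target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two that is greater than or equal to size.
std::size_t pow2_alignment(std::size_t size) {
  assert(size > 0);
  std::size_t align = 1;
  while (align < size) align <<= 1;
  return align;
}

// Given argument sizes in declaration order, return the byte offset of each
// argument inside the packed argument block.
std::vector<std::size_t> arg_offsets(const std::vector<std::size_t> &sizes) {
  std::vector<std::size_t> offsets;
  std::size_t cursor = 0;
  for (std::size_t size : sizes) {
    const std::size_t align = pow2_alignment(size);
    cursor = (cursor + align - 1) / align * align;  // round up to alignment
    offsets.push_back(cursor);
    cursor += size;
  }
  return offsets;
}
```

For a short (2 bytes), a uint (4 bytes) and a short8 (16 bytes) in that order, this yields offsets 0, 4 and 16: the uint skips two bytes of padding and the short8 skips eight.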
The second argument tells us about the current workgroup that is to be acted on. The kernel function should work on the whole workgroup for each call to the kernel function.
For more information on this struct see the documentation in the HAL repository of oneAPI Construction Kit.
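To make the "whole workgroup per call" convention concrete, here is a one-dimensional sketch. `ExampleWorkGroupInfo`, its fields, `ExampleArgs` and `fill_kernel` are all hypothetical; the real struct layout is the one documented in the HAL repository.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 1-D layout, for illustration only.
struct ExampleWorkGroupInfo {
  std::size_t group_id;    // index of the work-group being executed
  std::size_t local_size;  // number of work-items in one work-group
};

// Hypothetical packed arguments for a kernel taking (int *out, int value).
struct ExampleArgs {
  int *out;
  int value;
};

// One call processes every work-item of the work-group described by wg.
void fill_kernel(void *argsStruct, ExampleWorkGroupInfo *wg) {
  assert(argsStruct != nullptr && wg != nullptr);
  auto *args = static_cast<ExampleArgs *>(argsStruct);
  for (std::size_t local_id = 0; local_id < wg->local_size; ++local_id) {
    const std::size_t global_id = wg->group_id * wg->local_size + local_id;
    args->out[global_id] = args->value;
  }
}
```

The loop over local IDs lives inside the kernel function, so the caller only has to invoke it once per work-group.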
The standard RISC-V ABI is used currently, regardless of any HAL choices.
The information reported by a RISC-V device can vary depending on the build configuration of oneAPI Construction Kit. See the CMake Options for details on the effects of RISC-V specific CMake options.
Currently recommended build options include:
$ cmake -GNinja \
This will build a ‘G’ compatible version. To build an ‘M’ compatible version we can keep the same mux target, but use a different compiler target, as the ‘M’ target has additional features. This is done by adding to the build options:
$ cmake -GNinja \
CA_EXTERNAL_MUX_COMPILER_DIRS tells us to also use an additional compiler directory.
CA_MUX_COMPILERS_TO_ENABLE tells us to only enable this compiler
directory; this is needed to stop it building the riscv target as well and
both being attached to the
The default HAL is hal_refsi, which is looked for in examples/refsi/hal_refsi. However, if a directory is given in CA_RISCV_EXTERNAL_HAL_DIR, it will look there instead. This currently requires CA_HAL_NAME to be set if the name differs from the default.
The installed LLVM must have RISCV as an enabled target and build
The following build options can also be useful:
Defines the default HAL which should be linked in. This will be used to link with the shared library, which should be of name
Is used to help the Mux target set up aspects which have to be done at build time. It can also be picked up by the HAL being built to configure the HAL if needed. These aspects include the 32/64 bit capabilities and floating point and double support. This is largely needed to create the abacus builtins. This string should match the RISC-V string which it is related to.
Disabled because images are not supported, but some prebuilt kernels do not check for that support.
Is a bool (defaulted to true), which can be used to allow loading of a different HAL to the default at runtime, as described in the dynamic loading documentation in the oneAPI Construction Kit HAL repository.
Is a bool (defaulted to false), which can be used to set environment variables for debug purposes to demonstrate the execution of a kernel on RISC-V. Note that for a RefSi M1 example build this will be CA_RISCV_M1_DEMO_MODE.
ICD support is optional.
The following environment variables are currently supported:
Used for setting the vectorization factor - see Compilation.
Allows overriding of the HAL to be used at runtime. Only supported if built with -DCA_HAL_LOCK_DEVICE_NAME=OFF; see the dynamic loading documentation in the oneAPI Construction Kit HAL repository for more information.
Link builtins before the vectorizer is run if set to 1. This is particularly important for use with scalable vectorization for which the builtins do not create scalable vector equivalents. When scalable vectorization is enabled this will default to true, otherwise false.
Used to dump the generated IR at the beginning of the “late target passes” stage to stdout. Demo mode or debug mode only.
Additionally the following may be used by HALs to override their local setting, although this is not mandatory.
Sets the reported minimum VLEN in bits; see Compilation. This may override the VLEN if a HAL supports it. This should only be used if the actual VLEN used in the device is updated.
Path to an ELF file for dumping the built executable. Demo mode or debug mode only.
If defined, output final assembly produced to stdout. Demo mode or debug mode only.
The RISC-V target can generate and accept binary executables, each possibly containing multiple kernels. These are ELF files generated by LLVM. Both binaries and compilation of source are managed in executable.cpp. The contents of the produced binaries are used in the various kernel classes, before finally being
loaded to the HAL in
riscvCreateExecutable() is used to either compile a bitcode file or use a
previously built binary to generate an executable. Builtin kernels are not
currently supported. For both cases we create a
riscv::binary_executable_data_s which is used to contain the ELF data in a
dynamic array. This is created as a shared pointer so it can be passed through
the various kernel types, rather than copying the data multiple times, as the
executable could be deleted before the kernels are.
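The ownership pattern described here can be sketched as follows. `example_binary_data`, `example_executable`, `example_kernel` and `create_kernel` are hypothetical stand-ins for the real types; the point is only that kernels share the ELF data instead of copying it, and keep it alive after the executable is destroyed.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for riscv::binary_executable_data_s.
struct example_binary_data {
  std::vector<std::uint8_t> elf;  // ELF bytes held in a dynamic array
};

// Hypothetical executable: owns a reference to the ELF data.
struct example_executable {
  std::shared_ptr<example_binary_data> data;
};

// Hypothetical kernel: shares the same ELF data rather than copying it.
struct example_kernel {
  std::shared_ptr<example_binary_data> data;
  std::string name;
};

// Creating a kernel only bumps the reference count on the ELF data.
example_kernel create_kernel(const example_executable &exe, std::string name) {
  assert(exe.data != nullptr);
  return example_kernel{exe.data, std::move(name)};
}
```

If the executable is destroyed first, the kernel's shared pointer is the last owner and the ELF bytes remain valid until the kernel itself goes away.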
If it is given bitcode, it passes to an upcasted riscv version of the
finalizer object, and calls
createBinaryFromSource() directly on it,
which is explained in more detail in Compilation.
The first stage of the kernel objects; it just contains the shared executable and the kernel name.
The next stage; it contains the local size as well as the shared executable.
The final stage; it is here that the global size as well as the kernel arguments are brought in. In
riscvCreateSpecializedKernel(), we process the descriptors passed in as parameters. These descriptors give information about each argument and largely map one to one onto an equivalent hal::hal_arg_t. In this function we create a vector of hal_arg_t objects and pass it to the created riscv::specialized_kernel_s. This object also contains the global size of
hal_arg_t values can be created. This specialized kernel is later pushed onto the command queue in riscvPushNDRange() and processed in
All actual compilation is done in the finalizer class method createBinaryFromSource(). The first thing we do is cast the hal_device_info_t to its RISC-V specific derived type and find out what extensions are supported in order to initialize the target machine. We then read in the bitcode and turn it into an LLVM Module. At this point we can run all the passes.
We also set --riscv-v-vector-bits-min based on the hal_device_info_t value vlen, if it exists and is non-zero, and enable Vecz if CA_RISCV_VF is set (or vector flags are enabled at the OpenCL options level).
CA_RISCV_VF is defined as a comma separated list as follows:
S - Use scalable vectorization
V - Vectorize only, otherwise produce both scalar and vector kernels
A - Let Vecz automatically choose the vectorization factor
1-64 - Vectorization factor multiplier: the fixed amount itself, or the value that multiplies the scalable amount
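A minimal sketch of parsing such a list is shown below. It is not the parser used by the target: `vf_options` and `parse_vf` are hypothetical names, and details such as case sensitivity, the default factor of 1 and the handling of unknown tokens are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical container for the decoded settings.
struct vf_options {
  bool scalable = false;        // S
  bool vectorize_only = false;  // V
  bool automatic = false;       // A
  unsigned factor = 1;          // 1-64 (default of 1 is an assumption)
};

// Parses a comma-separated list such as "S,V,4". Returns false and leaves
// `out` untouched if any token is not S, V, A or a number from 1 to 64.
bool parse_vf(const std::string &text, vf_options &out) {
  vf_options parsed;
  std::istringstream stream(text);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (token == "S") {
      parsed.scalable = true;
    } else if (token == "V") {
      parsed.vectorize_only = true;
    } else if (token == "A") {
      parsed.automatic = true;
    } else {
      if (token.empty() || token.size() > 2) return false;
      unsigned value = 0;
      for (char c : token) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        value = value * 10 + static_cast<unsigned>(c - '0');
      }
      if (value < 1 || value > 64) return false;
      parsed.factor = value;
    }
  }
  assert(parsed.factor >= 1 && parsed.factor <= 64);
  out = parsed;
  return true;
}
```

With this sketch, "S,V,4" selects scalable vectorization, vector-only output and a multiplier of 4, while "65" or "S,,4" is rejected.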
All but one of the passes are util or LLVM passes. The util ones are detailed in Compiler Utilities, but the basics are as follows:
riscv::IRToBuiltinReplacementPass - A bespoke pass to handle some IR which currently produces link errors. This currently only includes
frem, and converts it into a call to the
fmod builtin, which is then handled by the abacus builtins.
llvm::InternalizePass - Used to help remove dead barrier calls after inlining
compiler::utils::AddKernelWrapperPass - Note that the use of this does not pack the args, but uses alignment to the power of 2 equal to or above the size of each argument
After running these passes all kernels should have the appropriate function signature of the argument structure and the schedule struct.
We then emit to a file and call LLD to link the final object. The
hal_device_info_t gives the linker script to use. At this point we have an
ELF file which will be untouched until it gets passed to the HAL to load.
riscv::command_group_s is used to maintain a vector of commands which are
later processed in
queue.cpp. This is identical to the Host CPU
code, except it does not support images and
host is renamed to
The riscv device maintains a threadpool. This is more complicated than our use requires. Its main role here is to process the queued commands and signal semaphores as needed when operations are done.
The main function of interest is
threadPoolProcessCommands(). This acts on
the command from the queue. This command can be one of the following:
command_type_copy_buffer - read, write, fill and copy map directly onto
command_type_reset_query_pool - These do not touch the HAL and use the query pool code in
query_pool.cpp, which is very similar to that of
exec_command_type_ndrange(), see below.
exec_command_type_ndrange() uses multiple hal_device_t methods. It does the following:
Loads the ELF file from the specialized kernel onto the device using
It finds the entry point of the kernel, using
It executes the kernel across the ndrange using
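The three steps can be sketched as a load, find, execute sequence against an abstract device. All names here (`example_exec_device`, `load_program`, `find_kernel`, `exec_kernel`, `run_ndrange`, `recording_device`) are hypothetical, since the text above does not name the HAL methods; the recording device exists only to make the call order visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using handle_t = std::uint64_t;
constexpr handle_t invalid_handle = 0;

// Hypothetical interface covering only the three steps listed above.
class example_exec_device {
 public:
  virtual ~example_exec_device() = default;
  virtual handle_t load_program(const std::vector<std::uint8_t> &elf) = 0;
  virtual handle_t find_kernel(handle_t program, const std::string &name) = 0;
  virtual bool exec_kernel(handle_t program, handle_t kernel,
                           const std::vector<std::size_t> &global_size) = 0;
};

// Load the ELF, look up the entry point, then run across the ND-range.
// Stops at the first step that fails.
bool run_ndrange(example_exec_device &device,
                 const std::vector<std::uint8_t> &elf,
                 const std::string &kernel_name,
                 const std::vector<std::size_t> &global_size) {
  assert(!global_size.empty());
  const handle_t program = device.load_program(elf);
  if (program == invalid_handle) return false;
  const handle_t kernel = device.find_kernel(program, kernel_name);
  if (kernel == invalid_handle) return false;
  return device.exec_kernel(program, kernel, global_size);
}

// Toy device that records which steps were invoked, in order.
class recording_device final : public example_exec_device {
 public:
  std::vector<std::string> calls;
  handle_t load_program(const std::vector<std::uint8_t> &elf) override {
    calls.push_back("load");
    return elf.empty() ? invalid_handle : 1;
  }
  handle_t find_kernel(handle_t, const std::string &name) override {
    calls.push_back("find");
    return name.empty() ? invalid_handle : 2;
  }
  bool exec_kernel(handle_t, handle_t,
                   const std::vector<std::size_t> &) override {
    calls.push_back("exec");
    return true;
  }
};
```

The sketch omits cleanup such as unloading the program, which a real implementation would also need to consider.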