This tutorial is intended to be based on the target created in the create new target tutorial.
Note
If the user does not want to do the whole tutorial the json
file
refsi_with_wrapper.json
can be used in the target creation. Also if the user
wishes to see the result of this tutorial, “;clmul” can be added to the feature
json
entry.
It is common that a target’s hardware has specialized instructions which accelerate operations suited to the platform and its intended domain.
Often, these instructions can be successfully matched from user code automatically, through the compiler backend identifying and optimizing sequences of basic operations.
Occasionally, these instructions are so specialized that compilers are unable to match them automatically. In this situation it is common that vendors provide builtin functions to users for use in their programs. These builtins are then caught and matched to specialized instructions by compilers.
This tutorial shows an example of such a scenario using RISC-V’s carry-less
multiplication (Zbc) extension. It will expose
the carry-less multiplication (clmul*
) instructions to users via builtin
functions for use in OpenCL programs.
Registering A New Compiler Extension
Registering a new compiler extension in ComputeMux requires the following files:
compiler/refsi_tutorial/extension/riscv_clmul/
├─ CMakeLists.txt
├─ include/
│ ├─ extension/
│ │ ├─ riscv_clmul.h
│ │ ├─ builtins.h
├─ source/
│ ├─ riscv_clmul.cpp
Note that this requires the new directory extension to have a simple CMakeLists.txt as follows:
add_subdirectory(riscv_clmul)
The extension subdirectory has to be added to
compiler/refsi_tutorial/CMakeLists.txt
before set(REFSI_TUTORIAL_SOURCES
:
add_subdirectory(extension)
# add before this point
set(REFSI_TUTORIAL_SOURCES
The main header and source files provide the OpenCL runtime with knowledge of the extension when registered:
riscv_clmul.h
:
#ifndef EXTENSION_RISCV_CLMUL_INCLUDED
#define EXTENSION_RISCV_CLMUL_INCLUDED
#include <extension/extension.h>
namespace extension {
class riscv_clmul final : public extension {
public:
riscv_clmul();
};
} // namespace extension
#endif // EXTENSION_RISCV_CLMUL_INCLUDED
riscv_clmul.cpp
:
#include <extension/riscv_clmul.h>
extension::riscv_clmul::riscv_clmul()
: extension("cl_riscv_clmul",
usage_category::DEVICE CA_CL_EXT_VERSION(1, 0, 0)) {}
The CMake code is used to register a compiler extension with these source files and register a force-include header file which is implicitly included in all OpenCL programs when the extension is enabled.
CMakeLists.txt
:
add_ca_cl_compiler_extension(riscv-clmul
EXTENSIONS riscv_clmul
INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/include
SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/include/extension/riscv_clmul.h
${CMAKE_CURRENT_SOURCE_DIR}/source/riscv_clmul.cpp)
add_ca_force_header(PREFIX "riscv_clmul"
DEVICE_NAME "${CA_REFSI_TUTORIAL_DEVICE}"
PATH "${CMAKE_CURRENT_SOURCE_DIR}/include/extension/builtins.h")
The force-include header itself provides declarations of the builtin functions
to expose to users. This tutorial covers adding three builtins which map 1:1
with their corresponding RISC-V instructions, assuming a 64-bit RISC-V platform
(i.e., mapping the 64-bit long
directly to the native 64-bit register
length without type promotion).
builtins.h
:
// Produce the lower half of the 128-bit carry-less product
__attribute__((overloadable)) long clmul(long, long);
// Produce the upper half of the 128-bit carry-less product
__attribute__((overloadable)) long clmulh(long, long);
// Produces bits 126-63 of the 128-bit carry-less product
__attribute__((overloadable)) long clmulr(long, long);
Note
Builtins must be given the overloadable attribute as OpenCL functions are internally required to be mangled according to their type signatures.
With the extension registered, it is now possible for users to use these builtins in programs via the implicitly-included header:
kernel void do_clmul(global long *a, global long *b, global long *z) {
size_t id = get_global_id(0);
z[id] = clmul(a[id], b[id]);
}
However, the builtins have been provided only as declarations so the compiler is required to provide definitions for the builtins. Without this step, the program fails to build as it contains unresolved symbols. The next step is to modify the compiler to translate the builtin functions to LLVM intrinsic functions which finish the implementation.
Note
Builtins can be given default software implementations, with the compiler optimizing to more optimal forms given certain conditions, but that is not within the scope of the tutorial.
Builtin Replacement
With the clmul*
builtins registered via the extension and successfully
entering the LLVM IR module via the compiler frontend, the next step is to
ensure that the builtins are lowered to the appropriate instructions.
This tutorial lowers builtins to RISC-V instructions via LLVM intrinsics as
the open-source LLVM 13+ compiler provides intrinsics which map 1:1 with
clmul*
instructions. The task, therefore, is to replace calls to clmul*
builtin functions in LLVM IR with calls to llvm.riscv.clmul*
intrinsic
functions.
ComputeMux makes this task straightforward; targets can hook in custom logic to replace builtin functions with more optimal sequences of LLVM IR.
The OptimalBuiltinReplacementPass is a utility pass that
runs over a module, identifying call instructions and replacing them with new
IR sequences. This pass will run by default in the refsi_tutorial
.
Alternatively the pass can be run at any point by targets with their own
customized pass pipelines. Part of the information it uses for the replacements
is defined by a BuiltinInfo
class.
The pass needs more information to replace clmul*
instructions. This can be
done by creating a custom inherited class from CLBuiltinInfo
. CLBuiltinInfo
is a class that encapsulates information and transformations concerning
compiler OpenCL builtin functions.
First we define the class, which needs to define two methods, analyzeBuiltin
and emitBuiltinInline
as follows after the headers in module.cpp
:
#include <llvm/IR/IntrinsicsRISCV.h>
#include <compiler/utils/mangling.h>
namespace refsi_tutorial {
class CLTargetBuiltinInfo : public compiler::utils::CLBuiltinInfo {
public:
// Will be filled out by each example
CLTargetBuiltinInfo(std::unique_ptr<compiler::utils::CLBuiltinLoader> L)
: compiler::utils::CLBuiltinInfo(std::move(L)) {}
compiler::utils::Builtin analyzeBuiltin(
llvm::Function const &Builtin) const override {
}
llvm::Value *emitBuiltinInline(
llvm::Function *Builtin, llvm::IRBuilder<> &B,
llvm::ArrayRef<llvm::Value *> Args) override {
}
};
}
We will fill in the bodies of these function later in the tutorial.
Next we need to update createPassMachinery
to register our
CLTargetBuiltinInfo
rather than the default one. This is done via setting a
callback for the PassMachinery
class which is used to register an analysis pass
which uses a BuiltinInfo
class as a basis.
In createPassMachinery
we replace the lines:
auto Callback = [Builtins](const llvm::Module &) {
return compiler::utils::BuiltinInfo(utils::createSimpleCLBuiltinInfo(Builtins));
};
with
auto Callback = [Builtins](const llvm::Module &) {
return compiler::utils::BuiltinInfo(createSimpleTargetCLBuiltinInfo(Builtins));
};
This requires createSimpleTargetCLBuiltinInfo
to be implemented. This
calls std::make_unique
to create a unique pointer to a newly created
BILangInfoConcept templated with the new class and passes it a builtin loader.
This can be written as follows:
std::unique_ptr<compiler::utils::BILangInfoConcept> createSimpleTargetCLBuiltinInfo(
llvm::Module *Builtins) {
return std::make_unique<CLTargetBuiltinInfo>(
std::make_unique<compiler::utils::SimpleCLBuiltinLoader>(Builtins));
At this point everything is plumbed in, but the class will do nothing different
to CLBuiltinInfo
. There are two parts to this. Firstly we need to implement
the analyzeBuiltin
. This has to return property information about the
builtin. In this case we want to say that it can be emitted inline. We also set
the no side effects
property to assist optimization. This also demangles the
names of the builtins to match. Note that if we are not declaring any additional
BuiltinID values, we can use eBuiltinUnknown
. Replace analyzeBuiltin
method with the following:
compiler::utils::NameMangler mangler(&Builtin.getParent()->getContext());
llvm::StringRef BaseName = mangler.demangleName(Builtin.getName());
if ((BaseName == "clmul") || (BaseName == "clmulh") ||
(BaseName == "clmulr")) {
unsigned Properties = compiler::utils::eBuiltinPropertyCanEmitInline |
compiler::utils::eBuiltinPropertyNoSideEffects;
return (compiler::utils::BuiltinProperties)Properties;
return compiler::utils::Builtin{
Builtin, compiler::utils::eBuiltinUnknown,
(compiler::utils::BuiltinProperties)Properties};
}
compiler::utils::CLBuiltinInfo::analyzeBuiltin(Builtin);
With this in place we update emitBuiltinInline
by identifying calls to
clmul
, clmulh
and clmulr
and replaces them with the equivalent LLVM
intrinsic.
if (Builtin) {
compiler::utils::NameMangler mangler(&Builtin->getParent()->getContext());
llvm::StringRef BaseName = mangler.demangleName(Builtin->getName());
if (BaseName == "clmul") {
return B.CreateIntrinsic(llvm::Intrinsic::riscv_clmul,
Builtin->getReturnType(), {Args[0], Args[1]});
} else if (BaseName == "clmulh") {
return B.CreateIntrinsic(llvm::Intrinsic::riscv_clmulh,
Builtin->getReturnType(), {Args[0], Args[1]});
} else if (BaseName == "clmulr") {
return B.CreateIntrinsic(llvm::Intrinsic::riscv_clmulr,
Builtin->getReturnType(), {Args[0], Args[1]});
}
}
return compiler::utils::CLBuiltinInfo::emitBuiltinInline(Builtin, B, Args);
Note
For simplicity, no validation or type checking is performed here.
Final Result
With the above step completed, users can compile OpenCL programs using
clmul*
builtins that are optimally mapped directly to RISC-V instructions.
This can be done with the standalone compiler clc (which can be built with
ninja clc
).
clmul.cl
:
kernel void do_clmul(global long *a, global long *b, global long *z) {
size_t id = get_global_id(0);
z[id] = clmul(a[id], b[id]);
}
clc --strip-binary-header -o clmul.o clmul.cl -cl-wfv=never
llvm-objdump --disassemble --triple=riscv64 --mattr="v,c,zbc" clmul.o | grep clmul
0000000000010000 <do_clmul>:
1003c: 33 96 c6 0a clmul a3, a5, a3
Lit Testing
Finally this can also be tested using a lit test, which is a common way of
testing passes in LLVM. The tool muxc
can be used for this purpose in a similar
way to how the LLVM tool opt
can be used. Create the following file
compiler/refsi_tutorial/test/clmul_replace.ll
:
; RUN: %muxc --device "RefSi M1 Tutorial" %s --passes "require<builtin-info>,optimal-builtin-replace,verify" -S | FileCheck %s
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"
target triple = "riscv64-unknown-unknown-elf"
; CHECK: call i64 @llvm.riscv.clmul.i64
; CHECK: call i64 @llvm.riscv.clmulh.i64
; CHECK: call i64 @llvm.riscv.clmulr.i64
; Function Attrs: norecurse nounwind
define spir_kernel void @do_clmul(ptr addrspace(1) %a, ptr addrspace(1) %b, ptr addrspace(1) %z) {
entry:
%0 = load i64, ptr addrspace(1) %a
%1 = load i64, ptr addrspace(1) %b
%call = tail call spir_func i64 @_Z5clmulll(i64 %0, i64 %1)
%call4 = tail call spir_func i64 @_Z6clmulhll(i64 %0, i64 %1)
%add = add nsw i64 %call4, %call
%call7 = tail call spir_func i64 @_Z6clmulrll(i64 %0, i64 %1)
%add8 = add nsw i64 %add, %call7
store i64 %add8, ptr addrspace(1) %z
ret void
}
declare spir_func i64 @_Z5clmulll(i64, i64)
declare spir_func i64 @_Z6clmulhll(i64, i64)
declare spir_func i64 @_Z6clmulrll(i64, i64)
muxc
is used here to run the the builtin-info
analysis pass and the
optimal-builtin-replace
pass on the input IR. This results in the output of the
intrinsics llvm.riscv.clmul*
.
All of the lit tests for the refsi-tutorial
target can be run with building the
check-refsi-tutorial-lit
target. If you use the create script for building new
targets the target name will be used instead of refsi-tutorial
.