clc: OpenCL Offline Kernel Compiler
clc is a command-line tool that compiles OpenCL C, SPIR and SPIR-V kernels to
an implementation-defined binary format. It is not tied to any particular Mux
target implementation.
clc Use-Cases
Checking a kernel for compile errors without outputting a binary:
clc -n kernel.cl
Compile a kernel to a file:
clc -o clbin/kernel.bin src/kernel.cl
Choose the oneAPI Construction Kit’s OpenCL implementation on a multi-device system:
clc -d "host CPU" -o clbin/kernel.bin src/kernel.cl
Compile a kernel with a given include path and definitions:
clc -Ilib1/cl -Ilib2/cl -DUSE_FEATURE1 -DANSWER=42 -o kernel.bin kernel.cl
Save the target-generated binary without the CL-specific binary header, for inspection only (the result cannot be loaded through the OpenCL API):
clc --strip-binary-header -o kernel_target.bin src/kernel.cl
clc Usage
Command-line usage:
clc [options] [--] [file1.h file2.h inputfile.cl OR spirfile.bc OR spirvfile.spv]
The input files will be concatenated in the order specified and passed to clBuildProgram.
Supported options:
optional arguments:
-h, --help show this message and exit
--version show program's version number and exit
-v, --verbose show more information during execution
-n, --no-output suppresses generation of the output file, but runs the
rest of the compilation process
-o file, --output file
output file path, defaults to the name of the last
input file "<input>.bin" if present, or "-" otherwise
to write to standard output
-d name, --device name
a substring of the device name to select
--list-devices print the list of available devices and exit
-X opt passes an option directly to the OpenCL compiler
optional preprocessor arguments:
-D name predefine name as a macro, with definition 1
-D name=definition the contents of definition are tokenized and processed
as if they appeared during translation phase three in a
`#define' directive. In particular, the definition will
be truncated by embedded newline characters
-I dir adds a directory to the list to be searched for headers
optional math intrinsics arguments:
-cl-single-precision-constant
treat double precision floating-point constants as
single precision constants
-cl-denorms-are-zero allows flushes of denormalized numbers to zero for
optimization
optional optimization arguments:
-cl-opt-disable this option disables all optimizations
-cl-mad-enable allow a * b + c to be replaced by a mad with reduced
accuracy
-cl-no-signed-zeros allow optimizations for floating-point arithmetic that
ignore the signedness of zero
-cl-unsafe-math-optimizations
allow optimizations for floating-point arithmetic that
may violate IEEE 754
-cl-finite-math-only allow optimizations for floating-point arithmetic that
assume that arguments and results are not NaNs or
+/-inf
-cl-fast-relaxed-math sets -cl-finite-math-only and
-cl-unsafe-math-optimizations
optional additional arguments:
-w disables the OpenCL warnings
-Werror makes the OpenCL warnings into errors
-cl-std={CL1.1,CL1.2} determine the OpenCL C language version to use
-cl-kernel-arg-info this option allows the compiler to store
information for clGetKernelArgInfo
optional SPIR extended arguments:
-spir-std=1.2 chooses the version of SPIR standard to follow
(defaults to 1.2 if SPIR input detected)
-x spir indicates the input is in SPIR format (added
automatically if SPIR input detected)
optional oneAPI Construction Kit extended arguments:
--strip-binary-header strips the header containing argument and kernel count
information, leaving only the binary directly from the
target implementation WARNING: The output binary cannot
be loaded by the oneAPI Construction Kit runtime again!
-codeplay-soft-math inhibit use of LLVM intrinsics for mathematical builtins
-g enables generation of debug information, for best
results use in combination with -S
-S file path of the source file used for source locations when
-g is specified
-cl-llvm-stats enable reporting LLVM statistics
-cl-wfv={always,auto,never}
sets whole function vectorization mode
-cl-vec={none,loop,slp,all}
enables kernel early vectorization passes
Any other options not defined above will be passed to clBuildProgram unchanged.
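The two -D forms in the options list follow the usual C preprocessor convention. As a rough model, each -D argument corresponds to a #define line as shown below. define_directive is a hypothetical helper written for illustration; it is not part of clc, and clc may pass the options to the compiler rather than generating text like this.

```python
def define_directive(option: str) -> str:
    """Model the #define implied by the argument of a -D option (illustrative only)."""
    name, sep, definition = option.partition("=")
    if not sep:
        # "-D name": the name is predefined as a macro with definition 1.
        return f"#define {name} 1"
    # "-D name=definition": an embedded newline truncates the definition.
    definition = definition.split("\n", 1)[0]
    return f"#define {name} {definition}"


# The options from the include-path use case above.
for opt in ["USE_FEATURE1", "ANSWER=42"]:
    print(define_directive(opt))
```

Under this model, -DUSE_FEATURE1 behaves like #define USE_FEATURE1 1 and -DANSWER=42 like #define ANSWER 42.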
Input and output files are both set to “-” by default, meaning standard input
and standard output. When input files are given without -o, the output path
defaults to lastinfile.bin, named after the last input file.
oclc: OpenCL-C Compiler and Intermediate State Kernel Inspector
oclc is a command-line tool that compiles OpenCL C kernels to LLVM IR (.ll) or
assembly (.s). Its primary purpose is to provide insight into the operation of
the compiler components of the oneAPI Construction Kit, whether to aid
debugging or to improve the quality of the generated code.
oclc may also be used to execute OpenCL kernels with specified parameters and
to view parameter values after execution.
Note that although this is an offline compile tool, it hooks directly into the oneAPI Construction Kit OpenCL library, so any environment variables that affect the library's behaviour also affect oclc.
oclc Use-Cases
To pass build flags through to the OpenCL compiler, use -cl-options; this is
particularly useful for enabling debug info:
oclc foo.cl -cl-options "-g -cl-opt-disable" -stage spir
If you want to see what the LLVM IR looks like before any optimizations or rewriting:
./oclc -stage frontend foo.cl > foo.ll
If you want to see the output from the vectorizer, in LLVM IR, before builtins are inlined or barriers are rewritten:
# The vectorizer does not run unless the `-cl-wfv=always` or `-cl-wfv=auto`
# option is provided.
./oclc -cl-options "-cl-wfv=always" -stage packetized foo.cl > foo.ll
If you want to see the assembly output for the auto-detected CPU, with vectorization (and inlining of builtins, rewriting of barriers, and creation of local work-item loops):
# The vectorizer does not run unless the `-cl-wfv=always` or `-cl-wfv=auto`
# option is provided.
./oclc -cl-options "-cl-wfv=always" -stage mc foo.cl > foo.S
oclc Usage
./bin/oclc [options] <CL kernel file>
Textual outputs are printed directly, whereas binary outputs are saved to a
file by default. These files are named after the original file path with .bc
or .o appended, e.g. foo.cl.bc. To choose a more convenient output file, use
the -o option, which does not add any file extension; it can also be used for
textual outputs.
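The output rules above fit in a few lines. The sketch below is a hypothetical model (output_destination is not an oclc function); which of .bc or .o applies depends on the chosen output, so the caller passes it in as an assumption rather than the function inferring it.

```python
from typing import Optional


def output_destination(kernel_path: str, is_binary: bool,
                       binary_extension: str = ".bc",
                       output_option: Optional[str] = None) -> Optional[str]:
    """Model where oclc output goes; None means it is printed to stdout."""
    if output_option is not None:
        # An explicit -o path is used verbatim, with no extension added,
        # for both textual and binary outputs.
        return output_option
    if is_binary:
        # Binary outputs default to the input path with .bc or .o appended.
        return kernel_path + binary_extension
    # Textual outputs (LLVM IR, assembly) are printed directly.
    return None


print(output_destination("foo.cl", is_binary=True))
```

For example, output_destination("foo.cl", is_binary=True) gives foo.cl.bc, matching the example in the text.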
When using oclc to view generated LLVM IR/assembly, you may ignore all options
from -execute downward. When using oclc to execute a kernel, every kernel
argument should be specified at least once using one of the -arg, -print,
-show, or -compare flags.
Program options:
-o <output_file> Set the output file.
-v Run oclc in verbose mode.
-format <output_format> Set the output file format.
-cl-options 'options...' OpenCL options to use when compiling the kernel.
-enqueue <kernel name> Enqueues the named kernel.
-execute Executes the enqueued kernel.
-seed <value> Set the seed of the random number engine used in rand() calls.
The seed is set to a default value if this is not set.
-arg <name>[,<width>[,<height>]],<list> Assigns a list value (as described below) to the
named argument when the kernel is executed.
If the argument is a 2D image, a width in pixels must be provided.
If the argument is a 3D image, a height in pixels must also be provided.
If the argument is an image, 4 values must be provided per pixel,
as images are treated as unsigned 8-bit RGBA arrays by default.
-arg <name>[,<width>[,<height>]],<list>:<filename> Assigns a list value (as described below), held in a
file, to the named argument when the kernel is executed.
-print <name>[,<offset>],<size> Prints a given number of elements from the given
named argument after execution to stdout, possibly
starting from some offset.
-print <name>[,<offset>],<size>:<filename> Prints a given number of elements from the given
named argument after execution to a file, possibly
starting from some offset.
-show <name>,<width>[,<height>[,<depth>]][:<filename>] Prints the named image argument of the specified size to stdout,
or a file, if one is provided.
-compare <name>,<expected> Compares the named buffer to an expected list.
-compare <name>:<filename> Compares the named buffer to an expected list, held in a file.
-global <g1>,<g2>,... Sets the global work size to the given array of values.
-local <l1>,<l2>,... Sets the local work size to the given array of values.
-ulp-error <tolerance> Sets the maximum ULP error between the actual and target values accepted
as a 'match' when -compare is applied to float or double values. Defaults to 0.
-char-error <tolerance> Sets the maximum difference between the actual and target values accepted
as a 'match' when -compare is applied to char or uchar values. Defaults to 0.
-repeat-execution <N> Executes the kernel N times. -global, -local, and -arg
arguments may be set to {<list>},{<list>},... to take on
different values on each execution.
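The -ulp-error tolerance is measured in units in the last place (ULP). One common way to compute the ULP distance between two values is to reinterpret their bit patterns as integers ordered so that adjacent representable floats differ by one. The sketch below does this for 32-bit float values; ulp_distance and is_match are hypothetical names, and oclc's own comparison (for example its handling of NaN, infinities and double values) may differ.

```python
import math
import struct


def _ordered_bits(x: float) -> int:
    """Map a float32 onto an integer line where adjacent floats differ by 1."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        # Negative values are mirrored below zero so ordering is preserved
        # and both +0.0 and -0.0 map to 0.
        return -(bits & 0x7FFFFFFF)
    return bits


def ulp_distance(actual: float, expected: float) -> float:
    """Number of float32 steps between two values (inf if either is NaN)."""
    if math.isnan(actual) or math.isnan(expected):
        return math.inf
    return abs(_ordered_bits(actual) - _ordered_bits(expected))


def is_match(actual: float, expected: float, ulp_error: int = 0) -> bool:
    """True when the values lie within the given ULP tolerance."""
    return ulp_distance(actual, expected) <= ulp_error


next_after_one = 1.0 + 2**-23  # the next float32 above 1.0
print(ulp_distance(1.0, next_after_one), is_match(1.0, next_after_one, 1))
```

Under this model, with the default tolerance of 0 only bit-identical values (or +0.0 and -0.0) match, while a tolerance of 1 also accepts 1.0 against the next float32 above it.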
Acceptable kernel argument values:
<list> ::= <el>
| <el> "," <list>
| <cl_bool> "," <cl_addressing_mode> "," <cl_filter_mode> (for specifying sampler_t only)
<el> ::= <integer or decimal>
| "repeat(" <unsigned integer> "," <list> ")"
| "rand(" <decimal> "," <decimal> ")"
| "randint(" <decimal> "," <decimal> ")"
| "range(" <integer or decimal> "," <integer or decimal> ")"
| "range(" <integer or decimal> "," <integer or decimal> "," <integer or decimal> ")"
<cl_bool> ::= "CL_TRUE" | "CL_FALSE"
<cl_addressing_mode> ::= "CL_ADDRESS_NONE" | "CL_ADDRESS_CLAMP_TO_EDGE" | "CL_ADDRESS_CLAMP"
| "CL_ADDRESS_REPEAT" | "CL_ADDRESS_MIRRORED_REPEAT"
<cl_filter_mode> ::= "CL_FILTER_NEAREST" | "CL_FILTER_LINEAR"
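The sampler_t alternative of <list> is the only non-numeric form in the grammar. The hypothetical helper below (parse_sampler is not part of oclc) checks a value against that rule. Its field order matches the normalized_coords, addressing_mode and filter_mode parameters of clCreateSampler, but how oclc maps the fields internally is an assumption here.

```python
CL_BOOLS = {"CL_TRUE": True, "CL_FALSE": False}
ADDRESSING_MODES = {
    "CL_ADDRESS_NONE", "CL_ADDRESS_CLAMP_TO_EDGE", "CL_ADDRESS_CLAMP",
    "CL_ADDRESS_REPEAT", "CL_ADDRESS_MIRRORED_REPEAT",
}
FILTER_MODES = {"CL_FILTER_NEAREST", "CL_FILTER_LINEAR"}


def parse_sampler(value: str) -> tuple:
    """Split and validate a sampler_t value per the grammar above."""
    fields = [field.strip() for field in value.split(",")]
    if len(fields) != 3:
        raise ValueError(f"sampler_t needs exactly 3 fields, got {len(fields)}")
    normalized, addressing, filtering = fields
    if normalized not in CL_BOOLS:
        raise ValueError(f"unknown cl_bool: {normalized}")
    if addressing not in ADDRESSING_MODES:
        raise ValueError(f"unknown cl_addressing_mode: {addressing}")
    if filtering not in FILTER_MODES:
        raise ValueError(f"unknown cl_filter_mode: {filtering}")
    # Same order as clCreateSampler's normalized_coords, addressing_mode
    # and filter_mode parameters.
    return CL_BOOLS[normalized], addressing, filtering


print(parse_sampler("CL_FALSE,CL_ADDRESS_CLAMP_TO_EDGE,CL_FILTER_NEAREST"))
```

On the command line this would correspond to something like -arg smp,CL_FALSE,CL_ADDRESS_CLAMP_TO_EDGE,CL_FILTER_NEAREST, where smp is a hypothetical sampler_t argument name.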
Special kernel argument values:
repeat(N,list) creates a list containing `list` repeated `N` times
repeat(3,2,4) => 2,4,2,4,2,4
rand(min,max) creates a random floating-point number in [min,max]
rand(1.2,4) => 3.195201 (potentially)
randint(min,max) creates a random integer in [min,max]
randint(1,4) => 3 (potentially)
range(min,max,stride) produces a list beginning at `min` and moving towards `max`
in steps of `stride`. If `stride` is omitted, it defaults to 1.
range(-4,21,5) => -4,1,6,11,16,21
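The special values can be modelled with a few small functions. This is a sketch under stated assumptions, not oclc's implementation: the function names are hypothetical (value_range stands in for range to avoid shadowing Python's builtin), a seeded random.Random plays the role of -seed but will not reproduce oclc's random numbers, and the handling of descending ranges and non-integer randint bounds is assumed.

```python
import math
import random


def repeat(n, values):
    """repeat(N,list): the list repeated N times."""
    return list(values) * n


def rand(lo, hi, rng=random):
    """rand(min,max): a random floating-point number in [min,max]."""
    return rng.uniform(lo, hi)


def randint(lo, hi, rng=random):
    """randint(min,max): a random integer in [min,max]."""
    # Assumption: non-integer bounds are rounded inwards.
    return rng.randint(math.ceil(lo), math.floor(hi))


def value_range(start, end, stride=1):
    """range(min,max[,stride]): start, start+stride, ... moving towards end."""
    if stride == 0:
        raise ValueError("stride must be non-zero")
    # Assumption: the direction comes from the endpoints, and end is included
    # only when a step lands on it (as in the documented example).
    step = abs(stride) if end >= start else -abs(stride)
    # A tiny epsilon guards against round-off with decimal strides such as 0.1.
    count = math.floor((end - start) / step + 1e-9)
    return [start + i * step for i in range(count + 1)]


print(repeat(3, [2, 4]))         # the documented repeat(3,2,4) example
print(value_range(-4, 21, 5))    # the documented range(-4,21,5) example
```

Both printed lists reproduce the documented examples: 2,4,2,4,2,4 and -4,1,6,11,16,21.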
Available format types (for use with -format):
text textual format such as LLVM IR or assembly
binary binary format such as LLVM BC or ELF
Note that the SPIR output produced by oclc (for example with -stage spir, as in the use-cases above) does not conform to the SPIR 1.2 specification, which requires LLVM IR output from LLVM 3.2. Do not assume that other OpenCL implementations can consume this SPIR; if you need SPIR for that purpose, use the Khronos SPIR generator.