C Library Call Interception
When compiling for Host, LLVM will sometimes insert calls to standard library functions. These might include calls to memcpy() when data is copied between OpenCL address spaces, __chkstk() on Windows to ensure enough stack memory has been paged in, or __floatdidf() on 32-bit Arm to convert a 64-bit integer to a double in software. In the case of an offline-compiled kernel, calls to these functions are stored as relocations in the kernel’s ELF file, and the addresses of the functions in memory are only filled in when the ELF file is loaded to be executed. A useful side-effect of performing relocations is that it’s possible to provide non-standard implementations of these functions to aid debugging.
Note
Calls to C library functions are quite rare in UnitCL, and their presence is fragile. It is difficult to predict whether a given LLVM option or pass will generate or remove a call to a library function. Intercepting C library calls is therefore only useful for debugging a small subset of bugs, but for those bugs it is invaluable.
The addresses of library functions are stored in relocs in modules/mux/targets/host/source/executable.cpp. In Debug builds, calls to memcpy() and memset() are intercepted by default and directed to dbg_memcpy() and dbg_memset(). The debug versions of these functions perform rudimentary out-of-bounds access checks.
Note
As of this writing (2020-03-31), no UnitCL kernel calls memset().
Tutorial: Inspecting a memcpy() Call
memcpy() calls normally end up in an optimized, vectorized implementation in libc. Even if you do manage to find debug symbols for it, the code is very hard to follow. In oneAPI Construction Kit Debug builds, offline-compiled kernels call into dbg_memcpy() instead.
dbg_memcpy() can be inspected with GDB. For example:
gdb --args ./bin/UnitCL --gtest_filter=OfflineExecution*Regression_90*
(gdb) b dbg_memcpy
(gdb) run
How to: Intercept a Function
The replacement standard library functions are in an anonymous namespace in modules/mux/targets/host/source/executable.cpp. Simply add the new function to this namespace, and then add the function’s name and address to relocs.
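As a minimal sketch of that recipe, assuming relocs is a list of name/address pairs with addresses held as uint64_t (its real type in executable.cpp may differ), intercepting memmove() might look like the following. The names dbg_memmove and relocs_sketch are hypothetical, and memmove() is only an illustrative choice of function.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

namespace {
// Hypothetical interceptor: same signature as memmove(), logs the call, then
// forwards to the real implementation.
void *dbg_memmove(void *dest, const void *src, std::size_t count) {
  std::fprintf(stderr, "dbg_memmove: %zu bytes\n", count);
  return std::memmove(dest, src, count);
}
}  // namespace

// Hypothetical stand-in for relocs: the symbol name a kernel requests, paired
// with the address (as uint64_t) that the loader should patch in.
const std::vector<std::pair<std::string, uint64_t>> relocs_sketch = {
    {"memmove", reinterpret_cast<uint64_t>(&dbg_memmove)},
};
```

Because the table stores a raw address, nothing ties the entry back to memmove()'s prototype; that is why the signature must be checked by hand.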
Note
It is only possible to intercept functions that are called by offline-compiled kernels. If an offline kernel calls a function for which a relocation does not exist, then the kernel will segfault. Consequently, if an offline kernel can run, then all the library functions it needs are already in relocs, and those are the only functions that can be intercepted.
Warning
The compiler does not check that the intercepting function has the same signature as the function it’s replacing. Make sure that the signatures match.
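One way to get some of that checking back is a compile-time assertion next to the interceptor. The sketch below compares a hypothetical example_dbg_memcpy() against a hand-written function-pointer type for memcpy(); it assumes that spelled-out type matches your C library's prototype, and it cannot detect calling-convention differences.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// memcpy()'s C prototype, written out as a function-pointer type.
using memcpy_fn = void *(*)(void *, const void *, std::size_t);

namespace {
// Hypothetical interceptor used only to demonstrate the check.
void *example_dbg_memcpy(void *dest, const void *src, std::size_t count) {
  return std::memcpy(dest, src, count);
}

// Compilation fails if the interceptor's signature drifts from memcpy()'s.
static_assert(std::is_same<decltype(&example_dbg_memcpy), memcpy_fn>::value,
              "interceptor must have the same signature as memcpy()");
}  // namespace
```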
Explanation
Offline-compiled kernel binaries contain relocations at function call sites. In effect, each relocation marks a relative or absolute hardware jump instruction whose target address has been left blank. When the oneAPI Construction Kit loads the binary, it writes the address of the function into the jump instruction (after possibly doing some math on the address). This is why relocs stores function addresses as uint64_t values: the bytes of the address may be written directly into a binary executable.
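To make "writing the address into the jump instruction" concrete, here is a simplified sketch of two common relocation shapes, applied to a byte buffer standing in for loaded kernel code. The helper names apply_abs64 and apply_pc32 are hypothetical, and the exact relocation types and formulas the loader handles depend on the architecture. The absolute case copies all eight address bytes; the relative case writes a 32-bit displacement computed as S + A - P (symbol address, addend, address of the patched location), which is where the "math on the address" comes in.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Absolute relocation: copy the 8 bytes of the target address verbatim. For
// the Host target the code runs on the same machine, so the host byte order
// is the byte order the CPU expects.
bool apply_abs64(std::vector<uint8_t> &code, std::size_t offset,
                 uint64_t symbol) {
  if (offset + sizeof(symbol) > code.size()) return false;  // out of bounds
  std::memcpy(code.data() + offset, &symbol, sizeof(symbol));
  return true;
}

// PC-relative relocation: write S + A - P as a 32-bit displacement, where
// P = code_base + offset is the run-time address of the patched bytes.
bool apply_pc32(std::vector<uint8_t> &code, std::size_t offset,
                uint64_t code_base, uint64_t symbol, int64_t addend) {
  if (offset + sizeof(int32_t) > code.size()) return false;  // out of bounds
  const int64_t place = static_cast<int64_t>(code_base + offset);
  const int64_t value = static_cast<int64_t>(symbol) + addend - place;
  if (value < std::numeric_limits<int32_t>::min() ||
      value > std::numeric_limits<int32_t>::max()) {
    return false;  // target too far away for a 32-bit displacement
  }
  const int32_t disp = static_cast<int32_t>(value);
  std::memcpy(code.data() + offset, &disp, sizeof(disp));
  return true;
}
```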
Warning
Since library function call interception happens on the binary level of a compiled kernel, all of the normal protections offered by compilers are gone. There is no function prototype checking. The compiler is not able to check that the target address you have provided is even a valid function entry point. If somehow the OpenCL kernel and the oneAPI Construction Kit use a different calling convention, then you’re on your own.
The ELF file stored inside an offline-compiled kernel requests relocations by function name. Adding new entries to relocs will not affect existing kernels, because existing kernels do not request those relocations. If a relocation is requested that does not exist in relocs, then the ELF loader will report an error.
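The name-based lookup can be sketched as follows, assuming a simple map from symbol name to uint64_t address (RelocTable, resolve and resolve_all are hypothetical names, not the loader's API). Only the names a kernel actually requests are looked up, so extra table entries are never consulted, and a missing name is reported as an error.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical relocation table: symbol name -> address.
using RelocTable = std::unordered_map<std::string, uint64_t>;

// Look up one requested symbol; report and return std::nullopt if absent.
std::optional<uint64_t> resolve(const RelocTable &table,
                                const std::string &name) {
  const auto it = table.find(name);
  if (it == table.end()) {
    std::fprintf(stderr, "unresolved relocation: %s\n", name.c_str());
    return std::nullopt;
  }
  return it->second;
}

// Resolve every symbol the kernel's ELF file requests, in order. Fails on the
// first missing name; entries nobody requests are simply ignored.
bool resolve_all(const RelocTable &table,
                 const std::vector<std::string> &requested,
                 std::vector<uint64_t> &out) {
  out.clear();
  for (const auto &name : requested) {
    const auto addr = resolve(table, name);
    if (!addr) return false;
    out.push_back(*addr);
  }
  return true;
}
```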
On Arm32, the Host target (modules/compiler/targets/host/source/target.cpp) enables various hardware math features (e.g., features.push_back("+hwdiv");). If these features are not enabled, then LLVM will emit calls to library functions instead, and these calls will then need to be added to relocs in the same way as __floatdidf(). Intercepting these calls could potentially be a means of debugging math functions on Arm32.
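For instance, a logging interceptor for __floatdidf(), which converts a 64-bit signed integer to a double, might be sketched as below. The name dbg_floatdidf is hypothetical. As with any calling-convention mismatch, be careful on hard-float Arm32: compiler-rt declares its helpers with the base (soft-float) AAPCS convention, so a real interceptor may need __attribute__((pcs("aapcs"))). This sketch does not attempt that.

```cpp
#include <cstdint>
#include <cstdio>

namespace {
// Hypothetical interceptor: log the argument and result of the int64 -> double
// conversion, performing the conversion itself.
double dbg_floatdidf(int64_t value) {
  const double result = static_cast<double>(value);
  std::fprintf(stderr, "__floatdidf(%lld) -> %.17g\n",
               static_cast<long long>(value), result);
  return result;
}
}  // namespace
```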
dbg_memcpy() attempts to read all the source memory before copying it. Both dbg_memcpy() and dbg_memset() attempt to zero out the destination memory before writing data to it. The reads and writes provide rudimentary bounds checking; if either fails, then the oneAPI Construction Kit will abort with a descriptive error.
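The probing idea can be sketched roughly as follows; this is not the project's dbg_memcpy(), and sketch_dbg_memcpy is a hypothetical name. Volatile accesses stop the compiler from optimizing the probes away, so an unmapped source or destination faults inside the simple probe loops (where a debugger can stop) rather than deep inside libc. The sketch does not reproduce the descriptive abort message. Note that zeroing the destination first makes overlapping buffers, already undefined behavior for memcpy(), fail more visibly.

```cpp
#include <cstddef>
#include <cstring>

namespace {
// Probe every source byte, then every destination byte, then do the copy.
void *sketch_dbg_memcpy(void *dest, const void *src, std::size_t count) {
  const volatile unsigned char *s =
      static_cast<const volatile unsigned char *>(src);
  volatile unsigned char *d = static_cast<volatile unsigned char *>(dest);
  unsigned char sink = 0;
  for (std::size_t i = 0; i < count; ++i) {
    sink ^= s[i];  // read probe: faults here if src is not readable
  }
  (void)sink;
  for (std::size_t i = 0; i < count; ++i) {
    d[i] = 0;  // write probe: faults here if dest is not writable
  }
  return std::memcpy(dest, src, count);
}
}  // namespace
```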
Warning
dbg_memcpy() and dbg_memset() can only catch out-of-bounds reads and writes that access memory outside of the oneAPI Construction Kit’s address space. In other words, a call must go horribly wrong before the illegal access is caught. It is trivial for a kernel calling memcpy() to completely clobber memory owned by the oneAPI Construction Kit, and dbg_memcpy() cannot prevent that.