Now that we are using the RefSi driver to open a connection to a device, more useful HAL operations can be implemented, such as allocating and freeing memory, as well as copying data between host and device memory.
The most straightforward of these operations are allocating and freeing device
memory. This is because there is almost a 1:1 mapping between driver functions
and HAL operations. These functions operate like malloc and free, except that
the allocated memory lives on the device:
// refsi_hal.cpp
hal::hal_addr_t refsi_hal_device::mem_alloc(hal::hal_size_t size,
                                            hal::hal_size_t alignment) {
  refsi_locker locker(hal_lock);
  return refsiAllocDeviceMemory(device, size, alignment, DRAM);
}

bool refsi_hal_device::mem_free(hal::hal_addr_t addr) {
  refsi_locker locker(hal_lock);
  return refsiFreeDeviceMemory(device, addr) == refsi_success;
}
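To make the malloc/free analogy concrete, the following is a minimal, self-contained sketch of a device-side allocator. It is not the RefSi driver's implementation: toy_device, toy_addr_t, toy_size_t and toy_nullptr are hypothetical names invented for illustration, and the bump-allocation strategy is an assumption chosen for brevity. The point it shows is that the returned values are addresses in the device's address space rather than host pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the HAL address and size types.
using toy_addr_t = uint64_t;
using toy_size_t = uint64_t;
constexpr toy_addr_t toy_nullptr = 0;

// A toy device that hands out addresses from a fake DRAM window.
// Nothing here touches real hardware or the RefSi driver.
class toy_device {
 public:
  toy_device(toy_addr_t base, toy_size_t size)
      : next_(base), end_(base + size) {
    assert(size > 0);
  }

  // Returns a device address aligned to `alignment` (which must be a power
  // of two), or toy_nullptr when the request cannot be satisfied.
  toy_addr_t mem_alloc(toy_size_t size, toy_size_t alignment) {
    if (size == 0 || alignment == 0 || (alignment & (alignment - 1)) != 0) {
      return toy_nullptr;
    }
    // Round the next free address up to the requested alignment.
    toy_addr_t aligned = (next_ + alignment - 1) & ~(alignment - 1);
    if (aligned > end_ || size > end_ - aligned) {
      return toy_nullptr;
    }
    next_ = aligned + size;
    live_[aligned] = size;
    return aligned;
  }

  // Returns true only if `addr` came from mem_alloc and is still live.
  bool mem_free(toy_addr_t addr) { return live_.erase(addr) == 1; }

 private:
  toy_addr_t next_;                        // Next unallocated device address.
  toy_addr_t end_;                         // One past the end of the window.
  std::map<toy_addr_t, toy_size_t> live_;  // Live allocations by address.
};
```

This toy allocator never reuses freed memory; it only tracks which addresses are live so that an invalid free can be reported, in the same spirit as mem_free returning false when the driver call fails.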
Implementing memory transfer operations is slightly more complicated. It
involves mapping the address in device memory to an address (pointer) which can
be used directly by the host. Once this is done, memcpy can be used to copy
data between host and device:
// refsi_hal.cpp
#include <string.h> // Added
...
bool refsi_hal_device::mem_read(void *dst, hal::hal_addr_t src,
                                hal::hal_size_t size) {
  refsi_locker locker(hal_lock);
  void *src_mem = refsiGetMappedAddress(device, src, size);
  if (!src_mem) {
    return false;
  }
  memcpy(dst, src_mem, size);
  return true;
}

bool refsi_hal_device::mem_write(hal::hal_addr_t dst, const void *src,
                                 hal::hal_size_t size) {
  refsi_locker locker(hal_lock);
  void *dst_mem = refsiGetMappedAddress(device, dst, size);
  if (!dst_mem) {
    return false;
  }
  memcpy(dst_mem, src, size);
  return true;
}
As previously mentioned, hal_addr_t values hold addresses that are opaque to the
host CPU and cannot be cast to a pointer for dereferencing. What
refsiGetMappedAddress does is return a pointer to an area of memory which is
accessible to the CPU and where memory accesses are mirrored on the
corresponding area of device memory. This means that writes by the CPU are
automatically seen by the device when it reads the memory, and similarly the CPU
observes changes made by the device when it reads from the mapped memory area.
Note
The RefSi platform works in such a way that the entire device DRAM is always mapped in the host CPU’s address space. As a result, the user of the RefSi driver (e.g. the RefSi HAL) does not need to manually unmap memory regions.
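The mapping idea can be sketched in isolation with a toy model. In the sketch below, device DRAM is simulated by a host buffer that appears at a made-up base address in the device address space. The names toy_mapped_device and get_mapped_address, the bounds-checking behaviour and the base address used in the usage note are all assumptions for illustration; they do not describe how refsiGetMappedAddress is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the HAL address and size types.
using toy_addr_t = uint64_t;
using toy_size_t = uint64_t;

// Toy model: device DRAM is a host buffer that the device sees starting at
// `dram_base`. Host code must translate device addresses before using them.
class toy_mapped_device {
 public:
  toy_mapped_device(toy_addr_t dram_base, toy_size_t dram_size)
      : dram_base_(dram_base), dram_(dram_size) {
    assert(dram_size > 0);
  }

  // Translates a device address range into a host pointer. Returns nullptr
  // if any part of the range falls outside the simulated DRAM.
  void *get_mapped_address(toy_addr_t addr, toy_size_t size) {
    if (addr < dram_base_) {
      return nullptr;
    }
    toy_size_t offset = addr - dram_base_;
    if (offset > dram_.size() || size > dram_.size() - offset) {
      return nullptr;
    }
    return dram_.data() + offset;
  }

  // Copies `size` bytes from host memory to the device address `dst`.
  bool mem_write(toy_addr_t dst, const void *src, toy_size_t size) {
    void *dst_mem = get_mapped_address(dst, size);
    if (!dst_mem) {
      return false;
    }
    std::memcpy(dst_mem, src, size);
    return true;
  }

  // Copies `size` bytes from the device address `src` to host memory.
  bool mem_read(void *dst, toy_addr_t src, toy_size_t size) {
    void *src_mem = get_mapped_address(src, size);
    if (!src_mem) {
      return false;
    }
    std::memcpy(dst, src_mem, size);
    return true;
  }

 private:
  toy_addr_t dram_base_;       // Device address where the DRAM window starts.
  std::vector<uint8_t> dram_;  // Host-side storage standing in for DRAM.
};
```

With a device created as toy_mapped_device dev(0x40000000, 256), writing a buffer to 0x40000010 and reading it back from the same device address returns the original bytes, while a range that runs past the end of the 256-byte window is rejected by both mem_write and mem_read.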
At this point, running copy_buffer results in the example finishing
successfully:
$ bin/copy_buffer
Using device 'RefSi M1 Tutorial'
Results validated successfully.
Running the hello and vector_add examples results in the same error we
have seen previously. We will look at how to address this issue in the next
section.
$ bin/hello
Using device 'RefSi M1 Tutorial'
Running hello example (Global size: 8, local size: 1)
Unable to create a program from the kernel binary.
$ bin/vector_add
Using device 'RefSi M1 Tutorial'
Running vector_add example (Global size: 1024, local size: 16)
Unable to create a program from the kernel binary.