info

Please note that you are viewing a guide targeting an older version of ComputeCpp™ Community Edition. This guide was designed for version 2.10.0.

Options for Building Eigen

link

Options for Building Eigen SYCL

The Eigen project has a SYCL back-end that implements the "tensor" operations and runs on various devices. The SYCL back-end has been merged to upstream as of this commit. The supported devices range from desktop CPUs, GPUs through to embedded accelerators such as the ARM Mali GPU. Each device has different capabilities and in particular the embedded devices have different memory models and restricted memory sizes.

If you would like to see some code that uses Eigen tensors using SYCL, take a look at the tensor benchmark code.

In order to tune the Eigen expressions for these devices and make the most of the hardware's capabilities, various compile time flags can be passed to the CMake when compiling your code to specialize the Eigen expressions for a particular device. These flags are outlined on this page.

They can be passed to CMake using -Dmacroname or in the source code using a #define.

link

Running the CMake Command to Build the SYCL Tests for Eigen

The SYCL Eigen implementation includes a set of tests that can be run and the following commands will build these tests so that they can be executed.

Clone the repository and create a build folder with the following commands:

git clone https://gitlab.com/libeigen/eigen.git
cd eigen
# If the default branch does not work anymore you can try an older commit with:
# git checkout 956131d0e648b468a51ac6e481b31151b132468e
mkdir build
cd build
cmake -DEIGEN_TEST_CXX11=1 -DEIGEN_TEST_SYCL=1 -DComputeCpp_DIR="PathToComputeCppRootDirectory" -DCMAKE_CXX_FLAGS="-O3 -DEIGEN_SYCL_ASYNC_EXECUTION=1" -DCOMPUTECPP_USER_FLAGS="-O3 -sycl-compress-name -fsycl-split-modules=20" ../

Depending on your configuration CMake may not be able to find your OpenCL implementation in which case it can be provided with -DOpenCL_LIBRARY="/path/to/libOpenCL.so". As well as the above, -DEIGEN_HAS_CXX11_MATH=1 should be defined on Arm platforms, it can be given in CMAKE_CXX_FLAGS. Note that -DEIGEN_USE_SYCL=1 is here implicitly defined because of -DEIGEN_TEST_SYCL=1. Any applications using Eigen with SYCL should define EIGEN_USE_SYCL=1 before including the headers.

You can compile and run the SYCL tests using the script in the build folder:

EIGEN_MAKE_ARGS="-j4" ./check.sh sycl

link

Cross-compiling Eigen SYCL

You can cross-compile Eigen SYCL by giving CMake a toolchain file setting up your cross compilation toolchain:

-DCMAKE_TOOLCHAIN_FILE=<path/to/toolchain.cmake>

And point the Eigen build to an OpenCL library for the target, as well as paths to a native ComputeCpp package as well as a target ComputeCpp package:

-DComputeCpp_HOST_DIR=<path/to/native/ComputeCpp>
-DComputeCpp_DIR=<path/to/target/ComputeCpp>
-DOpenCL_LIBRARY=<path/to/target/OpenCL>
-DOpenCL_INCLUDE=<path/to/OpenCL/headers>

You also need to set:

-DCOMPUTECPP_USE_COMPILER_DRIVER=0

To make sure you use two step SYCL compilation so that compute++ isn't used where the cross-compiler should be used.

As well as COMPUTECPP_BITCODE to match your device:

-DCOMPUTECPP_BITCODE="spir"   # for 32 bit target
-DCOMPUTECPP_BITCODE="spir64" # for 64 bit target

CMake may also ask you to provide variables such as:

-DTHREADS_PTHREAD_ARG=1

This is because when cross-compiling CMake can't run generated binaries to make sure certain libraries and options are available, so we need to inform CMake whether these features are available or not.

link

Enable SYCL in your application

If your application is using Eigen you can enable the SYCL back-end by defining the following macro before including the headers:

#define EIGEN_USE_SYCL 1

link

Asynchronous Execution

Asynchronous execution allows to enqueue multiple kernels without waiting for their execution to finish. This should be enabled by default.

#define EIGEN_SYCL_ASYNC_EXECUTION 1

link

Local Work Group Size

Different devices might have different work group sizes, and the following can be set to accommodate different hardware configurations and architectures.

#define EIGEN_SYCL_LOCAL_THREAD_DIM0 16
#define EIGEN_SYCL_LOCAL_THREAD_DIM1 16

These are the static value for itemID.get_local(0) and itemID.get_local(1) respectively. They must be a power of 2. It is advised to set the same values for EIGEN_SYCL_LOCAL_THREAD_DIM0 and EIGEN_SYCL_LOCAL_THREAD_DIM1.

If the device has resource constraints you should restrict the maximum number of threads. This is important for certain processors, for example Intel CPUs and GPUs, and Arm devices. This macro is used for TensorScanOp on GPU for SYCL back-end. The following default option is equivalent of 1024 blocks on AMD which has 256 work group size.

#define EIGEN_SYCL_MAX_GLOBAL_RANGE 262144

link

Local Memory Usage

It might be necessary to disable local memory on specific hardware that does not have that type of memory. The macros below can be used to enable and disable local memory. If both are defined or neither are defined, the decision for using the local memory will be made at runtime. The default is to use local memory in order to improve performance when there is a dedicated shared memory.

#define EIGEN_SYCL_LOCAL_MEM 1

On devices such as Arm GPUs that do not have dedicated shared memory, local memory is used to mimic global memory so it is best to disable the EIGEN_SYCL_LOCAL_MEM macro and enable the EIGEN_SYCL_NO_LOCAL_MEM macro.

#define EIGEN_SYCL_NO_LOCAL_MEM 1

link

Register Usage

If the device you are using has limitations on registers, you can define the size of the registers. If there are no limitations the launcher tries to choose the optimized number of registers.

#define EIGEN_SYCL_REG_M [YOUR SIZE POWER of 2]
#define EIGEN_SYCL_REG_N [YOUR SIZE POWER of 2]

link

Barrier Usage

The following macro gives a performance boost on the ARM Mali GPU by disabling some barriers.

#define EIGEN_SYCL_DISABLE_ARM_GPU_CACHE_OPTIMISATION 1

link

Device selection

By default when running the tests, they will run on all available SYCL devices. Compiling with the EIGEN_SYCL_USE_DEFAULT_SELECTOR macro will instead select the preferred device using cl::sycl::default_selector that can be controlled at runtime by the environment variable COMPUTECPP_TARGET. (Note: COMPUTECPP_TARGET is a ComputeCpp specific macro and is not available in other SYCL implementations)

#define EIGEN_SYCL_USE_DEFAULT_SELECTOR 1

link

Eigen Vectorization

The following flags can be used for enabling and disabling Eigen vectorization. By default Eigen vectorization is enabled however, if the device does not support vectorization or you are compiling the code for OpenCL CPU devices you should pass both the above flags in compute++ and CXX compilers to disable vectorization as Compute Aorta has its own vectorization system. Remember that either both flags should be set or none. If you set one of them and don't set the other one you may get inconsistent kernel generated for host and device.

#define EIGEN_DONT_VECTORIZE_SYCL 1
#define EIGEN_DONT_VECTORIZE 1

link

Use Program Class

In some cases it can be useful to make Eigen use the cl::sycl::program class, this can be done by enabling the following macro.

#define EIGEN_SYCL_USE_PROGRAM_CLASS 1

link

Enable Exceptions

By default exceptions are disabled. Enabling them will change how SYCL errors are handled and will use the wait_and_throw variant instead of a wait. It is recommended to enable it when an error occurs.

#define EIGEN_EXCEPTIONS 1

link

Do not reuse buffers

By default Eigen will reuse any temporary SYCL buffers created in some operations. You can disable this feature to not reuse the buffers if some other mechanism already reuses the buffers or if the maximum memory available is too limited.

#define EIGEN_SYCL_NO_REUSE_BUFFERS 1

link

Store latest event

The Eigen API does not have a mechanism to return the event of an asynchronous operation. Enabling this option provides a workaround that allows the user to get the latest event that was created using the SyclDevice. This is an experimental feature, enabling this option causes extra locks to happen.

#define EIGEN_SYCL_STORE_LATEST_EVENT 1

Rate this Guide

Eigen Guide

CE_LICENSE

assignmentJump to Section

Options for Building Eigen
Options for Building Eigen SYCL
Running the CMake Command to Build the SYCL Tests for Eigen
Cross-compiling Eigen SYCL
Enable SYCL in your application
Asynchronous Execution
Local Work Group Size
Local Memory Usage
Register Usage
Barrier Usage
Device selection
Eigen Vectorization
Use Program Class
Enable Exceptions
Do not reuse buffers
Store latest event

ComputeCpp™ Menu

Main Menu

Products

menu_bookGuides

Options for Building Eigen

Options for Building Eigen SYCL

Running the CMake Command to Build the SYCL Tests for Eigen

Cross-compiling Eigen SYCL

Enable SYCL in your application

Asynchronous Execution

Local Work Group Size

Local Memory Usage

Register Usage

Barrier Usage

Device selection

Eigen Vectorization

Use Program Class

Enable Exceptions

Do not reuse buffers

Store latest event

Eigen Guide

CE_LICENSE

assignmentJump to Section

Select a Product

oneAPI

Dark Mode

Light Mode

Also,

part of our network

Codeplay.com

SYCL.tech

Codeplay Developer

Codeplay Open Source