Options for Building Eigen

The Eigen project has a SYCL back-end implementation that implements operations such as tensors and runs on various devices. You can find the repository for the Eigen SYCL implementation here, and the additions are being up-streamed to the main project repository. The supported devices range from desktop CPUs, GPUs through to embedded accelerators such as the Renesas CVengine or ARM Mali GPU. Each device has different capabilities and in particular the embedded devices have different memory models and restricted memory sizes.

If you would like to see some code that uses Eigen tensors using SYCL, take a look at the tensor benchmark code.

In order to tune the Eigen expressions for these devices and make the most of the hardware's capabilities, various compile time flags can be passed to the CMake when compiling your code to specialize the Eigen expressions for a particular device. These flags are outlined on this page.

They can be passed to CMake using -Dmacroname or in the source code using a #define.

Running the CMake Command to Build the SYCL Tests for Eigen

The SYCL Eigen implementation includes a set of tests that can be run and the following commands will build these tests so that they can be executed.

Create a build folder in the root of Eigen and inside that directory run the following command.

cmake -DEIGEN_TEST_CXX11=1 -DEIGEN_TEST_SYCL=1 -DOpenCL_LIBRARY="PathToOpenCL.so" -DComputeCpp_DIR="PathToComputeCppRootDirectory" -DCMAKE_CXX_FLAGS="-O3" -DCOMPUTECPP_USER_FLAGS="-O3 -sycl-compress-name -fsycl-split-modules=20" ../

As well as the above, -DEIGEN_HAS_CXX11_MATH=1 should be defined on Arm platforms, it can be given in CMAKE_CXX_FLAGS. It is only possible to compile the SYCL tests using this make command from the root of your build folder.

make -j 8 $(make help | grep -E "sycl" | sed "s/\.\.\.//")

Similarly the command with ninja looks like this.

time ninja $(ninja -t targets | grep -E "sycl" | sed "s/:.*//")

Local Work Group Size

Different devices might have different work group sizes, and the following can be set to accommodate different hardware configurations and architectures.


These are the static value for itemID.get_local(0) and itemID.get_local(1) respectively. They must be a power of 2. It is advised to set the same value for EIGEN_SYCL_LOCAL_THREAD_DIM0 and EIGEN_SYCL_LOCAL_THREAD_DIM1.

If the device has resource constraints you should restrict the maximum number of threads. This is important for certain processors, for example Intel CPUs and GPUs, and Arm devices. This macro is used for TensorScanOp on GPU for SYCL back-end. The following default option is equivalent of 1024 blocks on AMD which has 256 work group size.


Local Memory Usage

It might be necessary to disable local memory on specific hardware that does not have that type of memory. The macros below can be used to enable and disable local memory. If both are defined or neither are defined, the decision for using the local memory will be made at runtime. The default is to use local memory in order to improve performance when there is a dedicated shared memory.


On devices such as Arm GPUs that do not have dedicated shared memory, local memory is used to mimic global memory so it is best to disable the EIGEN_SYCL_LOCAL_MEM macro and enable the EIGEN_SYCL_NO_LOCAL_MEM macro.


Register Usage

If the device you are using has limitations on registers, you can define the size of the registers. If there are no limitations the launcher tries to choose the optimized number of registers.


Barrier Usage

The following macro gives a performance boost on the ARM Mali GPU by disabling some barriers.


Device selection

By default when running the tests, they will run on all available SYCL devices. Compiling with the EIGEN_SYCL_USE_DEFAULT_SELECTOR macro will instead select the preferred device using cl::sycl::default_selector that can be controlled at runtime by the environment variable COMPUTECPP_TARGET. (Note: COMPUTECPP_TARGET is a ComputeCpp specific macro and is not available in other SYCL implementations)


Eigen Vectorization

The following flags can be used for enabling and disabling Eigen vectorization. By default Eigen vectorization is enabled however, if the device does not support vectorization or you are compiling the code for OpenCL CPU devices you should pass both the above flags in compute++ and CXX compilers to disable vectorization as Compute Aorta has its own vectorization system. Remember that either both flags should be set or none. If you set one of them and don't set the other one you may get inconsistent kernel generated for host and device.


Use Program Class

In some cases it can be useful to make Eigen use the cl::sycl::program class, this can be done by enabling the following macro.


Enable Exceptions

By default exceptions are disabled. Enabling them will change how SYCL errors are handled and will use the wait_and_throw variant instead of a wait. It is recommended to enable it when an error occurs.


    Select a Product

    Please select a product

    ComputeCpp enables developers to integrate parallel computing into applications using SYCL and accelerate code on a wide range of OpenCL devices such as GPUs.

    ComputeSuite for R-Car enables developers to accelerate their applications on a wide range of Renesas R-Car based hardware such as the H3 and V3M, using widely supported open standards such as Khronos SYCL and OpenCL.


    part of our network