Getting Started with Eigen

This guide was created for versions: v0.1.0 - Latest

This guide demonstrates how to get started using Eigen with ComputeCpp. The Eigen project includes a number of "tensor" operations that are used by the TensorFlow project. These operations can also be used by other applications, and they are particularly useful for building applications and frameworks that make use of neural networks. The tensor operations live under Eigen's "unsupported" folder, where unsupported means that the code is preparing to be accepted into the main Eigen codebase. Many of them have been implemented to run on parallel hardware using ComputeCpp, an implementation of SYCL.

Pre-requisites


ComputeCpp (see the Getting Started Guide for details)
The Eigen project source. We are currently upstreaming our SYCL changes to the Eigen codebase; in the meantime, clone the following into /usr/local using the command "hg clone https://bitbucket.org/mehdi_goli/opencl/src/Eigen-SYCL-OpenCL/"

Creating a Simple Application

We'll use a simple Eigen application that runs a "tensor" operation through SYCL to show how to get set up with the SYCL implementation of Eigen.

First, open a terminal and create a working directory:

>cd /home
>mkdir eigentensor
>cd eigentensor

Now, copy the sample code below into a new file, and save it as /home/eigentensor/eigen.cpp

#include <cstdint>
#include <iostream>
#define EIGEN_USE_SYCL
#include <unsupported/Eigen/CXX11/Tensor>

using Eigen::array;
using Eigen::SyclDevice;
using Eigen::Tensor;
using Eigen::TensorMap;

int main()
{
  using DataType = float;
  using IndexType = int64_t;
  constexpr auto DataLayout = Eigen::RowMajor;

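  // pick the first device supported by Eigen's SYCL backend
  auto devices = Eigen::get_sycl_supported_devices();
  if (devices.empty()) {
    std::cerr << "No supported SYCL device found\n";
    return 1;
  }
  const auto device_selector = *devices.begin();
  Eigen::QueueInterface queueInterface(device_selector);
  auto sycl_device = Eigen::SyclDevice(&queueInterface);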
  
  // create the tensors to be used in the operation
  IndexType sizeDim1 = 3;
  IndexType sizeDim2 = 3;
  IndexType sizeDim3 = 3;
  array<IndexType, 3> tensorRange = {{sizeDim1, sizeDim2, sizeDim3}};

  // create the host tensors that hold the data we want to manipulate
  Tensor<DataType, 3, DataLayout, IndexType> in1(tensorRange);
  Tensor<DataType, 3, DataLayout, IndexType> in2(tensorRange);
  Tensor<DataType, 3, DataLayout, IndexType> out(tensorRange);

  // set up some random data in the tensors to be multiplied
  in1 = in1.random();
  in2 = in2.random();

  // allocate memory for the tensors on the device
  DataType * gpu_in1_data = static_cast<DataType*>(sycl_device.allocate(in1.size()*sizeof(DataType)));
  DataType * gpu_in2_data = static_cast<DataType*>(sycl_device.allocate(in2.size()*sizeof(DataType)));
  DataType * gpu_out_data = static_cast<DataType*>(sycl_device.allocate(out.size()*sizeof(DataType)));

  // wrap the device memory in TensorMap objects so it can be used in tensor expressions
  TensorMap<Tensor<DataType, 3, DataLayout, IndexType>> gpu_in1(gpu_in1_data, tensorRange);
  TensorMap<Tensor<DataType, 3, DataLayout, IndexType>> gpu_in2(gpu_in2_data, tensorRange);
  TensorMap<Tensor<DataType, 3, DataLayout, IndexType>> gpu_out(gpu_out_data, tensorRange);

  // copy the memory to the device and do the c=a*b calculation
  sycl_device.memcpyHostToDevice(gpu_in1_data, in1.data(),(in1.size())*sizeof(DataType));
  sycl_device.memcpyHostToDevice(gpu_in2_data, in2.data(),(in2.size())*sizeof(DataType));
  gpu_out.device(sycl_device) = gpu_in1 * gpu_in2;
  sycl_device.memcpyDeviceToHost(out.data(), gpu_out_data,(out.size())*sizeof(DataType));
  sycl_device.synchronize();

  // print out the results
  for (IndexType i = 0; i < sizeDim1; ++i) {
    for (IndexType j = 0; j < sizeDim2; ++j) {
      for (IndexType k = 0; k < sizeDim3; ++k) {
        std::cout << "device_out" << "(" << i << ", " << j << ", " << k << ") : " << out(i,j,k) 
                  << " vs host_out" << "(" << i << ", " << j << ", " << k << ") : " << in1(i,j,k) * in2(i,j,k) << "\n";
      }
    }
  }
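  std::cout << "c=a*b Done\n";

  // release the device memory
  sycl_device.deallocate(gpu_in1_data);
  sycl_device.deallocate(gpu_in2_data);
  sycl_device.deallocate(gpu_out_data);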
}
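
To make clear what the device computes, the sketch below reproduces the same calculation on the host without Eigen. In the Eigen Tensor module, operator* between two tensors is a coefficient-wise product rather than a contraction, and with Eigen::RowMajor the last index varies fastest in memory. The helper names row_major_offset and coeff_product are hypothetical and are not part of Eigen; treat this as an illustration under those assumptions, not as how Eigen implements the operation.

```cpp
#include <cstddef>
#include <vector>

// Row-major offset of element (i, j, k) in a tensor of shape (d1, d2, d3).
// The last index varies fastest, so d1 is not needed to compute the offset.
// Hypothetical helper, not part of Eigen.
inline std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t k,
                                    std::size_t d2, std::size_t d3) {
  return (i * d2 + j) * d3 + k;
}

// Coefficient-wise product of two flat buffers: out[n] = a[n] * b[n].
// Assumes a and b have the same number of elements.
// Hypothetical helper, not part of Eigen.
inline std::vector<float> coeff_product(const std::vector<float>& a,
                                        const std::vector<float>& b) {
  std::vector<float> out(a.size());
  for (std::size_t n = 0; n < a.size(); ++n) {
    out[n] = a[n] * b[n];
  }
  return out;
}
```

With the 3x3x3 tensors used above, element (1, 2, 0) sits at offset (1*3 + 2)*3 + 0 = 15 of the flat buffer, and each output element depends only on the two input elements at the same offset.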

In order to compile this code we need to use ComputeCpp and the Eigen headers.

This tutorial uses a simple makefile.

Create the file "makefile" in /home/eigentensor

Copy the following into the makefile, but note that make is particular about tabs versus spaces.

# Example Makefile to build a SYCL application using ComputeCpp.

# Your ComputeCpp installation root.
COMPUTECPP_PREFIX ?= /usr/local

COMPUTECPP ?= $(COMPUTECPP_PREFIX)/bin/compute++
COMPUTECPP_INFO ?= $(COMPUTECPP_PREFIX)/bin/computecpp_info
COMPUTECPP_INCLUDES ?= $(COMPUTECPP_PREFIX)/include
COMPUTECPP_LIBS ?= $(COMPUTECPP_PREFIX)/lib

# In addition to your normal flags, compilation requires the C++11 standard, the
# SYCL and Eigen headers (set EIGEN_INCLUDES), and the ComputeCpp library.
CXXFLAGS += --std=c++11 -I$(COMPUTECPP_INCLUDES) -I$(EIGEN_INCLUDES)
LDFLAGS += -L$(COMPUTECPP_LIBS) -lComputeCpp -Wl,--rpath $(COMPUTECPP_LIBS)
COMPUTECPP_FLAGS += \
    $(CXXFLAGS) $(shell $(COMPUTECPP_INFO) --dump-device-compiler-flags)

# The example application to build.
target := eigen
all: $(target)

# Single source multiple pass compilation.
%: %.cpp
	$(COMPUTECPP) $(COMPUTECPP_FLAGS) -c $< -o $@.sycl
	$(CXX) $(CXXFLAGS) -include $@.sycl $< -o $@ $(LDFLAGS)


.PHONY: clean help
clean:
	rm -fv $(target) $(target).sycl $(target).o

help:
	@echo "Builds an example SYCL application."
	@echo "Usage: make COMPUTECPP_PREFIX=[path-to-computecpp] {all,clean,help}"

Now type the following commands in the terminal (this assumes ComputeCpp is installed in /usr/local/computecpp and the Eigen source is in /usr/local/eigen; adjust the paths to match your system). If you see the error "makefile:24: *** missing separator. Stop." you should replace the indentation of the recipe lines with tabs rather than spaces.

>cd /home/eigentensor
>make COMPUTECPP_PREFIX=/usr/local/computecpp EIGEN_INCLUDES=/usr/local/eigen

Once this completes, you have compiled your first Eigen application using SYCL.

To run the application, call this command in the terminal:


>./eigen

It will print each element of the device result next to the same element computed on the host, followed by "c=a*b Done".
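
Comparing the printed pairs by eye becomes tedious for larger tensors. A minimal sketch of an automatic check is shown here, assuming the device and host results have been copied into flat float buffers. The function name all_close and its default tolerances are hypothetical choices for illustration and are not part of Eigen or ComputeCpp. A tolerance is used because floating-point results from a device are not guaranteed to match host results bit for bit.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when both buffers have the same length and every pair of
// elements differs by at most abs_tol + rel_tol * |expected|.
// A NaN in either buffer makes the check fail.
// Hypothetical helper, not part of Eigen or ComputeCpp.
inline bool all_close(const std::vector<float>& actual,
                      const std::vector<float>& expected,
                      float rel_tol = 1e-5f, float abs_tol = 1e-6f) {
  if (actual.size() != expected.size()) {
    return false;
  }
  for (std::size_t n = 0; n < actual.size(); ++n) {
    const float diff = std::fabs(actual[n] - expected[n]);
    const float bound = abs_tol + rel_tol * std::fabs(expected[n]);
    // Written as !(diff <= bound) so that a NaN difference is rejected.
    if (!(diff <= bound)) {
      return false;
    }
  }
  return true;
}
```

In the example application, the two buffers would hold the contents of out and the host-side products in1(i,j,k) * in2(i,j,k), and the program could return a non-zero exit code when the check fails.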