TensorFlow Cross Compilation

Deprecated: Please note that you are viewing a guide targeting an older version of ComputeCpp Community Edition. This guide is written specifically for version 2.1.0.


This guide explains how to set up your machine to run the SYCL™ version of TensorFlow™ using ComputeCpp. It describes how to cross-compile TensorFlow 1.9 and run it on any device supporting SPIR or SPIR-V. To compile natively, please read our other guide.

Configuration

These instructions relate to the following configuration:

  • Host platform: Ubuntu 16.04, x86_64 architecture (more recent versions of Ubuntu, or other Linux distributions, may work but are not supported)
  • Target platforms: HiKey 960 development board with an Arm Mali G71 MP8 GPU, running Debian 9, aarch64 architecture; Chromebook with a PowerVR Rogue GX6250 GPU, running Ubuntu 16.04, aarch64 architecture
  • TensorFlow: master
  • ComputeCpp: Latest
  • Python: 3.5 (more recent versions of Python may work but are not supported)


  • For older or newer versions of TensorFlow, please contact Codeplay for build documentation.
  • If you are interested in the latest features you may try our experimental branch.
  • OpenCL devices other than those listed above may work, but Codeplay does not support them at this time.
  • It is strongly recommended to make sure you have a working OpenCL installation before building TensorFlow; see our guide on installing an OpenCL driver.
  • It is strongly recommended to test your ComputeCpp installation using the Eigen tests before building TensorFlow.

Build TensorFlow with SYCL

Install dependency packages

sudo dpkg --add-architecture arm64
echo "deb [arch=arm64] http://ports.ubuntu.com/ xenial main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=arm64] http://ports.ubuntu.com/ xenial-updates main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=arm64] http://ports.ubuntu.com/ xenial-security main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=arm64] http://ports.ubuntu.com/ xenial-backports main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
sudo sed -i 's#deb http://gb.archive.ubuntu.com/ubuntu#deb [arch=amd64] http://gb.archive.ubuntu.com/ubuntu#g' /etc/apt/sources.list
sudo sed -i 's#deb http://security.ubuntu.com/ubuntu#deb [arch=amd64] http://security.ubuntu.com/ubuntu#g' /etc/apt/sources.list
sudo apt update
sudo apt install -y git cmake libpython3-all-dev:arm64 opencl-headers openjdk-8-jdk python3 python3-pip zlib1g-dev:arm64 ocl-icd-opencl-dev:arm64
pip install -U --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6

Specific Python package versions are pinned here for reference. More recent versions of numpy are known to break some tests in this version of TensorFlow.

The rest of this guide assumes Python 3.5 is used by default. This can be arranged, for example, with:

alias python=python3.5
alias pip=pip3.5

Install toolchains

  • Register for an account on Codeplay's developer website: https://developer.codeplay.com/computecppce/latest/download
  • From that page, download the following version: Latest > linux-gnu > x86_64 computecpp-ce-*-x86_64-linux-gnu.tar.gz
  • From the same page, download the following version: Latest > linux-gnu > aarch64 computecpp-ce-*-aarch64-linux-gnu.tar.gz

Set up an environment variable with the ComputeCpp version so that you can copy and paste the commands below.

For example:

export CCPP_VERSION=2.1.0
tar -xf computecpp-ce-${CCPP_VERSION}-x86_64-linux-gnu.tar.gz
tar -xf computecpp-ce-${CCPP_VERSION}-aarch64-linux-gnu.tar.gz
cp ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-x86_64/bin/* ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-ARM_64/bin/

wget https://releases.linaro.org/components/toolchain/binaries/6.3-2017.05/aarch64-linux-gnu/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu.tar.xz
tar -xf gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu.tar.xz
mkdir -p $HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/include/aarch64-linux-gnu
ln -s /usr/include/aarch64-linux-gnu/python3.5m/ $HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/include/aarch64-linux-gnu/
ln -s /usr/lib/aarch64-linux-gnu/libOpenCL.so $HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/lib/libOpenCL.so

export TF_SYCL_CROSS_TOOLCHAIN=$HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu
export TF_SYCL_CROSS_TOOLCHAIN_NAME=aarch64-linux-gnu

Install Bazel

wget https://github.com/bazelbuild/bazel/releases/download/0.16.0/bazel_0.16.0-linux-x86_64.deb
sudo apt install -y ./bazel_0.16.0-linux-x86_64.deb
bazel version

Check that the bazel version output from the above command is 0.16.0.

Configure TensorFlow

Configure TensorFlow as described in the native guide.

CC_OPT_FLAGS is the set of flags provided when compiling with --config=opt; it varies with your target architecture. The flag -march=native cannot be used when cross-compiling; instead you can set, for instance:

export CC_OPT_FLAGS="-march=armv8-a"

If you are unsure which flags to use, do not use --config=opt.

Build TensorFlow


bazel build --verbose_failures --jobs=6 --config=sycl --config=opt //tensorflow/tools/pip_package:build_pip_package


The .whl file may have a different name depending on your architecture or the Python version provided by PYTHON_BIN_PATH. Here Python 3.5 is assumed to be the default version used.
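The .whl filename follows the standard Python wheel naming scheme (PEP 427): distribution, version, Python tag, ABI tag, and platform tag, separated by hyphens. Renaming the platform tag from linux_x86_64 to linux_aarch64 is what allows pip on the board to accept the cross-compiled wheel. A small sketch of how those tags are read from the name (the helper function here is illustrative, not part of TensorFlow or pip):

```python
# Split a wheel filename into its PEP 427 tags.
def parse_wheel_name(filename):
    stem = filename[:-len(".whl")]
    distribution, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {"distribution": distribution, "version": version,
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}

tags = parse_wheel_name("tensorflow-1.9.0-cp35-cp35m-linux_aarch64.whl")
print(tags["platform"])  # → linux_aarch64
```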


  • Make sure to re-run ./configure if you change any of the environment variables described in this guide.
  • It is recommended to keep the exact same environment variables when building again with bazel to avoid re-building from scratch.
  • It is recommended to provide --jobs=X to bazel with X strictly smaller than your number of threads to reduce RAM usage.
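The --jobs advice above can be sketched as follows (assuming GNU coreutils' nproc is available):

```shell
# Pick a --jobs value strictly smaller than the number of hardware threads,
# falling back to 1 on single-threaded machines.
THREADS=$(nproc)
JOBS=$(( THREADS > 1 ? THREADS - 1 : 1 ))
echo "Using --jobs=${JOBS} of ${THREADS} threads"
```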

Bundle and install the wheel

Choose a folder to output the TensorFlow wheel to (the path below is only an example), then bundle and rename the wheel:

export TF_WHL_DIR=$HOME/tensorflow_wheel
mkdir -p $TF_WHL_DIR
bazel-bin/tensorflow/tools/pip_package/build_pip_package $TF_WHL_DIR
mv $TF_WHL_DIR/tensorflow-1.9.0-cp35-cp35m-linux_x86_64.whl $TF_WHL_DIR/tensorflow-1.9.0-cp35-cp35m-linux_aarch64.whl

Set up the development board

Copy computecpp-ce-${CCPP_VERSION}-aarch64-linux-gnu.tar.gz and $TF_WHL_DIR/tensorflow-1.9.0-cp35-cp35m-linux_aarch64.whl to your device, e.g. using the scp command.

All of the following commands should be run on the development board. Depending on how your development board's disk space has been partitioned, you may have to manage the available space carefully; the following steps require at least 1.2GB free.
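A quick way to check the free space before starting is sketched below (assumes GNU df; adjust the path to wherever you plan to install):

```shell
# Require roughly 1.2GB (expressed in KB) on the filesystem holding $HOME.
REQUIRED_KB=$((1200 * 1024))
AVAIL_KB=$(df -k --output=avail "$HOME" | tail -n 1 | tr -d ' ')
if [ "$AVAIL_KB" -ge "$REQUIRED_KB" ]; then
    echo "OK: ${AVAIL_KB} KB available"
else
    echo "Warning: only ${AVAIL_KB} KB available, ${REQUIRED_KB} KB recommended" >&2
fi
```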

Install an OpenCL driver

See our other guide to install an OpenCL driver supporting SPIR or SPIR-V.

Install dependency packages

apt -y install git python3 python3-pip
# The following apt packages are required to build scipy from source but can be removed later
apt -y install gcc gfortran python3-dev libopenblas-dev liblapack-dev cython

Make Python 3.5 the default, for example with:

alias python=python3.5
alias pip=pip3.5

Many tests and benchmarks require more pip packages than the minimal set listed in the prerequisites. The versions listed below are known to work with this build of TensorFlow:

pip install -U --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6 portpicker==1.2.0
# Cython is required to build the next packages from source but can be removed later
pip install -U --user cython==0.29.1
pip install -U --user scipy==1.1.0
pip install -U --user scikit-learn==0.20.2
pip install -U --user --no-deps sklearn

Install TensorFlow

pip install --user tensorflow-1.9.0-cp35-cp35m-linux_aarch64.whl

Set up ComputeCpp

tar -xf computecpp-ce-${CCPP_VERSION}-aarch64-linux-gnu.tar.gz
export LD_LIBRARY_PATH+=:$HOME/ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-ARM_64/lib

Run computecpp_info from the package's bin folder to check the setup. The output should show that at least one OpenCL driver has been found; it may not report SPIR support, which is fine at this stage.

Run benchmarks

To verify the installation, you can execute some of the standard TensorFlow benchmarks. You can find an example in our other guide.
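As a quick smoke test before running the full benchmarks, the snippet below lists the devices this TensorFlow build can see (TF 1.x internal API; on a working SYCL build you would expect an entry such as /device:SYCL:0 alongside the CPU). The import guard is only there so the script degrades gracefully on machines where the wheel is not installed:

```python
# List local devices visible to TensorFlow (TF 1.x); guarded so the script
# still runs on machines without the cross-compiled wheel installed.
try:
    from tensorflow.python.client import device_lib
except ImportError:
    device_lib = None

if device_lib is None:
    print("TensorFlow is not installed in this environment")
else:
    for dev in device_lib.list_local_devices():
        print(dev.name, dev.device_type)
```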
