
Please note that you are viewing a guide targeting an older version of ComputeCpp™ Community Edition. This guide was designed for version 1.1.0.

TensorFlow ARM Setup


Introduction

Codeplay and Arm have collaborated to bring TensorFlow support to Arm Mali™ via the SYCL™ and OpenCL™ open standards for heterogeneous computing. This guide describes how to build and run TensorFlow on an Arm Mali device.

If you would like to follow a more generic guide, we also describe how to build TensorFlow with SYCL for AMD hardware; those instructions can potentially be adapted for other platforms that are not listed in our documentation.

The supported platform for this release is the HiKey 960 development board, running Debian 9. For other platforms, please adapt the instructions below.

Configuration Management

These instructions relate to the following software versions:

Component     Version or model
TensorFlow    1ca6b1d
ComputeCpp    1.0.0
CPU           32- or 64-bit ARM CPU
GPUs          Arm Mali-G71 MP8

Notes

For older or newer versions of TensorFlow, please contact Codeplay for updated build documentation.
GPUs other than those listed above may work, but Codeplay does not support them at this time.

Pre-requisites

A development PC with Ubuntu 16.04.3 64-bit installed.
HiKey 960 development board.
Please contact Arm to obtain an Arm Mali driver with support for OpenCL 1.2 with SPIR-V.
- Note that, as of the date of publication, the publicly released Arm Mali drivers available at https://developer.arm.com/products/software/mali-drivers/user-space do not correctly support SPIR-V.
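Install ComputeCpp. Select "Ubuntu 16.04" as the Operating System, even if you are using another Linux distribution, and "arm64" as the Architecture. The exact packages needed for each target architecture are listed under "Install toolchains" below.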

Build TensorFlow for ARM Mali

The following steps have been verified on a clean installation of Ubuntu 16.04.3 64-bit.

Set up the environment for the ARM architecture that you want to target:

For 32-bit ARM CPUs:

export TARGET_ARCH=armhf

For 64-bit ARM CPUs:

export TARGET_ARCH=arm64
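If you are unsure which value applies, the machine type reported by `uname -m` on the development board can be mapped to the Debian architecture name. A minimal sketch, assuming the board reports `aarch64` for 64-bit and an `armv7*` value such as `armv7l` for 32-bit with a hard-float userland; the function name `target_arch_for` is hypothetical:

```shell
# Hypothetical helper: map the machine name printed by `uname -m` on the
# development board to the Debian architecture name used for TARGET_ARCH.
target_arch_for() {
  case "$1" in
    aarch64|arm64) echo arm64 ;;  # 64-bit ARM
    armv7*|armhf)  echo armhf ;;  # 32-bit ARMv7, assuming a hard-float (armhf) userland
    *)
      echo "unsupported machine type: $1" >&2
      return 1
      ;;
  esac
}
```

For example, if `uname -m` on the board prints aarch64, run export TARGET_ARCH=$(target_arch_for aarch64) on the development PC.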

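Install dependency packages

The ARM packages are served from ports.ubuntu.com, so the sed commands below restrict the existing entries in /etc/apt/sources.list to amd64. If your sources.list uses a mirror other than gb.archive.ubuntu.com, substitute its URL.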

sudo dpkg --add-architecture $TARGET_ARCH
echo "deb [arch=$TARGET_ARCH] http://ports.ubuntu.com/ xenial main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=$TARGET_ARCH] http://ports.ubuntu.com/ xenial-updates main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=$TARGET_ARCH] http://ports.ubuntu.com/ xenial-security main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=$TARGET_ARCH] http://ports.ubuntu.com/ xenial-backports main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
sudo sed -i 's#deb http://gb.archive.ubuntu.com/ubuntu#deb [arch=amd64] http://gb.archive.ubuntu.com/ubuntu#g' /etc/apt/sources.list
sudo sed -i 's#deb http://security.ubuntu.com/ubuntu#deb [arch=amd64] http://security.ubuntu.com/ubuntu#g' /etc/apt/sources.list
sudo apt-get update
sudo apt-get install -y git libpython-all-dev:$TARGET_ARCH opencl-headers openjdk-8-jdk python python-numpy python-pip zlib1g-dev:$TARGET_ARCH
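The four echo commands above append to arm.list, so running this step twice leaves duplicate entries, which apt reports as warnings. A minimal sketch that generates the same four lines so the file can be written in one go; the function name `arm_sources_list` is hypothetical:

```shell
# Hypothetical helper: print the ports.ubuntu.com entries for one foreign
# architecture, one line per Ubuntu 16.04 (xenial) suite, matching the
# four echo commands in this step.
arm_sources_list() {
  local arch=$1 suite
  for suite in xenial xenial-updates xenial-security xenial-backports; do
    echo "deb [arch=$arch] http://ports.ubuntu.com/ $suite main restricted universe multiverse"
  done
}
```

Usage: arm_sources_list "$TARGET_ARCH" | sudo tee /etc/apt/sources.list.d/arm.list. Because tee runs without -a here, the file is overwritten rather than appended to, so repeating the step does not duplicate entries.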

Install toolchains

Register for an account on Codeplay’s developer website: https://developer.codeplay.com/computecppce/latest/download
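From that Downloads page, download the following version: Ubuntu 16.04 > 64bit > computecpp-ce-1.0.0-ubuntu.16.04-64bit.tar.gz
Run the commands below from your home directory: they extract the archives into the current directory, but the exported paths refer to them under $HOME.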

For 32-bit ARM CPUs:

Download the following ComputeCpp version: Ubuntu 14.04 > arm32 > computecpp-ce-1.0.0-ubuntu.14.04-arm32.tar.gz

tar -xf ComputeCpp-CE-1.0.0-Ubuntu.16.04-64bit.tar.gz
tar -xf ComputeCpp-CE-1.0.0-Ubuntu.14.04-ARM32.tar.gz
cp ComputeCpp-CE-1.0.0-Ubuntu-16.04-x86_64/bin/compute++ ComputeCpp-CE-1.0.0-Ubuntu-14.04-ARM_32/bin
wget https://releases.linaro.org/components/toolchain/binaries/6.3-2017.05/arm-linux-gnueabihf/gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf.tar.xz
tar -xf gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf.tar.xz
mkdir -p $HOME/gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc/usr/include/arm-linux-gnueabihf
ln -s /usr/include/arm-linux-gnueabihf/python2.7/ $HOME/gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc/usr/include/arm-linux-gnueabihf
export COMPUTECPP_TOOLKIT_PATH=$HOME/ComputeCpp-CE-1.0.0-Ubuntu-14.04-ARM_32
export TF_SYCL_CROSS_TOOLCHAIN=$HOME/gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf
export TF_SYCL_CROSS_TOOLCHAIN_NAME=arm-linux-gnueabihf
export CC_OPT_FLAGS="-march=armv7"

For 64-bit ARM CPUs:

Check the version of GCC that is installed on your development board (not your development PC):

gcc -v

If the GCC version is earlier than 5.0, replace "Ubuntu 16.04" with "Ubuntu 14.04" in all subsequent steps.
Download the following ComputeCpp version: Ubuntu 16.04 > arm64 > computecpp-ce-1.0.0-ubuntu.16.04-arm64.tar.gz
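The GCC 5.0 rule above can be expressed as a small function. A minimal sketch, assuming you pass in the version printed by `gcc -dumpversion` on the development board (for example 6.3.0; newer GCC releases may print only the major number); the function name `computecpp_ubuntu_release` is hypothetical:

```shell
# Hypothetical helper: choose which ComputeCpp "Ubuntu" package to use,
# given the GCC version installed on the development board.
computecpp_ubuntu_release() {
  local major=${1%%.*}   # keep only the major version: "6.3.0" -> "6", "7" -> "7"
  if [ "$major" -lt 5 ]; then
    echo 14.04
  else
    echo 16.04
  fi
}
```

For example, computecpp_ubuntu_release 4.9.2 prints 14.04, meaning the Ubuntu 14.04 packages should be used.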

tar -xf ComputeCpp-CE-1.0.0-Ubuntu.16.04-64bit.tar.gz
tar -xf ComputeCpp-CE-1.0.0-Ubuntu.16.04-ARM64.tar.gz
cp ComputeCpp-CE-1.0.0-Ubuntu-16.04-x86_64/bin/compute++ ComputeCpp-CE-1.0.0-Ubuntu-16.04-ARM_64/bin
wget https://releases.linaro.org/components/toolchain/binaries/6.3-2017.05/aarch64-linux-gnu/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu.tar.xz
tar -xf gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu.tar.xz
mkdir -p $HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/include/aarch64-linux-gnu
ln -s /usr/include/aarch64-linux-gnu/python2.7/ $HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/include/aarch64-linux-gnu/
export COMPUTECPP_TOOLKIT_PATH=$HOME/ComputeCpp-CE-1.0.0-Ubuntu-16.04-ARM_64
export TF_SYCL_CROSS_TOOLCHAIN=$HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu
export TF_SYCL_CROSS_TOOLCHAIN_NAME=aarch64-linux-gnu
export CC_OPT_FLAGS="-march=armv8-a"
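Before configuring TensorFlow, it can help to confirm that the exported variables point at real files, whichever architecture you chose. A minimal sketch, assuming the Linaro toolchain places its compiler at bin/<toolchain name>-gcc (for example bin/aarch64-linux-gnu-gcc); the function name `check_cross_env` is hypothetical:

```shell
# Hypothetical helper: report any file that the exported variables are
# expected to point at but that is missing or not executable.
# Returns non-zero if anything is missing.
check_cross_env() {
  local rc=0 f
  for f in "$COMPUTECPP_TOOLKIT_PATH/bin/compute++" \
           "$TF_SYCL_CROSS_TOOLCHAIN/bin/${TF_SYCL_CROSS_TOOLCHAIN_NAME}-gcc"; do
    if [ ! -x "$f" ]; then
      echo "missing or not executable: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: check_cross_env && echo "Toolchain paths look correct"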

Install Bazel

wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel_0.11.1-linux-x86_64.deb
sudo apt install -y ./bazel_0.11.1-linux-x86_64.deb
bazel version

Check that the bazel version output from the above command is 0.11.1.
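The check can be automated by reading the "Build label:" line that `bazel version` prints. A minimal sketch, assuming that line format; the function name `bazel_label_is` is hypothetical:

```shell
# Hypothetical helper: read `bazel version` output on stdin and succeed only
# if its "Build label:" line reports the version given as $1.
bazel_label_is() {
  [ "$(sed -n 's/^Build label: *//p')" = "$1" ]
}
```

Usage: bazel version | bazel_label_is 0.11.1 || echo "Unexpected Bazel version"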

Build TensorFlow

git clone http://github.com/codeplaysoftware/tensorflow
cd tensorflow
git checkout 1ca6b1d
export TMPDIR=~/tensorflow_temp
mkdir -p $TMPDIR

export PYTHON_BIN_PATH=/usr/bin/python
export USE_DEFAULT_PYTHON_LIB_PATH=1
export TF_NEED_MKL=0
export TF_NEED_JEMALLOC=1
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_OPENCL_SYCL=1
export TF_NEED_COMPUTECPP=1
export TF_USE_DOUBLE_SYCL=0
export TF_USE_HALF_SYCL=0
export TF_NEED_CUDA=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_GDR=0
export TF_NEED_S3=0
export TF_NEED_KAFKA=0
export TF_SET_ANDROID_WORKSPACE=0
export TF_SYCL_BITCODE_TARGET=spirv64

Note: Depending on whether your OpenCL library supports SPIR 1.2 or SPIR-V, and whether the GPU is 32- or 64-bit, you can instead set TF_SYCL_BITCODE_TARGET to 'spir32', 'spir64' or 'spirv32'.
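The four possible values follow one pattern: the intermediate representation accepted by the driver (spir for SPIR 1.2, spirv for SPIR-V) followed by the GPU address width. A small sketch that builds and validates the value; the function name `sycl_bitcode_target` is hypothetical:

```shell
# Hypothetical helper: build the TF_SYCL_BITCODE_TARGET value from the IR
# the OpenCL driver accepts (spir = SPIR 1.2, spirv = SPIR-V) and the GPU
# address width (32 or 64). Rejects any other combination.
sycl_bitcode_target() {
  case "$1:$2" in
    spir:32|spir:64|spirv:32|spirv:64) echo "$1$2" ;;
    *)
      echo "usage: sycl_bitcode_target spir|spirv 32|64" >&2
      return 1
      ;;
  esac
}
```

For example, export TF_SYCL_BITCODE_TARGET=$(sycl_bitcode_target spirv 64) reproduces the spirv64 value used above.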

./configure
bazel --output_user_root=$TMPDIR build --config=sycl_arm -c opt --verbose_failures --copt=-DNO_LOCAL_MEM --copt=-DEIGEN_DONT_VECTORIZE_SYCL --copt=-Wno-c++11-narrowing //tensorflow/tools/pip_package:build_pip_package

Note the NO_LOCAL_MEM and EIGEN_DONT_VECTORIZE_SYCL flags are optimizations for HiKey 960. If you are using a different platform, you will most likely want to remove those options. If you are using an Ubuntu 14.04 version of ComputeCpp, add "--copt=-D_GLIBCXX_USE_CXX11_ABI=0" before "//tensorflow/tools/pip_package:build_pip_package" in the above command.
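The flag choices described in this note can be collected in one place. A minimal sketch; the function name `bazel_copts` and its two arguments (a platform label, where only the value "hikey960" is treated specially, and the Ubuntu release of the ComputeCpp package in use) are hypothetical:

```shell
# Hypothetical helper: print the extra --copt flags for the bazel build.
# $1: platform label ("hikey960" enables the HiKey 960 optimizations)
# $2: Ubuntu release of the ComputeCpp package (14.04 adds the old-ABI flag)
bazel_copts() {
  local flags="--copt=-Wno-c++11-narrowing"
  if [ "$1" = hikey960 ]; then
    flags="--copt=-DNO_LOCAL_MEM --copt=-DEIGEN_DONT_VECTORIZE_SYCL $flags"
  fi
  if [ "$2" = 14.04 ]; then
    flags="$flags --copt=-D_GLIBCXX_USE_CXX11_ABI=0"
  fi
  printf '%s\n' "$flags"
}
```

Usage: bazel --output_user_root=$TMPDIR build --config=sycl_arm -c opt --verbose_failures $(bazel_copts hikey960 16.04) //tensorflow/tools/pip_package:build_pip_package. The command substitution is deliberately left unquoted so that each flag becomes a separate argument.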

bazel-bin/tensorflow/tools/pip_package/build_pip_package $TMPDIR

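Rename the wheel file for the target architecture. Because the package is built on the x86_64 development PC, the wheel's file name carries the linux_x86_64 platform tag, and pip on the development board would refuse to install it as unsupported on that platform.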

For 32-bit ARM CPUs:

mv $TMPDIR/tensorflow-1.6.0rc0-cp27-cp27mu-linux_x86_64.whl $TMPDIR/tensorflow-1.6.0rc0-cp27-cp27mu-linux_arm.whl

For 64-bit ARM CPUs:

mv $TMPDIR/tensorflow-1.6.0rc0-cp27-cp27mu-linux_x86_64.whl $TMPDIR/tensorflow-1.6.0rc0-cp27-cp27mu-linux_aarch64.whl
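The rename can also be driven by TARGET_ARCH, so the same command works for either architecture. A minimal sketch that follows the tags used above (linux_arm for armhf, linux_aarch64 for arm64); the function name `target_wheel_name` is hypothetical:

```shell
# Hypothetical helper: given the wheel produced on the x86_64 host ($1) and
# the TARGET_ARCH value ($2), print the file name with the target platform tag.
target_wheel_name() {
  local tag
  case "$1" in
    *linux_x86_64.whl) ;;
    *) echo "not an x86_64 wheel: $1" >&2; return 1 ;;
  esac
  case "$2" in
    armhf) tag=arm ;;
    arm64) tag=aarch64 ;;
    *) echo "unknown TARGET_ARCH: $2" >&2; return 1 ;;
  esac
  # Strip the host platform suffix and append the target one.
  printf '%s\n' "${1%linux_x86_64.whl}linux_$tag.whl"
}
```

Usage: for whl in $TMPDIR/tensorflow-*-linux_x86_64.whl; do mv "$whl" "$(target_wheel_name "$whl" "$TARGET_ARCH")"; done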

Set Up the Development Board

Install the operating system and Arm Mali driver according to Arm’s instructions.
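Copy ComputeCpp-CE-1.0.0-Ubuntu.16.04-ARM64.tar.gz and $TMPDIR/tensorflow-1.6.0rc0-cp27-cp27mu-linux_aarch64.whl to your device, e.g. using the 'scp' command. These file names, and those in the commands below, are for a 64-bit board such as the HiKey 960; on a 32-bit board, use the ARM32 ComputeCpp archive and the linux_arm wheel instead.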

All of the following commands should be run on the development board. Depending on how your development board’s disk space has been partitioned, you may have to manage the available space carefully: the following steps require at least 1.2 GB of free space.
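The free-space requirement can be checked before starting. A minimal sketch using POSIX `df -Pk`, which reports available space in 1024-byte blocks in its fourth column; it treats 1.2 GB as 1.2 GiB (about 1258291 KiB), and the function name `has_free_kib` is hypothetical:

```shell
# Hypothetical helper: succeed if the file system holding directory $2 has
# at least $1 KiB available, as reported by `df -Pk`.
has_free_kib() {
  local avail
  avail=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
  [ "$avail" -ge "$1" ]
}
```

Usage: has_free_kib 1258291 "$HOME" || echo "Less than 1.2 GiB free in $HOME". Run it for each partition that will hold the copied files and the Python packages.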

Install dependency packages:

apt-get -y install clinfo git python-pip
pip install numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6 scipy==0.18.1 sklearn

Verify that the OpenCL installation is correct:

clinfo

If any errors are present, check the installation of the OpenCL driver.
It’s important to have this step working correctly; otherwise you are likely to run into errors later when running TensorFlow.
For example, if the OpenCL driver cannot be found, ensure that LD_LIBRARY_PATH has been set correctly.
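To script this check, the platform count can be read from the clinfo output. A minimal sketch, assuming your clinfo build prints a line beginning with "Number of platforms" followed by the count, as common versions of clinfo do; the function name `opencl_platform_count` is hypothetical:

```shell
# Hypothetical helper: read `clinfo` output on stdin and print the number
# from its "Number of platforms" line (0 if that line is absent).
opencl_platform_count() {
  awk '/^Number of platforms/ { n = $NF } END { print n + 0 }'
}
```

Usage: [ "$(clinfo | opencl_platform_count)" -gt 0 ] || echo "No OpenCL platform found". A non-zero count only shows that a driver was loaded; it does not by itself confirm SPIR-V support.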

Install TensorFlow:

pip install tensorflow-1.6.0rc0-cp27-cp27mu-linux_aarch64.whl

Set up ComputeCpp:

tar -xf ComputeCpp-CE-1.0.0-Ubuntu.16.04-ARM64.tar.gz
export LD_LIBRARY_PATH+=:$HOME/ComputeCpp-CE-1.0.0-Ubuntu-16.04-ARM_64/lib
$HOME/ComputeCpp-CE-1.0.0-Ubuntu-16.04-ARM_64/bin/computecpp_info
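The `+=:` form of the export above leaves an empty first entry when LD_LIBRARY_PATH was previously unset, and the dynamic loader treats an empty entry as the current working directory. A minimal sketch of a more defensive alternative that also avoids adding the same directory twice; the function name `add_library_dir` is hypothetical:

```shell
# Hypothetical helper: append directory $1 to LD_LIBRARY_PATH without
# creating an empty entry when the variable is unset or empty, and without
# adding a directory that is already listed.
add_library_dir() {
  case ":${LD_LIBRARY_PATH-}:" in
    *":$1:"*) ;;  # already present, nothing to do
    *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
  esac
}
```

Usage: add_library_dir "$HOME/ComputeCpp-CE-1.0.0-Ubuntu-16.04-ARM_64/lib". As with the export, this only affects the current shell, so run it in the shell from which TensorFlow will be started.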

The output should show that the Mali-G71 OpenCL driver has been found and that it does not support SPIR. This is expected: the build above targets SPIR-V (spirv64), not SPIR 1.2.

Run Benchmarks

To verify the installation, you can execute some of the standard TensorFlow benchmarks. The example below shows how to run AlexNet:

git clone http://github.com/tensorflow/benchmarks
cd benchmarks
git checkout f5d85aef2851881001130b28385795bc4c59fa38
python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --num_batches=10 --local_parameter_device=sycl --device=sycl --batch_size=1 --forward_only=true --model=alexnet --data_format=NHWC
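To run more than one model with the same settings, the command above can be wrapped in a loop. A minimal sketch, assuming it is run from the root of the benchmarks checkout and that model names such as vgg16 or resnet50 are accepted by tf_cnn_benchmarks at the checked-out commit; the function name `run_benchmarks` and the DRY_RUN variable are hypothetical:

```shell
# Hypothetical helper: run tf_cnn_benchmarks once per model name given as an
# argument, using the same flags as the AlexNet example. Stops at the first
# failing model.
run_benchmarks() {
  local model
  for model in "$@"; do
    # When DRY_RUN is non-empty the command is prefixed with `echo`, so it is
    # printed instead of executed.
    ${DRY_RUN:+echo} python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
      --num_batches=10 --local_parameter_device=sycl --device=sycl \
      --batch_size=1 --forward_only=true --model="$model" --data_format=NHWC || return 1
  done
}
```

Usage: run_benchmarks alexnet vgg16. Setting DRY_RUN=1 first prints the commands without running them, which is a quick way to review the flags.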

You may see warnings about deprecated functions, but they can be safely ignored.
