Please note that you are viewing a guide targeting an older version of ComputeCpp™ Community Edition. This guide was designed for version 1.1.5.

TensorFlow ARM Setup

TensorFlow™ for ARM Setup Guide

Introduction

Codeplay and Arm have collaborated to bring TensorFlow support to Arm Mali™ via the SYCL™ and OpenCL™ open standards for heterogeneous computing. This guide describes how to build and run TensorFlow on an Arm Mali device.

If you would like to follow a more generic guide, we also describe how to build TensorFlow with SYCL in our TensorFlow Generic Setup guide; those instructions can potentially be adapted for platforms that are not listed in our documentation.

The supported platform for this release is the HiKey 960 development board, running Debian 9. For other platforms, please adapt the instructions below.

Configuration management

These instructions relate to the following versions:

TensorFlow: master
ComputeCpp: Latest
CPU: 64-bit ARM CPU
GPUs: Arm Mali-G71 MP8

Notes:

For older or newer versions of TensorFlow, please contact Codeplay for updated build documentation.
If you are interested in the latest features, you may try our experimental branch.
GPUs other than those listed above may work, but Codeplay does not support them at this time.

Pre-requisites

A development PC with Ubuntu 16.04.3 64-bit installed.
HiKey 960 development board.
Please contact Arm to obtain an Arm Mali driver with support for OpenCL 1.2 with SPIR-V.
- Note that, as of the date of publication, the publicly released Arm Mali drivers available at https://developer.arm.com/products/software/mali-drivers/user-space do not correctly support SPIR-V.
Install ComputeCpp: select "Ubuntu 16.04" as the Operating System, even if you are using another Linux distribution, and "arm64" as the Architecture.

Build TensorFlow for ARM Mali

The following steps have been verified on a clean installation of Ubuntu 16.04.3 64-bit.

Install dependency packages

sudo dpkg --add-architecture arm64
echo "deb [arch=arm64] http://ports.ubuntu.com/ xenial main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=arm64] http://ports.ubuntu.com/ xenial-updates main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=arm64] http://ports.ubuntu.com/ xenial-security main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
echo "deb [arch=arm64] http://ports.ubuntu.com/ xenial-backports main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/arm.list
sudo sed -i 's#deb http://gb.archive.ubuntu.com/ubuntu#deb [arch=amd64] http://gb.archive.ubuntu.com/ubuntu#g' /etc/apt/sources.list
sudo sed -i 's#deb http://security.ubuntu.com/ubuntu#deb [arch=amd64] http://security.ubuntu.com/ubuntu#g' /etc/apt/sources.list
sudo apt-get update
sudo apt-get install -y git cmake libpython-all-dev:arm64 opencl-headers openjdk-8-jdk python python-pip zlib1g-dev:arm64 ocl-icd-opencl-dev:arm64
pip install --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6
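The two sed commands above only rewrite lines for the gb.archive.ubuntu.com and security.ubuntu.com mirrors. If your /etc/apt/sources.list uses a different mirror, a mirror-agnostic variant could tag every untagged deb line with [arch=amd64] instead. This is a minimal sketch, assuming one-line "deb URI suite components" entries; the function name restrict_to_amd64 is hypothetical, and it keeps a .bak copy so the change can be reviewed or reverted before running apt-get update.

```shell
# Hypothetical helper: add [arch=amd64] to every "deb" line whose next token is
# a URI (i.e. lines without an [options] block). Comments, deb-src lines and
# lines that already carry options such as [arch=arm64] are left untouched.
restrict_to_amd64() {
    local file="$1"
    # Keep a backup next to the original so the change can be reviewed or reverted.
    cp "$file" "$file.bak" || return 1
    sed -i -E 's#^deb[[:space:]]+((https?|ftp)://)#deb [arch=amd64] \1#' "$file"
}

# Usage on the development PC (root is needed to edit the file):
#   sudo bash -c "$(declare -f restrict_to_amd64); restrict_to_amd64 /etc/apt/sources.list"
```

Because lines that already carry an [options] block are skipped, running the function a second time leaves sources.list unchanged.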

Install toolchains

Register for an account on Codeplay's developer website: https://developer.codeplay.com/computecppce/latest/download
From that page, download the following version: Ubuntu 16.04 > 64bit > Latest computecpp-ce-ubuntu.16.04-64bit.tar.gz
From the same page, download the following version: Ubuntu 16.04 > arm64 > Latest computecpp-ce-ubuntu.16.04-arm64.tar.gz

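Set up an environment variable with the ComputeCpp version so that you can copy and paste the commands below. The commands also assume that the downloaded archives are in your home directory and that you run them from there, because later steps refer to the extracted files through $HOME.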

For example:

export CCPP_VERSION=1.1.4

tar -xf ComputeCpp-CE-${CCPP_VERSION}-Ubuntu.16.04-64bit.tar.gz
tar -xf ComputeCpp-CE-${CCPP_VERSION}-Ubuntu.16.04-ARM64.tar.gz
cp ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-x86_64/bin/* ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-ARM_64/bin/
wget https://releases.linaro.org/components/toolchain/binaries/6.3-2017.05/aarch64-linux-gnu/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu.tar.xz
tar -xf gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu.tar.xz
mkdir -p $HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/include/aarch64-linux-gnu
ln -s /usr/include/aarch64-linux-gnu/python2.7/ $HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/include/aarch64-linux-gnu/
ln -s /usr/lib/aarch64-linux-gnu/libOpenCL.so $HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/lib/libOpenCL.so
export COMPUTECPP_TOOLKIT_PATH=$HOME/ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-ARM_64
export TF_SYCL_CROSS_TOOLCHAIN=$HOME/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu
export TF_SYCL_CROSS_TOOLCHAIN_NAME=aarch64-linux-gnu
export CC_OPT_FLAGS="-march=armv8-a"
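Before building, it can save time to confirm that the variables exported above point at the extracted toolkits and that the two symlinks resolve. The following is a hypothetical check (check_cross_env is not part of ComputeCpp or TensorFlow); it assumes the directory layout created by the commands above and reports every missing entry rather than stopping at the first.

```shell
# Hypothetical sanity check for the cross-compilation environment set up above.
check_cross_env() {
    local status=0 sysroot path
    sysroot="${TF_SYCL_CROSS_TOOLCHAIN}/${TF_SYCL_CROSS_TOOLCHAIN_NAME}/libc"
    # Executables that the build invokes on the host.
    for path in \
        "${COMPUTECPP_TOOLKIT_PATH}/bin/compute++" \
        "${TF_SYCL_CROSS_TOOLCHAIN}/bin/${TF_SYCL_CROSS_TOOLCHAIN_NAME}-gcc"; do
        if [ ! -x "$path" ]; then
            echo "missing executable: $path" >&2
            status=1
        fi
    done
    # -e follows symlinks, so a dangling link (e.g. a missing arm64 package) is reported.
    for path in \
        "$sysroot/usr/include/aarch64-linux-gnu/python2.7" \
        "$sysroot/usr/lib/libOpenCL.so"; do
        if [ ! -e "$path" ]; then
            echo "missing sysroot entry: $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_cross_env && echo "cross-compilation environment looks complete"
```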

Install Bazel

wget https://github.com/bazelbuild/bazel/releases/download/0.16.0/bazel_0.16.0-linux-x86_64.deb
sudo apt install -y ./bazel_0.16.0-linux-x86_64.deb
bazel version

Check that the Bazel version reported by the command above is 0.16.0.
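When scripting the setup, this check can be automated by parsing the "Build label" line that bazel version prints. A sketch, using the hypothetical helper name bazel_version_is:

```shell
# Hypothetical helper: read "bazel version" output on stdin and succeed only
# if the "Build label" line reports the expected release.
bazel_version_is() {
    local expected="$1" label
    label=$(sed -n 's/^Build label: *//p' | head -n 1)
    if [ "$label" = "$expected" ]; then
        return 0
    fi
    echo "expected Bazel $expected, found '${label:-unknown}'" >&2
    return 1
}

# Usage:
#   bazel version | bazel_version_is 0.16.0
```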

Build TensorFlow

git clone http://github.com/codeplaysoftware/tensorflow
cd tensorflow
export TMPDIR=~/tensorflow_temp
mkdir -p $TMPDIR
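The build notes require TMPDIR to be outside the TensorFlow source tree. A hypothetical helper such as is_out_of_tree can confirm this once both directories exist; it resolves symlinks with pwd -P, so a link pointing back into the checkout is also caught. The trailing slash in the comparison keeps a sibling such as ~/tensorflow_temp from being mistaken for a subdirectory of ~/tensorflow.

```shell
# Hypothetical check: succeed only if directory $2 is NOT inside directory $1.
# Both directories must already exist; returns 2 if either cannot be resolved.
is_out_of_tree() {
    local src tmp
    src=$(cd "$1" && pwd -P) || return 2
    tmp=$(cd "$2" && pwd -P) || return 2
    case "$tmp/" in
        "$src"/*) return 1 ;;   # same directory or a subdirectory of the tree
    esac
    return 0
}

# Usage, from the tensorflow checkout:
#   is_out_of_tree "$PWD" "$TMPDIR" || echo "TMPDIR must be outside the source tree" >&2
```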

export PYTHON_BIN_PATH=/usr/bin/python
export USE_DEFAULT_PYTHON_LIB_PATH=1
export TF_NEED_JEMALLOC=1
export TF_NEED_MKL=0
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_CUDA=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_GDR=0
export TF_NEED_AWS=0
export TF_NEED_S3=0
export TF_NEED_KAFKA=0
export TF_DOWNLOAD_CLANG=0
export TF_SET_ANDROID_WORKSPACE=0
export TF_NEED_OPENCL_SYCL=1
export TF_NEED_COMPUTECPP=1
export TF_USE_DOUBLE_SYCL=0
export TF_USE_HALF_SYCL=0
export TF_SYCL_BITCODE_TARGET=spirv64
export TF_SYCL_USE_LOCAL_MEM=0
export TF_SYCL_USE_SERIAL_MEMOP=1

Notes:

TMPDIR must be outside the TensorFlow source tree.
The possible values for TF_SYCL_BITCODE_TARGET are spir32, spir64, spirv32 or spirv64, depending on which intermediate language your OpenCL library supports. Check the device properties output by the clinfo command:
In the "Extensions" field, if cl_khr_spir is present, use spirXX; if cl_khr_il_program is present, use spirvXX.
Replace "XX" above with the value of the "Address bits" field. Note that issues can arise if the device's "Address bits" value does not match that of the host CPU, e.g. a 64-bit CPU and a 32-bit GPU.
TF_SYCL_USE_LOCAL_MEM is set to "0" to avoid generating kernels that use local memory, which reduces the online compilation time. In general, for better performance you can set it to "1" if the "Local memory type" field of clinfo is "Local" and "Local memory size" is equal to or greater than 4KiB. If it is unset, both variants of the kernels are generated and the best one is picked at runtime.
TF_SYCL_USE_SERIAL_MEMOP must be set to "1" for this device. Setting it to "0" generates slightly more efficient kernels, but this is not supported on this device.
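The rules in the notes above can be applied mechanically to clinfo output. Since clinfo has to query the Mali driver, run it on the development board (for example clinfo > clinfo.txt), copy the file to the development PC and feed it to the helpers below. These are sketches: the names sycl_bitcode_target and sycl_use_local_mem are hypothetical, they assume clinfo's usual "field name   value" layout, and they read only the first device listed. For the HiKey 960 this guide still sets TF_SYCL_USE_LOCAL_MEM=0 to reduce online compilation time, whatever the general rule suggests.

```shell
# Hypothetical helper: print spir32/spir64/spirv32/spirv64 from clinfo output
# on stdin, checking cl_khr_spir first and cl_khr_il_program second.
sycl_bitcode_target() {
    awk '
        /Address bits/ && bits == "" {
            # First numeric token, e.g. "64," in "Address bits   64, Little-Endian".
            for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+,?$/) { bits = $i + 0; break }
        }
        # Device extensions only; skip the platform extensions line.
        /Extensions/ && !/Platform/ && exts == "" { exts = " " $0 " " }
        END {
            if (bits == "") exit 1
            if (exts ~ /[ \t]cl_khr_spir[ \t]/) print "spir" bits
            else if (exts ~ /[ \t]cl_khr_il_program[ \t]/) print "spirv" bits
            else exit 1
        }'
}

# Hypothetical helper: print 1 if local memory is dedicated ("Local") and at
# least 4 KiB (4096 bytes), otherwise 0.
sycl_use_local_mem() {
    awk '
        /Local memory type/ && memtype == "" { memtype = $NF }
        /Local memory size/ && size == "" {
            for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { size = $i + 0; break }
        }
        END {
            use = (memtype == "Local" && size + 0 >= 4096) ? 1 : 0
            print use
        }'
}

# Usage on the development PC, with clinfo.txt captured on the board:
#   export TF_SYCL_BITCODE_TARGET=$(sycl_bitcode_target < clinfo.txt)
#   sycl_use_local_mem < clinfo.txt
```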

./configure
bazel --output_user_root=$TMPDIR build --config=sycl_arm -c opt --verbose_failures --copt=-DEIGEN_DONT_VECTORIZE_SYCL --copt=-Wno-c++11-narrowing //tensorflow/tools/pip_package:build_pip_package

Note that the EIGEN_DONT_VECTORIZE_SYCL flag is an optimization for the HiKey 960. If you are using a different platform, you will most likely want to remove this option. If you are using an Ubuntu 14.04 version of ComputeCpp, add --copt=-D_GLIBCXX_USE_CXX11_ABI=0 before //tensorflow/tools/pip_package:build_pip_package in the command above.

bazel-bin/tensorflow/tools/pip_package/build_pip_package $TMPDIR

Rename the wheel file for the target architecture:

mv $TMPDIR/tensorflow-1.9.0rc0-cp27-cp27mu-linux_x86_64.whl $TMPDIR/tensorflow-1.9.0rc0-cp27-cp27mu-linux_aarch64.whl
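The command above hard-codes the version string 1.9.0rc0. If your checkout produces a wheel with a different version, a version-independent sketch such as the hypothetical helper retag_wheels_aarch64 renames every wheel tagged linux_x86_64 in a directory (default $TMPDIR). As with the mv command above, only the file name changes.

```shell
# Hypothetical helper: rename *-linux_x86_64.whl files to *-linux_aarch64.whl.
retag_wheels_aarch64() {
    local dir="${1:-$TMPDIR}" whl
    for whl in "$dir"/*-linux_x86_64.whl; do
        [ -e "$whl" ] || continue   # the glob matched nothing
        mv -v "$whl" "${whl%-linux_x86_64.whl}-linux_aarch64.whl"
    done
}

# Usage:
#   retag_wheels_aarch64 "$TMPDIR"
```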

Set up the development board

Install the operating system and Arm Mali driver according to Arm's instructions.
Copy ComputeCpp-CE-${CCPP_VERSION}-Ubuntu.16.04-ARM64.tar.gz and $TMPDIR/tensorflow-1.9.0rc0-cp27-cp27mu-linux_aarch64.whl to your device, e.g. using the scp command.

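All of the following commands should be run on the development board. Set CCPP_VERSION there to the same value you used on the development PC, since some of the commands below refer to it. Depending on how your development board's disk space has been partitioned, you may have to manage the available space carefully: the following steps require at least 1.2GB of free space.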
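To check the 1.2GB requirement before starting, compare it with the available space reported by df. A sketch with the hypothetical helper has_free_mib, which reads POSIX df -P output in 1024-byte blocks; 1229 MiB corresponds to 1.2 GiB, the stricter reading of 1.2GB. Note that pip --user installs under $HOME/.local while apt-get installs to the root filesystem; if they are separate filesystems, check the space on each.

```shell
# Hypothetical helper: succeed if the filesystem holding $1 has at least $2 MiB
# available, according to POSIX "df -P" output with 1024-byte blocks.
has_free_mib() {
    local path="$1" need_mib="$2" avail_kib
    avail_kib=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kib" ] && [ "$avail_kib" -ge $((need_mib * 1024)) ]
}

# Usage on the board:
#   has_free_mib "$HOME" 1229 || echo "less than 1.2 GiB free under $HOME" >&2
```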

Install dependency packages

apt-get -y install clinfo git python-pip
# The following apt packages are required to build scipy from source but can be removed later
apt-get -y install gcc gfortran python-dev libopenblas-dev liblapack-dev cython

Many tests and benchmarks require more pip packages than the minimal set installed earlier. The versions listed below are known to work with this build of TensorFlow:

pip install -U --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6 portpicker==1.2.0
# Cython is required to build the next packages from source but can be removed later
pip install -U --user cython==0.29.1
pip install -U --user scipy==1.1.0
pip install -U --user scikit-learn==0.20.2
pip install -U --user --no-deps sklearn
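To confirm that the pinned versions are what actually ended up installed (for example, to catch a dependency being changed by a later pip install -U), the output of pip freeze could be compared against the pins. A sketch, using the hypothetical helper check_pins:

```shell
# Hypothetical helper: compare "pip freeze" output (stdin) against pins given
# as arguments; report each package that is missing or at another version.
check_pins() {
    local frozen pin name status=0
    frozen=$(cat)
    for pin in "$@"; do
        name=${pin%%==*}
        # -F: fixed string (dots are literal), -x: whole line, -i: ignore case.
        if ! printf '%s\n' "$frozen" | grep -qixF -- "$pin"; then
            echo "mismatch: want $pin, have: $(printf '%s\n' "$frozen" | grep -i "^$name==" || echo none)" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage on the board:
#   pip freeze | check_pins numpy==1.14.5 six==1.11.0 mock==2.0.0 enum34==1.1.6 scipy==1.1.0 scikit-learn==0.20.2
```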

Verify that the OpenCL installation is correct:

clinfo

If any errors are present, check the installation of the OpenCL driver. It is important that this step works correctly; otherwise you are likely to run into errors later when running TensorFlow. For example, if the OpenCL driver cannot be found, ensure that LD_LIBRARY_PATH has been set correctly.
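When scripting the setup, the clinfo check can be turned into a pass/fail test. The sketch below (the helper name opencl_ok is hypothetical) assumes clinfo's usual "Number of platforms" and "Number of devices" lines and succeeds only if at least one of each is reported.

```shell
# Hypothetical helper: read clinfo output on stdin; succeed only if at least
# one OpenCL platform and at least one device were found.
opencl_ok() {
    awk '
        /Number of platforms/ { platforms = $NF }
        /Number of devices/   { devices += $NF }
        END { exit !(platforms > 0 && devices > 0) }'
}

# Usage on the board:
#   clinfo | opencl_ok || echo "no usable OpenCL device; check the driver and LD_LIBRARY_PATH" >&2
```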

Install TensorFlow

pip install --user tensorflow-1.9.0rc0-cp27-cp27mu-linux_aarch64.whl

Set up ComputeCpp

tar -xf ComputeCpp-CE-${CCPP_VERSION}-Ubuntu.16.04-ARM64.tar.gz
export LD_LIBRARY_PATH+=:$HOME/ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-ARM_64/lib
$HOME/ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-ARM_64/bin/computecpp_info

The output should show that the Mali-G71 OpenCL driver has been found and that it does not support SPIR. This is expected: this build uses SPIR-V (TF_SYCL_BITCODE_TARGET=spirv64) rather than SPIR.
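The export above uses the bash-only += operator, and if LD_LIBRARY_PATH was previously empty it produces a value that starts with ":". The dynamic loader treats an empty entry as the current directory, so libraries could be picked up from wherever you happen to run Python. A sketch of a hypothetical path_append helper avoids the empty entry and skips duplicates; it still requires bash, for the ${!var} indirect expansion.

```shell
# Hypothetical helper: append a directory to a colon-separated path variable
# without creating an empty entry and without adding duplicates.
path_append() {
    local var="$1" dir="$2"
    local current="${!var}"
    case ":$current:" in
        *":$dir:"*) ;;   # already present: leave the variable unchanged
        *) export "$var=${current:+$current:}$dir" ;;
    esac
}

# Usage on the board:
#   path_append LD_LIBRARY_PATH "$HOME/ComputeCpp-CE-${CCPP_VERSION}-Ubuntu-16.04-ARM_64/lib"
```

The variable only lasts for the current shell session, so you may want to add the same lines to ~/.bashrc on the board.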

Run benchmarks

To verify the installation, you can execute some of the standard TensorFlow benchmarks. The example below shows how to run AlexNet:

git clone http://github.com/tensorflow/benchmarks
cd benchmarks
git checkout f5d85aef2851881001130b28385795bc4c59fa38
python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --num_batches=10 --local_parameter_device=sycl --device=sycl --batch_size=1 --forward_only=true --model=alexnet --data_format=NHWC

Setting a higher batch_size increases GPU utilization and gives a higher inference rate (inferences per second), but this is not always possible in real-world applications. You may see warnings about deprecated functions; they can be safely ignored.
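To see the effect of batch_size on your board, the benchmark command above can be repeated for several values. A sketch with the hypothetical helper sweep_batch_sizes, run from the benchmarks checkout; setting DRY_RUN=1 prints the commands instead of executing them. All other flags are copied unchanged from the AlexNet command above.

```shell
# Hypothetical helper: run the AlexNet benchmark once per batch size given.
sweep_batch_sizes() {
    local bs
    for bs in "$@"; do
        local cmd=(python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
            --num_batches=10 --local_parameter_device=sycl --device=sycl
            --batch_size="$bs" --forward_only=true --model=alexnet --data_format=NHWC)
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "${cmd[*]}"          # print only
        else
            "${cmd[@]}" || return 1   # stop at the first failing run
        fi
    done
}

# Usage:
#   sweep_batch_sizes 1 2 4 8
```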
