
TensorFlow Generic Setup

This guide explains how to set up your machine to run the OpenCL™ version of TensorFlow™ using ComputeCpp, a SYCL™ implementation. It describes how to build and run TensorFlow 1.9 on any device supporting SPIR or SPIR-V.

These instructions were tested on Ubuntu 16.04 with an AMD R9 Nano Fury GPU. For Arm platforms, please read our other guide; for other platforms, please adapt the instructions below.

Configuration management

These instructions relate to the following versions:

  • TensorFlow : master
  • ComputeCpp : 1.1.1
  • CPU : 64-bit CPU
  • GPU : AMD R9 Nano Fury or Intel Gen9 HD Graphics


  • For older or newer versions of TensorFlow, please contact Codeplay for build documentation.
  • If you are interested in the latest features you may try our experimental branch.
  • OpenCL devices other than those listed above may work, but Codeplay does not support them at this time.


  • Ubuntu 16.04.1 (more recent versions of Ubuntu, or other Linux distributions, may work but are not supported)

Use one or both of the following OpenCL drivers, depending on your hardware.

Driver amdgpu-pro version 17.50-511655

Note: this version is required as more recent versions have not been verified to work, and may break OpenCL support for features such as SPIR.

wget --referer
tar xf amdgpu-pro-17.50-511655.tar.xz
./amdgpu-pro-17.50-511655/amdgpu-pro-install --opencl=legacy --headless

These options install the driver for compute only. If you also want to use this driver for graphics, please consult the installer's help.

Driver Intel NEO version 18.38.11535

This is the latest version tested. Newer versions of the driver may work but could have performance or correctness issues.

sudo dpkg -i intel-opencl_18.38.11535_amd64.deb

Verify your OpenCL installation with clinfo

sudo apt-get update
sudo apt-get install clinfo
clinfo

The output should list at least one platform and one device. The "Extensions" field of the device properties should include cl_khr_spir and/or cl_khr_il_program.
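The extension check described above can be scripted. The following is a minimal sketch, not part of the original instructions: `find_il_extensions` is a hypothetical helper name, and it assumes the extension names appear as whole words somewhere in the clinfo output. It matches whole words only, so longer names that merely start with cl_khr_spir are not counted.

```shell
#!/bin/sh
# Hypothetical helper: reads clinfo-style text on stdin and prints each
# intermediate-language extension it finds, one per line.
# Returns a non-zero status when neither extension is present.
find_il_extensions() {
    # -o prints only the matched text, -w restricts matches to whole words.
    found="$(grep -o -w -E 'cl_khr_spir|cl_khr_il_program' | sort -u)"
    [ -n "$found" ] && printf '%s\n' "$found"
}

# Example usage on a machine where clinfo is installed:
#   clinfo | find_il_extensions || echo "no SPIR or SPIR-V support reported"
```

If the function prints nothing, the device does not report either extension and the remaining steps of this guide are unlikely to work on it.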

Build TensorFlow with SYCL

Install dependency packages

sudo apt-get update
sudo apt-get install git cmake gcc build-essential libpython-all-dev opencl-headers openjdk-8-jdk python python-dev python-pip zlib1g-dev
pip install --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6

The Python package versions are pinned here for reference. Version 1.14.5 of numpy is required, as newer versions are known to break.

Install toolchains

tar -xf ComputeCpp-CE-1.1.1-Ubuntu.16.04-64bit.tar.gz
sudo mv ComputeCpp-CE-1.1.1-Ubuntu-16.04-x86_64 /usr/local/computecpp
export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
export LD_LIBRARY_PATH+=:/usr/local/computecpp/lib

The computecpp_info tool should now list your supported devices, with output similar to the following:

ComputeCpp Info (CE 1.1.1)
Toolchain information:

GLIBCXX: 20150426
This version of libstdc++ is supported.
Device Info:
Discovered 1 devices matching:
  platform    : <any>
  device type : <any>
Device 0:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Fiji
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU



  • If you see "error while loading shared libraries:" then you have not installed the OpenCL drivers needed to run ComputeCpp.
  • If the tool lists the device as "untested", Codeplay does not test or support that specific device, but it is still expected to work correctly.

Install Bazel

sudo apt install -y ./bazel_0.16.0-linux-x86_64.deb
bazel version

Check that the bazel version output from the above command is 0.16.0. More recent versions may work but are not supported.
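The version check can also be done by a script rather than by eye. This is a small sketch under one assumption: that the output of bazel version contains a line of the form "Build label: 0.16.0". The function names `bazel_label` and `check_bazel_version` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extracts the value of the "Build label:" line
# from `bazel version` output supplied on stdin.
bazel_label() {
    sed -n 's/^Build label: *//p' | head -n 1
}

# Hypothetical helper: compares the label read from stdin with the
# expected version given as the first argument.
check_bazel_version() {
    expected="$1"
    actual="$(bazel_label)"
    if [ "$actual" = "$expected" ]; then
        echo "ok: bazel $actual"
    else
        echo "mismatch: found '$actual', expected '$expected'"
        return 1
    fi
}

# Example usage:
#   bazel version | check_bazel_version 0.16.0
```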

Build TensorFlow

git clone
cd tensorflow

The configure step is controlled by environment variables. If you leave them unset, you are prompted for each value instead. Note that the variables are read once, when ./configure is executed, and the result is written to .tf_configure.bazelrc. Make sure to re-run ./configure if you wish to change these variables.

export PYTHON_BIN_PATH=/usr/bin/python
export TF_NEED_MKL=0
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_CUDA=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_GDR=0
export TF_NEED_AWS=0
export TF_NEED_S3=0
export TF_NEED_KAFKA=0

The following environment variables may differ depending on your device:


Set this to 0 if double-precision floating-point operations are not needed, or are not supported by your device. You can verify this by checking for cl_khr_fp64 in the "Extensions" field of the device properties output by the clinfo command.
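As a sketch of the cl_khr_fp64 check mentioned above, the hypothetical helper below reads clinfo output on stdin and prints 1 when the extension is reported and 0 otherwise. A result of 1 only means double precision is available; whether you need it is still your decision.

```shell
#!/bin/sh
# Hypothetical helper: prints 1 if cl_khr_fp64 appears as a whole word
# in the text on stdin, 0 otherwise.
fp64_available() {
    if grep -q -w 'cl_khr_fp64'; then
        echo 1
    else
        echo 0
    fi
}

# Example usage:
#   clinfo | fp64_available
```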


Half-precision floating-point operations are not supported yet.


This option is not prompted during the configure step. If it is unset, some operations will generate two sets of kernels - using local memory and not using local memory - and the right one will be selected at runtime. It is advised to set this value to "1" if your device supports it. Set it to "0" if the "Local memory type" field of clinfo is not "Local" or if "Local memory size" is smaller than 4KiB.
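The rule above (local memory type must be "Local" and the size must be at least 4KiB) can be sketched as a small filter over clinfo output. This is an illustration only: `suggest_local_mem` is a hypothetical name, and the parsing assumes that clinfo prints the size in bytes as the fourth whitespace-separated field of the "Local memory size" line and that only one device is listed. With several devices, the values of the last one win.

```shell
#!/bin/sh
# Hypothetical helper: reads clinfo-style text on stdin and prints the
# suggested value for the local-memory option: 1 or 0.
suggest_local_mem() {
    awk '
        # The last field of this line is the memory type, e.g. Local or Global.
        /Local memory type/ { type = $NF }
        # Only accept lines whose fourth field is a plain byte count, which
        # skips vendor-specific variants such as per-compute-unit sizes.
        /Local memory size/ && $4 ~ /^[0-9]+$/ { size = $4 }
        END {
            if (type == "Local" && size + 0 >= 4096) print 1
            else print 0
        }'
}

# Example usage:
#   clinfo | suggest_local_mem
```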


This option is not prompted during the configure step. It is recommended to set it to "0" for better performance. Not all devices support this option. If all the operations using the GPU throw an exception, please set this to "1".

  • The possible values for this option are spir32, spir64, spirv32, spirv64 or ptx64 depending on which intermediate language your OpenCL library supports:
  • ptx64 is for Nvidia GPUs.
  • On other platforms, check the device properties output by the clinfo command:
    • In the "Extensions" field, if cl_khr_spir is present, use spirXX, or if cl_khr_il_program is present, use spirvXX.
    • Substitute "XX" above for the value of the "Address bits" field. Note that issues can arise if the device's "Address bits" value does not match that of the host CPU e.g. a 64-bit CPU and 32-bit GPU.
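The selection rules in the list above can be written down as a short function. This is a sketch under stated assumptions: `pick_bitcode_target` is a hypothetical name, the extensions are passed as a single space-separated string, and when both extensions are present it prefers SPIR simply because the text lists it first. Nvidia devices (ptx64) are not covered.

```shell
#!/bin/sh
# Hypothetical helper.
#   $1: space-separated list of OpenCL extensions reported by the device
#   $2: value of the "Address bits" field (32 or 64)
# Prints spirXX or spirvXX, or returns non-zero if no target applies.
pick_bitcode_target() {
    extensions="$1"
    bits="$2"
    case "$bits" in
        32|64) ;;
        *) echo "unsupported address bits: $bits" >&2; return 1 ;;
    esac
    # Surrounding spaces make sure only whole extension names match.
    case " $extensions " in
        *" cl_khr_spir "*)       echo "spir${bits}" ;;
        *" cl_khr_il_program "*) echo "spirv${bits}" ;;
        *) echo "no SPIR or SPIR-V extension found" >&2; return 1 ;;
    esac
}

# Example usage:
#   pick_bitcode_target "cl_khr_fp64 cl_khr_spir" 64
```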

Build the pip package with:

bazel build --verbose_failures -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package

Bundle and install the wheel

bazel-bin/tensorflow/tools/pip_package/build_pip_package <path/to/output/folder>
pip install --user <path/to/output/folder>/tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl
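The wheel file name depends on the Python version and ABI used for the build, so it may differ from the one shown above. As a sketch, the hypothetical helper `find_wheel` below prints the first file matching tensorflow-*.whl in the given folder, so the name does not have to be typed by hand.

```shell
#!/bin/sh
# Hypothetical helper: prints the path of the first TensorFlow wheel
# found in the directory given as $1, or returns non-zero if none exists.
find_wheel() {
    dir="$1"
    for whl in "$dir"/tensorflow-*.whl; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$whl" ] || continue
        printf '%s\n' "$whl"
        return 0
    done
    echo "no tensorflow wheel found in $dir" >&2
    return 1
}

# Example usage, with the output folder passed to build_pip_package:
#   pip install --user "$(find_wheel /path/to/output/folder)"
```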

Many tests and benchmarks require more pip packages than the minimal set listed in the prerequisites. The versions listed below are known to work with this build of TensorFlow:

pip install -U --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6 portpicker==1.2.0
pip install -U --user scipy==1.1.1
pip install -U --user scikit-learn==0.20.2
pip install -U --user --no-deps sklearn

Run the benchmarks

To verify the installation, you can execute some of the standard TensorFlow benchmarks. The example below shows how to run AlexNet:

git clone
cd benchmarks
git checkout f5d85aef2851881001130b28385795bc4c59fa38
python scripts/tf_cnn_benchmarks/ --num_batches=10 --local_parameter_device=sycl --device=sycl --batch_size=1 --forward_only=true --model=alexnet --data_format=NHWC

A higher batch_size increases GPU utilization and yields more inferences per second, but this is not always possible in real-world applications. You may see warnings about deprecated functions; they can be safely ignored.

Run the tests

Running the tests is a good way to check which operations are currently supported with a particular device. You can do so with the command:

bazel test --test_lang_filters=cc,py --test_timeout 1500 --verbose_failures -c opt --config=sycl -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/distributions/... -//tensorflow/contrib/lite/... -//tensorflow/contrib/session_bundle/... -//tensorflow/contrib/slim/... -//tensorflow/contrib/verbs/... -//tensorflow/core/distributed_runtime/... -//tensorflow/core/kernels/hexagon/... -//tensorflow/go/... -//tensorflow/java/... -//tensorflow/python/debug/... -//tensorflow/stream_executor/...

Make sure to use the same options as during the build step. The command above runs about 1500 tests. The targets prefixed with "-" are excluded because they are either unrelated to SYCL or not supported.

