TensorFlow AMD Setup

This guide explains how to set up your machine to build and run the OpenCL™ version of TensorFlow™ 1.9 using ComputeCpp, a SYCL™ implementation, on any device supporting SPIR or SPIR-V.

These instructions were tested on Ubuntu 16.04 with an AMD R9 Nano Fury GPU. For Arm platforms, please read our other guide; for other platforms, please adapt the instructions below.

Environment Setup

These instructions relate to the following versions:

  • TensorFlow : ee7c7ba
  • ComputeCpp : 1.0.0
  • CPU : 64-bit CPU
  • GPU : AMD R9 Nano Fury


  • For older or newer versions of TensorFlow, please contact Codeplay for build documentation
  • OpenCL devices other than those listed above may work, but Codeplay does not support them at this time


Ubuntu 16.04.1 (more recent versions of Ubuntu, or other Linux distributions, may work but are not supported)

Driver amdgpu-pro version 17.50-511655

Note: this version is required as more recent versions have not been verified to work, and may break OpenCL support for features such as SPIR

wget --referer http://support.amd.com/ https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz
tar xf amdgpu-pro-17.50-511655.tar.xz
./amdgpu-pro-17.50-511655/amdgpu-pro-install --opencl=legacy --headless

These options install the driver for compute only, so please look at the help if you want to use this driver for graphics too.

Verify your OpenCL installation with clinfo:

sudo apt-get update
sudo apt-get install clinfo
clinfo

The output should list at least one platform and one device. The Extensions field of the device properties should include "cl_khr_spir" and/or "cl_khr_il_program".
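The extension check described above can be scripted. The following is a minimal sketch, not part of the official instructions: the function name check_il_extensions is hypothetical, and it assumes the extension names appear as whitespace-separated words in the clinfo output.

```shell
# Reads clinfo output on stdin and reports which of the two
# intermediate-language extensions named in this guide are advertised.
# Returns 0 if at least one is found, 1 otherwise.
# (check_il_extensions is an illustrative name, not an existing tool.)
check_il_extensions() {
    input=$(cat)
    status=1
    for ext in cl_khr_spir cl_khr_il_program; do
        # -w matches whole words only, so cl_khr_spir does not match
        # longer names that merely start with it.
        if printf '%s\n' "$input" | grep -qw "$ext"; then
            echo "found: $ext"
            status=0
        fi
    done
    return "$status"
}

# Usage: clinfo | check_il_extensions
```

If nothing is printed and the function returns a non-zero status, the device advertises neither extension.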

Build TensorFlow with SYCL

Install dependency packages:

sudo apt-get update
sudo apt-get install -y git gcc build-essential libpython-all-dev opencl-headers openjdk-8-jdk python python-dev python-pip zlib1g-dev
sudo pip install numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6 scipy==0.18.1 sklearn

Specific python package versions are added here for reference. Version 1.14.5 of numpy is required, as newer versions are known to break.

Install toolchains

tar -xf ComputeCpp-CE-1.0.0-Ubuntu.16.04-64bit.tar.gz
sudo mv ComputeCpp-CE-1.0.0-Ubuntu-16.04-x86_64 /usr/local/computecpp
export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
export LD_LIBRARY_PATH+=:/usr/local/computecpp/lib

The "computecpp_info" tool should now list your supported devices, with output similar to the following:

ComputeCpp Info (CE 1.0.0)
Toolchain information:

GLIBCXX: 20150426
This version of libstdc++ is supported.
Device Info:
Discovered 1 devices matching:
  platform    :
  device type :
Device 0:

  Device is supported                     : YES - Tested internally by Codeplay Software Ltd.
  CL_DEVICE_NAME                          : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
  CL_DEVICE_VENDOR                        : Intel(R) Corporation
  CL_DRIVER_VERSION                       :
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_CPU


  • If you see "error while loading shared libraries: libOpenCL.so" then you have not installed the OpenCL drivers needed to run ComputeCpp.

  • If the device is listed as "untested" by the tool, Codeplay does not test or support that specific device but it would still be expected to work correctly.

Install Bazel

wget https://github.com/bazelbuild/bazel/releases/download/0.16.0/bazel_0.16.0-linux-x86_64.deb
sudo apt install -y ./bazel_0.16.0-linux-x86_64.deb
bazel version

Check that the bazel version output from the above command is 0.16.0. More recent versions may work but are not supported.
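The version check can also be automated. This is a sketch only: it assumes the output of bazel version contains a line starting with "Build label:", and the helper names are hypothetical.

```shell
# Version this guide was tested with.
EXPECTED_BAZEL=0.16.0

# Extracts the version from `bazel version` output read on stdin,
# assuming it contains a "Build label: <version>" line.
bazel_label() {
    sed -n 's/^Build label: *//p'
}

# Compares the extracted version with the expected one.
check_bazel_version() {
    actual=$(bazel_label)
    if [ "$actual" = "$EXPECTED_BAZEL" ]; then
        echo "bazel $actual: OK"
    else
        echo "bazel $actual: expected $EXPECTED_BAZEL" >&2
        return 1
    fi
}

# Usage: bazel version | check_bazel_version
```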

Build and Test TensorFlow

git clone http://github.com/codeplaysoftware/tensorflow
cd tensorflow
git checkout ee7c7ba6aea710f09ce09af45885922a55577496
export PYTHON_BIN_PATH=/usr/bin/python
export TF_NEED_MKL=0
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_CUDA=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_GDR=0
export TF_NEED_AWS=0
export TF_NEED_S3=0
export TF_NEED_KAFKA=0

Set the double-precision option to 0 if double-precision floating-point operations are not needed, or are not supported by your device. You can verify device support by checking for "cl_khr_fp64" in the Extensions field of the device properties output by the clinfo command.
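As a small sketch of the device check described above, the hypothetical helper below reads clinfo output and prints 1 when cl_khr_fp64 is advertised and 0 otherwise. It only reports device support; if your workload does not need doubles, 0 remains the appropriate choice.

```shell
# Prints 1 if the clinfo output on stdin advertises cl_khr_fp64, else 0.
# (double_supported is an illustrative name, not an existing tool.)
double_supported() {
    if grep -qw cl_khr_fp64; then
        echo 1
    else
        echo 0
    fi
}

# Usage: clinfo | double_supported
```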


Half-precision floating-point operations are not supported yet.

  • The possible values for the intermediate-language target option are 'spir32', 'spir64', 'spirv32', 'spirv64' or 'ptx64', depending on which intermediate language your OpenCL library supports:
    • 'ptx64' is for Nvidia GPUs.
    • On other platforms, check the device properties output by the clinfo command:
      • In the Extensions field, if "cl_khr_spir" is present, use 'spirXX'; if "cl_khr_il_program" is present, use 'spirvXX'.
      • Substitute XX above with the value of the "Address bits" field. Note that issues can arise if the device's "Address bits" value doesn't match that of the host CPU, e.g. a 64-bit CPU and a 32-bit GPU.

bazel build --verbose_failures -c opt --config=sycl --cxxopt=-no-serial-memop //tensorflow/tools/pip_package:build_pip_package

  • --cxxopt=-no-serial-memop is recommended for better performance; however, not all devices support it. If all the operations using the GPU throw an exception, please remove this option.
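The rule for choosing between the SPIR and SPIR-V targets can be expressed as a short function. This is a sketch under assumptions: pick_target is a hypothetical name, the two arguments are values you copy by hand from the clinfo output, and preferring SPIR when both extensions are present is this example's choice rather than something the guide specifies. The 'ptx64' case for Nvidia GPUs is not covered.

```shell
# $1 = the device's Extensions string (space-separated, from clinfo)
# $2 = the device's "Address bits" value (32 or 64)
# Prints the matching target name, or fails if neither extension is present.
pick_target() {
    case " $1 " in
        *" cl_khr_spir "*)       echo "spir$2" ;;   # SPIR is checked first (assumption)
        *" cl_khr_il_program "*) echo "spirv$2" ;;
        *)
            echo "no SPIR or SPIR-V extension found" >&2
            return 1
            ;;
    esac
}

# Usage: pick_target "cl_khr_fp64 cl_khr_spir" 64
```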

Build and install the wheel

bazel-bin/tensorflow/tools/pip_package/build_pip_package <path/to/output/folder>
sudo pip install <path/to/output/folder>/tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl
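The exact wheel file name depends on the Python build in use, so it may differ from the one shown above. As a hedged sketch, the hypothetical helper below locates the wheel in the output folder instead of hard-coding its name; it assumes the folder contains exactly one file matching tensorflow-*.whl.

```shell
# Prints the path of the TensorFlow wheel found in the folder given as $1.
# Fails unless exactly one matching file exists.
# (find_wheel is an illustrative name, not part of TensorFlow.)
find_wheel() {
    # Replace the function's arguments with the glob matches.
    set -- "$1"/tensorflow-*.whl
    if [ "$#" -eq 1 ] && [ -f "$1" ]; then
        printf '%s\n' "$1"
    else
        echo "expected exactly one tensorflow wheel" >&2
        return 1
    fi
}

# Usage: sudo pip install "$(find_wheel <path/to/output/folder>)"
```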

Running the Benchmarks

To verify the installation, you can execute some of the standard TensorFlow benchmarks. The example below shows how to run AlexNet:

git clone http://github.com/tensorflow/benchmarks
cd benchmarks
git checkout ae48d7118ac17e5069820bf3105a766317d72f3a

More recent versions should work but require some changes to the options to enable SYCL.

python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --num_batches=10 --local_parameter_device=sycl --device=sycl --batch_size=1 --forward_only=true --model=alexnet --data_format=NHWC

You may see warnings about deprecated functions, but they can be safely ignored.

Run the Tests

Running the tests is a good way to check which operations are currently supported with a particular device. You can do so with the command:

bazel test --test_lang_filters=cc,py --test_timeout 1500 --verbose_failures -c opt --config=sycl --cxxopt=-no-serial-memop -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/distributions/... -//tensorflow/contrib/lite/... -//tensorflow/contrib/session_bundle/... -//tensorflow/contrib/slim/... -//tensorflow/contrib/verbs/... -//tensorflow/core/distributed_runtime/... -//tensorflow/core/kernels/hexagon/... -//tensorflow/go/... -//tensorflow/java/... -//tensorflow/python/debug/... -//tensorflow/stream_executor/...

Make sure to use the same compiler options as during the build step, in particular regarding --cxxopt=-no-serial-memop.
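One way to keep the options consistent is to define them once and reuse them for both steps. The wrapper below is a sketch, not part of the official instructions: the names SYCL_OPTS, BAZEL, sycl_build and sycl_test are hypothetical, while the flags themselves are copied from the commands in this guide.

```shell
# Options shared by the build and test steps.
# Remove --cxxopt=-no-serial-memop here if your device does not support it.
SYCL_OPTS="--verbose_failures -c opt --config=sycl --cxxopt=-no-serial-memop"

# The bazel executable; set BAZEL=echo for a dry run that prints the command.
BAZEL=${BAZEL:-bazel}

sycl_build() {
    # $SYCL_OPTS is intentionally unquoted so it splits into separate flags.
    "$BAZEL" build $SYCL_OPTS "$@"
}

sycl_test() {
    # Target patterns (including -//... exclusions) are passed after "--".
    "$BAZEL" test --test_lang_filters=cc,py --test_timeout 1500 $SYCL_OPTS -- "$@"
}

# Usage:
#   sycl_build //tensorflow/tools/pip_package:build_pip_package
#   sycl_test //tensorflow/... -//tensorflow/compiler/...
```

Editing SYCL_OPTS in one place then changes both the build and the test invocation.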
