TensorFlow Generic Setup Guide
This guide will explain how to set up your machine to run the OpenCL™ version of TensorFlow™ using ComputeCpp, a SYCL™ implementation. This guide describes how to build and run TensorFlow 1.9 on any device supporting SPIR or SPIR-V.
These instructions were tested on Ubuntu 16.04 with an AMD R9 Nano Fury GPU. For Arm platforms, please read our other guide; for other platforms, please adapt the instructions below.
Configuration management
These instructions relate to the following versions:
- TensorFlow : master
- ComputeCpp : 1.1.2
- CPU : 64-bit CPU
- GPU : AMD R9 Nano Fury or Intel Gen9 HD Graphics
Notes:
- For older or newer versions of TensorFlow, please contact Codeplay for build documentation.
- If you are interested in the latest features you may try our experimental branch.
- OpenCL devices other than those listed above may work, but Codeplay does not support them at this time.
Pre-Requisites
- Ubuntu 16.04.1 (more recent versions of Ubuntu, or other Linux distributions, may work but are not supported)
- One or more of the following OpenCL drivers, depending on your hardware.
Driver amdgpu-pro version 17.50-511655
Note: this version is required as more recent versions have not been verified to work, and may break OpenCL support for features such as SPIR.
wget --referer http://support.amd.com/ https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz
tar xf amdgpu-pro-17.50-511655.tar.xz
./amdgpu-pro-17.50-511655/amdgpu-pro-install --opencl=legacy --headless
These options install the driver for compute only; consult the installer's help output if you also want to use this driver for graphics.
Driver Intel NEO version 18.38.11535
This is the latest version tested. Newer versions of the driver may work but could have performance or correctness issues.
wget https://github.com/intel/compute-runtime/releases/download/18.38.11535/intel-opencl_18.38.11535_amd64.deb
sudo dpkg -i intel-opencl_18.38.11535_amd64.deb
Verify your OpenCL installation with clinfo
sudo apt-get update
sudo apt-get install clinfo
clinfo
The output should list at least one platform and one device. The "Extensions" field of the device properties should include cl_khr_spir and/or cl_khr_il_program.
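The extension check above can be scripted. The following is a minimal sketch that assumes the clinfo output is piped in on standard input; the function name has_spir_support is hypothetical and is not part of clinfo or ComputeCpp.

```shell
# has_spir_support: hypothetical helper, reads clinfo output on stdin.
# Succeeds (exit status 0) if any listed device advertises cl_khr_spir
# or cl_khr_il_program as a whole word in its output.
has_spir_support() {
    grep -E -q -w 'cl_khr_spir|cl_khr_il_program'
}

# Usage (requires an installed OpenCL driver):
#   clinfo | has_spir_support && echo "SPIR or SPIR-V is available"
```

Note that the check passes if any device on the machine reports one of the extensions, so on a multi-device system the full clinfo output is still worth reading.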
Build TensorFlow with SYCL
Install dependency packages
sudo apt-get update
sudo apt-get install git cmake gcc build-essential libpython-all-dev opencl-headers openjdk-8-jdk python python-dev python-pip zlib1g-dev
pip install --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6
Specific python package versions are added here for reference. Version 1.14.5 of numpy is required, as newer versions are known to break.
Install toolchains
- Register for an account on Codeplay's developer website: https://developer.codeplay.com/computecppce/latest/download
- From that page, download the following version: Ubuntu 16.04 > 64bit > computecpp-ce-1.1.2-ubuntu.16.04-64bit.tar.gz
tar -xf ComputeCpp-CE-1.1.2-Ubuntu.16.04-64bit.tar.gz
sudo mv ComputeCpp-CE-1.1.2-Ubuntu-16.04-x86_64 /usr/local/computecpp
export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
export LD_LIBRARY_PATH+=:/usr/local/computecpp/lib
/usr/local/computecpp/bin/computecpp_info
The computecpp_info tool should now list your supported devices, similar to the output below.
********************************************************************************
ComputeCpp Info (CE 1.1.2)
********************************************************************************
Toolchain information:
GLIBCXX: 20150426
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 1 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Fiji
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
********************************************************************************
Notes:
- If you see "error while loading shared libraries: libOpenCL.so" then you have not installed the OpenCL drivers needed to run ComputeCpp.
- If the device is listed as "untested" by the tool, Codeplay does not test or support that specific device but it would still be expected to work correctly.
Install Bazel
wget https://github.com/bazelbuild/bazel/releases/download/0.16.0/bazel_0.16.0-linux-x86_64.deb
sudo apt install -y ./bazel_0.16.0-linux-x86_64.deb
bazel version
Check that the bazel version output from the above command is 0.16.0. More recent versions may work but are not supported.
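The version check can also be automated. This sketch assumes that the output of bazel version contains a line of the form "Build label: 0.16.0"; the function name check_bazel_label is hypothetical.

```shell
# check_bazel_label: hypothetical helper, reads `bazel version` output on
# stdin and succeeds only if the reported build label equals $1.
check_bazel_label() {
    expected="$1"
    # Take the text after "Build label: " on the first matching line.
    label=$(awk -F': ' '/^Build label:/ {print $2; exit}')
    [ "$label" = "$expected" ]
}

# Usage:
#   bazel version | check_bazel_label 0.16.0 || echo "unexpected Bazel version"
```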
Build TensorFlow
git clone http://github.com/codeplaysoftware/tensorflow
cd tensorflow
The configure step is controlled by environment variables. If you leave them unset, you will be prompted for each value interactively instead. Note that the variables are read once when ./configure is executed and the result is written to .tf_configure.bazelrc. Make sure to re-run ./configure if you wish to change these variables.
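Because the values are only captured when ./configure runs, it can help to inspect what was actually recorded. This is a small sketch, assuming the recorded lines mention the variable names; the function name show_sycl_settings is hypothetical.

```shell
# show_sycl_settings: hypothetical helper that prints the SYCL-related lines
# recorded in a configure output file (default: .tf_configure.bazelrc in the
# current directory). Returns non-zero if no such line is found.
show_sycl_settings() {
    grep 'SYCL' "${1:-.tf_configure.bazelrc}"
}

# Usage, from the tensorflow source directory after running ./configure:
#   show_sycl_settings
```

If the printed values do not match your exported variables, re-run ./configure before building.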
export PYTHON_BIN_PATH=/usr/bin/python
export USE_DEFAULT_PYTHON_LIB_PATH=1
export CC_OPT_FLAGS=\"\"
export TF_NEED_JEMALLOC=1
export TF_NEED_MKL=0
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_CUDA=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_GDR=0
export TF_NEED_AWS=0
export TF_NEED_S3=0
export TF_NEED_KAFKA=0
export TF_DOWNLOAD_CLANG=0
export TF_SET_ANDROID_WORKSPACE=0
export TF_NEED_OPENCL_SYCL=1
export TF_NEED_COMPUTECPP=1
The following environment variables may differ depending on your device:
export TF_USE_DOUBLE_SYCL=1
Set this to 0 if double-precision floating-point operations are not needed, or are not supported by your device. You can verify this by checking for cl_khr_fp64 in the "Extensions" field of the device properties output by the clinfo command.
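The value can be derived from the clinfo output rather than set by hand. This sketch assumes the clinfo output is piped in on standard input; the function name double_sycl_value is hypothetical.

```shell
# double_sycl_value: hypothetical helper, reads clinfo output on stdin and
# prints 1 if cl_khr_fp64 is advertised by any listed device, 0 otherwise.
double_sycl_value() {
    if grep -q -w 'cl_khr_fp64'; then
        echo 1
    else
        echo 0
    fi
}

# Usage:
#   export TF_USE_DOUBLE_SYCL=$(clinfo | double_sycl_value)
```

On a machine with several OpenCL devices this reports support if any one of them has the extension, so confirm that the device you intend to use is the one that lists it.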
export TF_USE_HALF_SYCL=0
Half-precision floating-point operations are not supported yet.
export TF_SYCL_USE_LOCAL_MEM=1
This option is not prompted during the configure step. If it is unset, some operations will generate two sets of kernels - using local memory and not using local memory - and the right one will be selected at runtime. It is advised to set this value to "1" if your device supports it. Set it to "0" if the "Local memory type" field of clinfo is not "Local" or if "Local memory size" is smaller than 4KiB.
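The rule above can be sketched as a small filter over the clinfo output. It assumes clinfo's human-readable layout, where the "Local memory type" line ends with the type and the "Local memory size" line gives the size in bytes as its fourth field, and it only considers the first device listed. The function name local_mem_value is hypothetical.

```shell
# local_mem_value: hypothetical helper, reads clinfo output on stdin.
# Prints 1 if the first listed device reports local memory of type "Local"
# with at least 4 KiB (4096 bytes), and 0 otherwise.
local_mem_value() {
    awk '
        /Local memory type/ && mtype == "" { mtype = $NF }   # last field: the type
        /Local memory size/ && msize == "" { msize = $4 }    # fourth field: bytes
        END {
            if (mtype == "Local" && msize + 0 >= 4096) print 1
            else print 0
        }
    '
}

# Usage:
#   export TF_SYCL_USE_LOCAL_MEM=$(clinfo | local_mem_value)
```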
export TF_SYCL_USE_SERIAL_MEMOP=0
This option is not prompted during the configure step. It is recommended to set it to "0" for better performance. Not all devices support this option. If all the operations using the GPU throw an exception, please set this to "1".
export TF_SYCL_BITCODE_TARGET=spir64
- The possible values for this option are spir32, spir64, spirv32, spirv64 or ptx64, depending on which intermediate language your OpenCL library supports:
  - ptx64 is for Nvidia GPUs.
  - On other platforms, check the device properties output by the clinfo command:
    - In the "Extensions" field, if cl_khr_spir is present, use spirXX, or if cl_khr_il_program is present, use spirvXX.
    - Substitute "XX" above for the value of the "Address bits" field. Note that issues can arise if the device's "Address bits" value does not match that of the host CPU, e.g. a 64-bit CPU and a 32-bit GPU.
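The selection rule for non-Nvidia platforms can be sketched as a shell function. The function name bitcode_target is hypothetical, and preferring SPIR over SPIR-V when both extensions are present is an assumption of this sketch. It takes the device's "Extensions" string and "Address bits" value as read from clinfo, and does not cover the ptx64 case.

```shell
# bitcode_target: hypothetical helper.
#   $1: the device "Extensions" string (space-separated) from clinfo
#   $2: the device "Address bits" value (32 or 64)
# Prints spirXX or spirvXX; fails if neither extension is present.
bitcode_target() {
    case " $1 " in
        *" cl_khr_spir "*)       echo "spir$2" ;;   # SPIR preferred (assumption)
        *" cl_khr_il_program "*) echo "spirv$2" ;;
        *) echo "no SPIR or SPIR-V extension found" >&2
           return 1 ;;
    esac
}

# Usage:
#   export TF_SYCL_BITCODE_TARGET=$(bitcode_target "cl_khr_fp64 cl_khr_spir" 64)
```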
./configure
Build the pip package with:
bazel build --verbose_failures -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
Bundle and install the wheel
bazel-bin/tensorflow/tools/pip_package/build_pip_package <path/to/output/folder>
pip install --user <path/to/output/folder>/tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl
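The wheel file name depends on the TensorFlow version and on the Python ABI tag (cp27mu in the command above), so it may differ on your machine. This sketch locates the wheel instead of hard-coding its name; the function name find_wheel is hypothetical.

```shell
# find_wheel: hypothetical helper that prints the path of the first
# tensorflow wheel found in the folder given as $1, or fails if none exists.
find_wheel() {
    for whl in "$1"/tensorflow-*.whl; do
        if [ -e "$whl" ]; then
            echo "$whl"
            return 0
        fi
    done
    return 1
}

# Usage, with the same output folder that was passed to build_pip_package:
#   pip install --user "$(find_wheel path/to/output/folder)"
```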
Many tests and benchmarks require more pip packages than the minimal set listed in the pre-requisites. The versions listed below are known to work with this build of TensorFlow:
pip install -U --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6 portpicker==1.2.0
pip install -U --user scipy==1.1.0
pip install -U --user scikit-learn==0.20.2
pip install -U --user --no-deps sklearn
Run the benchmarks
To verify the installation, you can execute some of the standard TensorFlow benchmarks. The example below shows how to run AlexNet:
git clone http://github.com/tensorflow/benchmarks
cd benchmarks
git checkout f5d85aef2851881001130b28385795bc4c59fa38
python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --num_batches=10 --local_parameter_device=sycl --device=sycl --batch_size=1 --forward_only=true --model=alexnet --data_format=NHWC
Setting a higher batch_size will increase GPU usage and give more inferences per second, but this is not always possible in real-world applications.
You may see warnings about deprecated functions, but they can be safely ignored.
Run the tests
Running the tests is a good way to check which operations are currently supported with a particular device. You can do so with the command:
bazel test --test_lang_filters=cc,py --test_timeout 1500 --verbose_failures -c opt --config=sycl -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/distributions/... -//tensorflow/contrib/lite/... -//tensorflow/contrib/session_bundle/... -//tensorflow/contrib/slim/... -//tensorflow/contrib/verbs/... -//tensorflow/core/distributed_runtime/... -//tensorflow/core/kernels/hexagon/... -//tensorflow/go/... -//tensorflow/java/... -//tensorflow/python/debug/... -//tensorflow/stream_executor/...
Make sure to use the same options as during the build step. The command above will run about 1500 tests. The target patterns prefixed with "-" are excluded because they are either unrelated to SYCL or not supported.