Wednesday, August 22, 2018

Building tensorflow 1.10 with cuda 9.2 support on ubuntu 16.04

From version 1.7, tensorflow binary available from anaconda repository, is build with AVX support. To run tensorflow on old cpu missing AVX instructions set, such Xeon E5520, tensorflow must be build from source.

Tensorflow can be build on ubuntu 18.04. Here tensorflow 1.10 will be build for ubuntu 16.04 with CUDA 9.2/cuDNN 7.14 support. Building tensorflow from source relies on the installation of several softwares. Once installed, one can try to run the configure script:

./configure

After some trials, the following configuration seems to succeed for building with CUDA support. From a terminal launch the configure script (here in a virtual environment):


$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.15.0 installed.
Please specify the location of python. [Default is /home/jeanpat/anaconda3/envs/DeepFish/bin/python]:



Found possible Python library paths:
  /home/jeanpat/anaconda3/envs/DeepFish/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/home/jeanpat/anaconda3/envs/DeepFish/lib/python3.6/site-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon AWS Platform support? [Y/n]: n
No Amazon AWS Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: y
GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: y
VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 9.2


Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.14


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/x86_64-linux-gnu]:


Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 1.3.5


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.2]:


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl             # Build with MKL support.
    --config=monolithic      # Config for mostly static monolithic build.
Configuration finished

Building with bazel failed:

According to documentation, the building step was started by typing in a terminal:

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
 
Unfortunately, after some hours, the build failed with the error:


./tensorflow/core/util/tensor_format.h:420:3: note: in expansion of macro 'CHECK'
   CHECK(index >= 0 && index < dimension_attributes.size())
   ^
./tensorflow/core/util/tensor_format.h: In instantiation of 'T tensorflow::GetFilterDim(tensorflow::gtl::ArraySlice<T>, tensorflow::FilterTensorFormat, char) [with T = long long int]':
./tensorflow/core/util/tensor_format.h:461:54:   required from here
./tensorflow/core/util/tensor_format.h:435:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   CHECK(index >= 0 && index < dimension_attribute.size())
                             ^
./tensorflow/core/platform/macros.h:87:47: note: in definition of macro 'TF_PREDICT_FALSE'
 #define TF_PREDICT_FALSE(x) (__builtin_expect(x, 0))
                                               ^
./tensorflow/core/util/tensor_format.h:435:3: note: in expansion of macro 'CHECK'
   CHECK(index >= 0 && index < dimension_attribute.size())
   ^
ERROR: /home/jeanpat/Developpement/Arch-TensorFlow/tensorflow/tensorflow/BUILD:576:1: Executing genrule //tensorflow:tensorflow_python_api_gen failed (Aborted): bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)
2018-08-22 21:44:49.248926: F tensorflow/core/framework/allocator_registry.cc:52] New registration for AllocatorFactory with name=BFCRdmaAllocator priority=101 at location tensorflow/contrib/gdr/gdr_memory_manager.cc:204 conflicts with previous registration at location tensorflow/contrib/verbs/rdma_mgr.cc:277
/bin/bash: line 1: 17921 Aborted 
                (core dumped) bazel-out/host/bin/tensorflow/create_tensorflow.python_api --root_init_template=tensorflow/api_template.__init__.py --apidir=bazel-out/host/genfiles/tensorflow --apiname=tensorflow --apiversion=1 --package=tensorflow.python --output_package=tensorflow bazel-out/host/genfiles/tensorflow/__init__.py bazel-out/host/genfiles/tensorflow/app/__init__.py bazel-out/host/genfiles/tensorflow/bitwise/__init__.py bazel-out/host/genfiles/tensorflow/compat/__init__.py bazel-out/host/genfiles/tensorflow/data/__init__.py bazel-out/host/genfiles/tensorflow/debugging/__init__.py bazel-out/host/genfiles/tensorflow/distributions/__init__.py bazel-out/host/genfiles/tensorflow/dtypes/__init__.py bazel-out/host/genfiles/tensorflow/errors/__init__.py bazel-out/host/genfiles/tensorflow/feature_column/__init__.py bazel-out/host/genfiles/tensorflow/gfile/__init__.py bazel-out/host/genfiles/tensorflow/graph_util/__init__.py bazel-out/host/genfiles/tensorflow/image/__init__.py bazel-out/host/genfiles/tensorflow/io/__init__.py bazel-out/host/genfiles/tensorflow/initializers/__init__.py bazel-out/host/genfiles/tensorflow/keras/__init__.py bazel-out/host/genfiles/tensorflow/keras/activations/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/densenet/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/inception_resnet_v2/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/inception_v3/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/mobilenet/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/nasnet/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/resnet50/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/vgg16/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/vgg19/__init__.py bazel-out/host/genfiles/tensorflow/keras/applications/xception/__init__.py bazel-out/host/genfiles/tensorflow/keras/backend/__init__.py bazel-out/host/genfiles/tensorflow/keras/callbacks/__init__.py bazel-out/host/genfiles/tensorflow/keras/constraints/__init__.py bazel-out/host/genfiles/tensorflow/keras/datasets/__init__.py bazel-out/host/genfiles/tensorflow/keras/datasets/boston_housing/__init__.py bazel-out/host/genfiles/tensorflow/keras/datasets/cifar10/__init__.py bazel-out/host/genfiles/tensorflow/keras/datasets/cifar100/__init__.py bazel-out/host/genfiles/tensorflow/keras/datasets/fashion_mnist/__init__.py bazel-out/host/genfiles/tensorflow/keras/datasets/imdb/__init__.py bazel-out/host/genfiles/tensorflow/keras/datasets/mnist/__init__.py bazel-out/host/genfiles/tensorflow/keras/datasets/reuters/__init__.py bazel-out/host/genfiles/tensorflow/keras/estimator/__init__.py bazel-out/host/genfiles/tensorflow/keras/initializers/__init__.py bazel-out/host/genfiles/tensorflow/keras/layers/__init__.py bazel-out/host/genfiles/tensorflow/keras/losses/__init__.py bazel-out/host/genfiles/tensorflow/keras/metrics/__init__.py bazel-out/host/genfiles/tensorflow/keras/models/__init__.py bazel-out/host/genfiles/tensorflow/keras/optimizers/__init__.py bazel-out/host/genfiles/tensorflow/keras/preprocessing/__init__.py bazel-out/host/genfiles/tensorflow/keras/preprocessing/image/__init__.py bazel-out/host/genfiles/tensorflow/keras/preprocessing/sequence/__init__.py bazel-out/host/genfiles/tensorflow/keras/preprocessing/text/__init__.py bazel-out/host/genfiles/tensorflow/keras/regularizers/__init__.py bazel-out/host/genfiles/tensorflow/keras/utils/__init__.py bazel-out/host/genfiles/tensorflow/keras/wrappers/__init__.py bazel-out/host/genfiles/tensorflow/keras/wrappers/scikit_learn/__init__.py bazel-out/host/genfiles/tensorflow/layers/__init__.py bazel-out/host/genfiles/tensorflow/linalg/__init__.py bazel-out/host/genfiles/tensorflow/logging/__init__.py bazel-out/host/genfiles/tensorflow/losses/__init__.py bazel-out/host/genfiles/tensorflow/manip/__init__.py bazel-out/host/genfiles/tensorflow/math/__init__.py bazel-out/host/genfiles/tensorflow/metrics/__init__.py bazel-out/host/genfiles/tensorflow/nn/__init__.py bazel-out/host/genfiles/tensorflow/nn/rnn_cell/__init__.py bazel-out/host/genfiles/tensorflow/profiler/__init__.py bazel-out/host/genfiles/tensorflow/python_io/__init__.py bazel-out/host/genfiles/tensorflow/quantization/__init__.py bazel-out/host/genfiles/tensorflow/resource_loader/__init__.py bazel-out/host/genfiles/tensorflow/strings/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/builder/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/constants/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/loader/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/main_op/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/signature_constants/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/signature_def_utils/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/tag_constants/__init__.py bazel-out/host/genfiles/tensorflow/saved_model/utils/__init__.py bazel-out/host/genfiles/tensorflow/sets/__init__.py bazel-out/host/genfiles/tensorflow/sparse/__init__.py bazel-out/host/genfiles/tensorflow/spectral/__init__.py bazel-out/host/genfiles/tensorflow/summary/__init__.py bazel-out/host/genfiles/tensorflow/sysconfig/__init__.py bazel-out/host/genfiles/tensorflow/test/__init__.py bazel-out/host/genfiles/tensorflow/train/__init__.py bazel-out/host/genfiles/tensorflow/train/queue_runner/__init__.py bazel-out/host/genfiles/tensorflow/user_ops/__init__.py
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 7962.287s, Critical Path: 283.59s
INFO: 7346 processes: 7346 local.
FAILED: Build did NOT complete successfully


What to do? Trying to rebuild with a minimalist configuration (keeping cuda acceleration)

./configure again

This  time with less optimizations (keep XLA and CUDA):

$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.15.0 installed.
Please specify the location of python. [Default is /home/jeanpat/anaconda3/envs/DeepFish/bin/python]:



Found possible Python library paths:
  /home/jeanpat/anaconda3/envs/DeepFish/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/home/jeanpat/anaconda3/envs/DeepFish/lib/python3.6/site-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon AWS Platform support? [Y/n]: n
No Amazon AWS Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 9.2

Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.14

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Do you wish to build TensorFlow with TensorRT support? [y/N]: n
TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/x86_64-linux-gnu]:

Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 1.3.5

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.2]:

Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl             # Build with MKL support.
    --config=monolithic      # Config for mostly static monolithic build.
Configuration finished

Build

Build first failed but installing additional keras elements allowed finally to succeed

build package

pip install /tmp/tensorflow_pkg/tensorflow*.whl

Check tensorflow

From a terminal, call an ipython console (here in DeepFish virtualenv) and type:

(DeepFish) jeanpat@Dell-T5500:~/Developpement$ ipython
Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:39:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.


In [1]: import tensorflow as tf

In [2]: tf.__version__
Out[2]: '1.10.0'
In [3]: hello = tf.constant('Hello, Tensorflow!')

In [4]: sess = tf.Session()
2018-08-23 15:37:45.588128: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-08-23 15:37:45.588830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 0 with properties:
name: GeForce GTX 960 major: 5 minor: 2 memoryClockRate(GHz): 1.253
pciBusID: 0000:03:00.0
totalMemory: 3.95GiB freeMemory: 3.68GiB
2018-08-23 15:37:45.588860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1485] Adding visible gpu devices: 0
2018-08-23 15:37:45.997955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:966] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-23 15:37:45.998007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:972]      0
2018-08-23 15:37:45.998035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 0:   N
2018-08-23 15:37:45.998306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3402 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960, pci bus id: 0000:03:00.0, compute capability: 5.2)


In [5]: print(sess.run(hello))
b'Hello, Tensorflow!'

No comments: