Building and Installing TensorFlow from Source

Enable accelerated instructions and get the most out of your hardware

Posted by Xiaosheng on September 19, 2017

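0. Preface

The simplest, no-brainer way to install TensorFlow is through pip, and it is the method most people recommend to beginners. But the pip package has to stay compatible with a wide range of machines, so it cannot be optimized for your local environment. As a result, you will see warnings at run time saying that your machine supports instructions that could speed up computation, but that the library was not compiled to use them: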

2017-06-26 10:34:11.820609: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-26 10:34:11.820621: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-26 10:34:11.820624: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-26 10:34:11.820629: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
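To see at a glance which instruction sets the pip build is missing, the warnings can be scanned with a few lines of Python. This is only a sketch: the function name missing_instruction_sets is made up for illustration, and the pattern assumes the warning wording shown in the log above.

```python
import re

# Matches the wording of the cpu_feature_guard warnings quoted above.
_WARNING = re.compile(r"wasn't compiled to use (\S+) instructions")


def missing_instruction_sets(log_text):
    """Return the instruction sets named in the warnings, in order, without duplicates.

    Hypothetical helper for illustration; it is not part of TensorFlow.
    """
    found = []
    for name in _WARNING.findall(log_text):
        if name not in found:
            found.append(name)
    return found


# Rebuild the four warning lines from the log above (timestamps omitted).
template = ("W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library "
            "wasn't compiled to use {} instructions, but these are available on your "
            "machine and could speed up CPU computations.")
sample_log = "\n".join(template.format(name) for name in ["SSE4.2", "AVX", "AVX2", "FMA"])

print(missing_instruction_sets(sample_log))  # ['SSE4.2', 'AVX', 'AVX2', 'FMA']
```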

You can bury your head in the sand and simply hide the warnings:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
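For reference, TF_CPP_MIN_LOG_LEVEL accepts the values 0 to 3, and it has to be set before TensorFlow is imported. The small wrapper below is a sketch of that idea; the name silence_tf_warnings and the description strings are my own, not part of TensorFlow.

```python
import os

# Meaning of each TF_CPP_MIN_LOG_LEVEL value (descriptions are informal).
LOG_LEVELS = {
    "0": "show all messages",
    "1": "hide INFO",
    "2": "hide INFO and WARNING",
    "3": "hide INFO, WARNING and ERROR",
}


def silence_tf_warnings(level="2"):
    """Set TF_CPP_MIN_LOG_LEVEL and return a description of the chosen level.

    Hypothetical helper; call it before `import tensorflow`.
    """
    if level not in LOG_LEVELS:
        raise ValueError("level must be one of %s" % sorted(LOG_LEVELS))
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level
    return LOG_LEVELS[level]


print(silence_tf_warnings("2"))  # hide INFO and WARNING
# import tensorflow as tf  # import only after the variable has been set
```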

Or you can spend about 20 minutes compiling a fully featured TensorFlow.

1. Installing Bazel

Ubuntu Linux (16.04, 15.10, and 14.04)

First, install JDK 8:

sudo apt-get install openjdk-8-jdk

On Ubuntu 14.04 LTS you need to use a PPA:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update && sudo apt-get install oracle-java8-installer

Next, add the Bazel distribution URI as a package source:

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

Finally, install Bazel:

sudo apt-get update && sudo apt-get install bazel

Once installed, you can upgrade Bazel with:

sudo apt-get upgrade bazel

Mac OS X

First, install JDK 8, which can be downloaded from Oracle's JDK Page.

Then install Homebrew:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Finally, use Homebrew to install Bazel:

brew install bazel

You can run bazel version to check that Bazel was installed successfully, and brew upgrade bazel to update it.
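The same check can be scripted, which is handy before kicking off a long build. A minimal sketch, assuming Python 3 (shutil.which is not available in Python 2.7); the function name tool_version is hypothetical.

```python
import shutil
import subprocess


def tool_version(executable="bazel"):
    """Return the output of `<executable> version`, or None if it is not on PATH.

    Hypothetical helper for illustration.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    output = tool_version("bazel")
    if output is None:
        print("bazel was not found on PATH")
    else:
        print(output)
```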

2. Configuration

Download the TensorFlow source code from GitHub, change into the tensorflow directory, and start the configuration script:

cd tensorflow
./configure

2.1 CPU build

Configuring the CPU build is simple: just press Enter at every prompt to accept the defaults:

No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] 
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N] 
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] 
No CUDA support will be enabled for TensorFlow
.........
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
Configuration finished

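2.2 GPU build

If your machine has a GPU, you need to build the GPU version of TensorFlow; otherwise only the CPU will be used for computation. This assumes you have already installed CUDA and the Nvidia driver; for details, see "Setting Up a Deep Learning Environment on Ubuntu" (《Ubuntu深度学习环境搭建》).

The earlier prompts, such as Do you wish to build TensorFlow with jemalloc as malloc support?, can all be left at their defaults. The key step is CUDA, which must be configured: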

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 9.0

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7.0.3

Please specify the location where cuDNN 7.0.3 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]

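Set these according to your actual environment; the answers above are for CUDA 9.0 and cuDNN 7.0.3. At the Cuda compute capabilities prompt, look up your Nvidia hardware model at https://developer.nvidia.com/cuda-gpus to find the right version number.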

I am using a Tesla P4 GPU, so my compute capability is 6.1. For the remaining prompts, just press Enter to accept the defaults.
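The compute capabilities prompt expects a comma-separated list of major.minor numbers. The sketch below shows what a well-formed answer looks like by parsing one; parse_compute_capabilities is a hypothetical name, and the 6.1 default is simply the one shown in the prompt above, not something the code detects.

```python
def parse_compute_capabilities(answer, default="6.1"):
    """Parse an answer such as '3.5, 6.1' into a list of (major, minor) tuples.

    An empty answer falls back to `default`, mirroring pressing Enter at the prompt.
    Hypothetical helper for illustration; it is not part of ./configure.
    """
    text = answer.strip() or default
    capabilities = []
    for item in text.split(","):
        item = item.strip()
        major, sep, minor = item.partition(".")
        if not (sep and major.isdigit() and minor.isdigit()):
            raise ValueError("not a compute capability: %r" % item)
        capabilities.append((int(major), int(minor)))
    return capabilities


print(parse_compute_capabilities(""))          # [(6, 1)]
print(parse_compute_capabilities("3.5, 6.1"))  # [(3, 5), (6, 1)]
```

Each extra entry in the list lengthens the build and enlarges the binary, as the prompt itself warns, so listing only the capability of the card you actually have is usually enough.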

3. Building

Now start the actual build:

bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

Then comes a long wait, so go grab a coffee. When the build finishes, you will see the following:

Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 1411.419s, Critical Path: 81.12s
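Bazel reports the elapsed time in seconds. As a quick sanity check against the "about 20 minutes" estimate, the figure can be converted to minutes; the helper name elapsed_minutes is made up, and the pattern assumes the INFO line format shown above.

```python
import re


def elapsed_minutes(info_line):
    """Extract the 'Elapsed time' value from a Bazel INFO line and convert it to minutes.

    Returns None if the line does not contain an elapsed time.
    Hypothetical helper for illustration.
    """
    match = re.search(r"Elapsed time: ([0-9.]+)s", info_line)
    if match is None:
        return None
    return float(match.group(1)) / 60.0


line = "INFO: Elapsed time: 1411.419s, Critical Path: 81.12s"
print(round(elapsed_minutes(line), 1))  # 23.5
```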

If you need TensorFlow to support the GPU, build with --config=cuda added instead of using the command above:

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

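The build leaves a pile of binaries in a temporary directory, with symlinks to them in the current directory, so you can call the script directly to generate the wheel file: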

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

On success you get:

Mon Jun 26 11:42:01 CST 2017 : === Output wheel file is in: /tmp/tensorflow_pkg

4. Installation

Next, install the wheel. If you previously installed TensorFlow with pip, it is best to uninstall it first:

pip uninstall tensorflow
Proceed (y/n)? y
  Successfully uninstalled tensorflow-1.2.0

Then install:

pip install /tmp/tensorflow_pkg/tensorflow-1.2.0-cp27-cp27m-macosx_10_12_x86_64.whl

Note: the wheel filename differs between environments, so substitute the actual name.

If you see the following, the installation succeeded!
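The filename varies because it encodes the Python version, ABI and platform the wheel was built for, following the wheel naming convention from PEP 427. A small sketch that splits the example name into those fields; parse_wheel_filename is a hypothetical helper and it ignores the optional build tag.

```python
def parse_wheel_filename(filename):
    """Split 'name-version-python-abi-platform.whl' into its five fields.

    Simplified for illustration: the optional build tag from PEP 427 is not handled.
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(suffix)].split("-")
    if len(parts) != 5:
        raise ValueError("expected name-version-python-abi-platform, got %r" % filename)
    keys = ("distribution", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


fields = parse_wheel_filename("tensorflow-1.2.0-cp27-cp27m-macosx_10_12_x86_64.whl")
print(fields["python_tag"], fields["platform_tag"])  # cp27 macosx_10_12_x86_64
```

A wheel built on Linux or under a different Python version would therefore carry different tags in its name.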

Installing collected packages: tensorflow
Successfully installed tensorflow-1.2.0

Run a quick demo to verify that the installation works.

cd ~
python
Python 2.7.13 (default, Dec 18 2016, 07:03:39) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow built successfully!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow built successfully!

This shows that the newly built TF works correctly.

If you hit Failed to load the native TensorFlow runtime, don't panic: do not run Python from the directory where you built TensorFlow; cd to another path and it will work.
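The reason is that Python looks in the current directory first, so inside the source checkout import tensorflow picks up the tensorflow source folder rather than the installed package with its compiled runtime. A minimal sketch of a guard against that; the function name shadows_installed_tensorflow is hypothetical.

```python
import os


def shadows_installed_tensorflow(directory):
    """Return True if `directory` contains a `tensorflow` folder.

    Running Python from such a directory would import that folder
    instead of the installed package. Hypothetical helper for illustration.
    """
    return os.path.isdir(os.path.join(directory, "tensorflow"))


if shadows_installed_tensorflow(os.getcwd()):
    print("Warning: cd somewhere else before importing tensorflow")
```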

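If you hit ImportError: XXX 'GLIBCXX_3.4.21' not found, the libstdc++ bundled with Anaconda (shipped in its libgcc package) is probably too old; running conda install libgcc to get the latest version should fix it.

Your code can now call TensorFlow without triggering those warnings.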
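To confirm that this is the cause, you can list the GLIBCXX symbol versions a given libstdc++.so.6 provides, much like running strings on the file and filtering for GLIBCXX. The sketch below does this in Python; the function names are made up, and the path to the library is something you have to supply for your own environment.

```python
import re


def glibcxx_versions(library_path):
    """Return the sorted GLIBCXX versions found in a shared library, as integer tuples.

    Scans the raw bytes for strings such as GLIBCXX_3.4.21.
    Hypothetical helper for illustration.
    """
    with open(library_path, "rb") as handle:
        data = handle.read()
    found = set(re.findall(br"GLIBCXX_(\d+(?:\.\d+)+)", data))
    versions = [tuple(int(part) for part in text.decode("ascii").split("."))
                for text in found]
    return sorted(versions)


def provides_glibcxx(library_path, required=(3, 4, 21)):
    """Return True if the library exports the required GLIBCXX version."""
    return required in glibcxx_versions(library_path)
```

Calling provides_glibcxx on the libstdc++.so.6 inside your Anaconda lib directory tells you whether the version named in the error message is actually present.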

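References

"Building and Installing TensorFlow from Source" (《从源码编译安装TensorFlow》)
"Installing Bazel"
"Installing TensorFlow (GPU-accelerated) on Ubuntu 16.04" (《ubuntu16.04下安装TensorFlow(GPU加速)》)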