# Mastering TensorFlow Deep Learning in One Article

## The TensorFlow Deep Learning Framework

TensorFlow is not only open-sourced on GitHub; the paper "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems" also describes the design and implementation of the system, including tests on a training cluster of up to 200 nodes, a scale other distributed deep learning frameworks have yet to match. In "Wide & Deep Learning for Recommender Systems" and "The YouTube Video Recommendation System", Google further describes the recommendation models behind the Google Play app store and YouTube video recommendations, and provides TensorFlow-based code examples. With TensorFlow, anyone can get close to state-of-the-art results on ImageNet or in Kaggle competitions.

## TensorFlow: From Getting Started to Applications

```python
# Import the library
import tensorflow as tf

# Define the graph
hello_op = tf.constant('Hello, TensorFlow!')
a = tf.constant(10)
b = tf.constant(32)
compute_op = tf.add(a, b)

# Define the session to run the graph
with tf.Session() as sess:
    print(sess.run(hello_op))
    print(sess.run(compute_op))
```

```python
import tensorflow as tf
import numpy as np

# Prepare train data
train_X = np.linspace(-1, 1, 100)
train_Y = 2 * train_X + np.random.randn(*train_X.shape) * 0.33 + 10

# Define the model (X and w are scalars, so use elementwise multiply)
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
w = tf.Variable(0.0, name="weight")
b = tf.Variable(0.0, name="bias")
loss = tf.square(Y - X * w - b)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Create session to run
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    epoch = 1
    for i in range(10):
        for (x, y) in zip(train_X, train_Y):
            _, w_value, b_value = sess.run([train_op, w, b],
                                           feed_dict={X: x, Y: y})
        print("Epoch: {}, w: {}, b: {}".format(epoch, w_value, b_value))
        epoch += 1
```

```
Epoch: 1, w: -0.909195065498352, b: 9.612462043762207
Epoch: 2, w: 0.296161413192749, b: 10.418954849243164
Epoch: 3, w: 1.108984351158142, b: 10.283171653747559
Epoch: 4, w: 1.5482335090637207, b: 10.143315315246582
Epoch: 5, w: 1.7749555110931396, b: 10.063009262084961
Epoch: 6, w: 1.8906776905059814, b: 10.020986557006836
Epoch: 7, w: 1.9495772123336792, b: 9.999467849731445
Epoch: 8, w: 1.9795364141464233, b: 9.988500595092773
Epoch: 9, w: 1.994771122932434, b: 9.982922554016113
Epoch: 10, w: 2.0025179386138916, b: 9.980087280273438
```
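The same fit can be reproduced without TensorFlow at all, which makes the update rule explicit: each SGD step moves `w` and `b` along the negative gradient of the squared error. A NumPy-only sketch, assuming the same data-generating process `y = 2x + noise + 10` and the same learning rate as above:

```python
# A NumPy-only sketch of the same least-squares SGD fit.
import numpy as np

np.random.seed(0)
train_X = np.linspace(-1, 1, 100)
train_Y = 2 * train_X + np.random.randn(*train_X.shape) * 0.33 + 10

w, b = 0.0, 0.0
learning_rate = 0.01
for epoch in range(10):
    for x, y in zip(train_X, train_Y):
        # Gradient of (y - w*x - b)^2 w.r.t. w is -2*error*x, w.r.t. b is -2*error
        error = y - w * x - b
        w += learning_rate * 2 * error * x
        b += learning_rate * 2 * error

print("w: {:.2f}, b: {:.2f}".format(w, b))
```

After ten epochs the parameters land near the true values `w = 2` and `b = 10`, matching the trajectory of the TensorFlow run above.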

## Core TensorFlow Usage Tips

#### 1. Preparing training data

```python
def generate_tfrecords(input_filename, output_filename):
    print("Start to convert {} to {}".format(input_filename, output_filename))
    writer = tf.python_io.TFRecordWriter(output_filename)

    for line in open(input_filename, "r"):
        data = line.split(",")
        label = float(data[9])
        features = [float(i) for i in data[:9]]

        example = tf.train.Example(features=tf.train.Features(feature={
            "label":
            tf.train.Feature(float_list=tf.train.FloatList(value=[label])),
            "features":
            tf.train.Feature(float_list=tf.train.FloatList(value=features)),
        }))
        writer.write(example.SerializeToString())

    writer.close()
    print("Successfully converted {} to {}".format(input_filename,
                                                   output_filename))
```

#### 2. Accepting command-line arguments

Under the hood, TensorFlow uses the python-gflags project, wrapped as the tf.app.flags interface, which is simple and intuitive to use. In real projects, command-line flags are usually defined up front; in particular, in the Cloud Machine Learning services discussed later, these flags simplify hyperparameter tuning.

```python
# Define hyperparameters
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_boolean("enable_colored_log", False, "Enable colored log")
flags.DEFINE_string("train_tfrecords_file",
                    "./data/a8a/a8a_train.libsvm.tfrecords",
                    "The glob pattern of train TFRecords files")
flags.DEFINE_string("validate_tfrecords_file",
                    "./data/a8a/a8a_test.libsvm.tfrecords",
                    "The glob pattern of validate TFRecords files")
flags.DEFINE_integer("feature_size", 124, "Number of feature size")
flags.DEFINE_integer("label_size", 2, "Number of label size")
flags.DEFINE_float("learning_rate", 0.01, "The learning rate")
flags.DEFINE_integer("epoch_number", 10, "Number of epochs to train")
flags.DEFINE_integer("batch_size", 1024, "The batch size of training")
flags.DEFINE_integer("validate_batch_size", 1024,
                     "The batch size of validation")
flags.DEFINE_integer("min_after_dequeue", 100,
                     "The minimal number after dequeue")
flags.DEFINE_string("checkpoint_path", "./sparse_checkpoint/",
                    "The path of checkpoint")
flags.DEFINE_string("output_path", "./sparse_tensorboard/",
                    "The path of tensorboard event files")
flags.DEFINE_string("model", "dnn", "Support dnn, lr, wide_and_deep")
flags.DEFINE_string("model_network", "128 32 8", "The neural network of model")
flags.DEFINE_boolean("enable_bn", False, "Enable batch normalization or not")
flags.DEFINE_float("bn_epsilon", 0.001, "The epsilon of batch normalization")
flags.DEFINE_boolean("enable_dropout", False, "Enable dropout or not")
flags.DEFINE_float("dropout_keep_prob", 0.5, "The dropout keep prob")
flags.DEFINE_boolean("enable_lr_decay", False, "Enable learning rate decay")
flags.DEFINE_float("lr_decay_rate", 0.96, "Learning rate decay rate")
flags.DEFINE_integer("steps_to_validate", 10,
                     "Steps to validate and print state")
flags.DEFINE_string("mode", "train", "Support train, export, inference")
flags.DEFINE_string("saved_model_path", "./sparse_saved_model/",
                    "The path of the saved model")
flags.DEFINE_string("model_path", "./sparse_model/", "The path of the model")
flags.DEFINE_integer("model_version", 1, "The version of the model")
flags.DEFINE_string("inference_test_file", "./data/a8a_test.libsvm",
                    "The test file for inference")
flags.DEFINE_string("inference_result_file", "./inference_result.txt",
                    "The result file from inference")
```
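For readers who want to try the pattern without TensorFlow installed, the same idea can be approximated with Python's standard argparse module. This is a standard-library analogue, not the tf.app.flags API; the flag names below simply mirror a few of the definitions above:

```python
# A minimal argparse-based sketch of the same hyperparameter pattern.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=0.01,
                    help="The learning rate")
parser.add_argument("--epoch_number", type=int, default=10,
                    help="Number of epochs to train")
parser.add_argument("--batch_size", type=int, default=1024,
                    help="The batch size of training")
parser.add_argument("--model", type=str, default="dnn",
                    help="Support dnn, lr, wide_and_deep")

# Parse an example command line; a real script would call parser.parse_args()
FLAGS = parser.parse_args(["--learning_rate", "0.05", "--model", "lr"])
print(FLAGS.learning_rate, FLAGS.epoch_number, FLAGS.model)
```

As with tf.app.flags, unspecified flags fall back to their defaults, so only the hyperparameters being tuned need to appear on the command line.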

#### 3. Defining the neural network model

```python
# Define the model
input_units = FEATURE_SIZE
hidden1_units = 10
hidden2_units = 10
hidden3_units = 10
hidden4_units = 10
output_units = LABEL_SIZE

def full_connect(inputs, weights_shape, biases_shape):
    with tf.device('/gpu:0'):
        weights = tf.get_variable("weights", weights_shape,
                                  initializer=tf.random_normal_initializer())
        biases = tf.get_variable("biases", biases_shape,
                                 initializer=tf.random_normal_initializer())
        return tf.matmul(inputs, weights) + biases

def full_connect_relu(inputs, weights_shape, biases_shape):
    return tf.nn.relu(full_connect(inputs, weights_shape, biases_shape))

def deep_inference(inputs):
    # Each layer consumes the previous layer's output
    with tf.variable_scope("layer1"):
        layer = full_connect_relu(inputs, [input_units, hidden1_units],
                                  [hidden1_units])
    with tf.variable_scope("layer2"):
        layer = full_connect_relu(layer, [hidden1_units, hidden2_units],
                                  [hidden2_units])
    with tf.variable_scope("layer3"):
        layer = full_connect_relu(layer, [hidden2_units, hidden3_units],
                                  [hidden3_units])
    with tf.variable_scope("layer4"):
        layer = full_connect_relu(layer, [hidden3_units, hidden4_units],
                                  [hidden4_units])
    with tf.variable_scope("output"):
        # Linear output layer: no ReLU, so logits can be negative
        layer = full_connect(layer, [hidden4_units, output_units],
                             [output_units])
    return layer
```
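The layer-chaining idea, where each hidden layer feeds the next and only the final layer is linear, can be sketched in plain NumPy to show the shapes involved. This is an illustrative sketch with made-up layer sizes, not the TensorFlow model above:

```python
# A NumPy sketch of the same fully connected network: four hidden
# ReLU layers followed by a linear output layer.
import numpy as np

np.random.seed(0)
input_units, hidden_units, output_units = 9, 10, 2

def full_connect(inputs, weights, biases):
    return inputs @ weights + biases

def full_connect_relu(inputs, weights, biases):
    return np.maximum(full_connect(inputs, weights, biases), 0.0)

# One randomly initialized (weights, biases) pair per layer
sizes = [input_units] + [hidden_units] * 4 + [output_units]
params = [(np.random.randn(m, n) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def deep_inference(inputs):
    layer = inputs
    for weights, biases in params[:-1]:
        layer = full_connect_relu(layer, weights, biases)  # hidden layers
    weights, biases = params[-1]
    return full_connect(layer, weights, biases)  # linear output layer

batch = np.random.randn(3, input_units)
logits = deep_inference(batch)
print(logits.shape)
```

A batch of shape `(batch_size, input_units)` comes out as logits of shape `(batch_size, output_units)`, exactly what a softmax cross-entropy loss expects.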

#### 5. Online learning and continuous learning

```python
# Create session to run the graph
with tf.Session() as sess:
    summary_op = tf.merge_all_summaries()
    writer = tf.train.SummaryWriter(tensorboard_dir, sess.graph)
    sess.run(init_op)
    sess.run(tf.initialize_local_variables())

    if mode == "train" or mode == "train_from_scratch":
        if mode != "train_from_scratch":
            # Resume from the latest checkpoint if one exists
            ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
            if ckpt and ckpt.model_checkpoint_path:
                print("Continue training from the model {}".format(
                    ckpt.model_checkpoint_path))
                saver.restore(sess, ckpt.model_checkpoint_path)
```

#### 6. Optimizing parameters with TensorFlow

TensorFlow also ships with a powerful visualization tool, TensorBoard. In general, you only need to log the training metrics you care about in the code; TensorBoard then plots them automatically, giving a visual picture of how training is progressing.

```python
tf.scalar_summary('loss', loss)
tf.scalar_summary('accuracy', accuracy)
tf.scalar_summary('auc', auc_op)
```

#### 7. Distributed TensorFlow applications

```shell
cancer_classifier.py --ps_hosts=127.0.0.1:2222,127.0.0.1:2223 --worker_hosts=127.0.0.1:2224,127.0.0.1:2225 --job_name=ps --task_index=0

cancer_classifier.py --ps_hosts=127.0.0.1:2222,127.0.0.1:2223 --worker_hosts=127.0.0.1:2224,127.0.0.1:2225 --job_name=ps --task_index=1

cancer_classifier.py --ps_hosts=127.0.0.1:2222,127.0.0.1:2223 --worker_hosts=127.0.0.1:2224,127.0.0.1:2225 --job_name=worker --task_index=0

cancer_classifier.py --ps_hosts=127.0.0.1:2222,127.0.0.1:2223 --worker_hosts=127.0.0.1:2224,127.0.0.1:2225 --job_name=worker --task_index=1
```
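The four commands above launch two parameter servers and two workers of the same script; every process receives the full host lists and identifies its own role via `--job_name` and `--task_index`. Inside the script, those flags are typically assembled into a cluster description (in TensorFlow this would become a `tf.train.ClusterSpec`). A pure-Python sketch of that parsing step, with the flag values hard-coded for illustration:

```python
# A sketch of how --ps_hosts/--worker_hosts map to the cluster description
# that every process in the cluster builds identically.
ps_hosts = "127.0.0.1:2222,127.0.0.1:2223"
worker_hosts = "127.0.0.1:2224,127.0.0.1:2225"

cluster = {
    "ps": ps_hosts.split(","),
    "worker": worker_hosts.split(","),
}

# Each process then locates itself with --job_name and --task_index,
# e.g. job_name="worker", task_index=1 is the second worker address.
job_name, task_index = "worker", 1
print(cluster[job_name][task_index])
```

Because every process builds the same cluster description, only the `(job_name, task_index)` pair differs between the four command lines.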

#### 8. Cloud Machine Learning

TensorFlow is an excellent deep learning framework and a technology worth investing in for individual developers, researchers, and enterprises alike. Cloud Machine Learning, in turn, addresses the management and scheduling problems around environment setup, training-job management, and online serving of neural network models. Google Cloud ML already supports automatic hyperparameter tuning, so parameter tuning will increasingly become a matter of compute rather than technique. Even for developers who use MXNet or other frameworks instead of TensorFlow, we are happy to exchange ideas with more deep learning users and platform developers and help the community grow.

End.