Guiding questions:
1. What is a neural network?
2. What are the important features of a neural network?
3. How do we implement a simple neural network?
4. How do we declare the optimization algorithm as standard gradient descent?
Previous post: TensorFlow ML cookbook, Chapter 5, Sections 4-5: An Address Matching Example and Using Nearest Neighbors for Image Recognition
Neural Networks
In this chapter, we will introduce neural networks and how to implement them in TensorFlow. Most of the subsequent chapters will be based on neural networks, so learning how to use them in TensorFlow is very important. We will start by introducing the basic concepts of neural networks and work up to multilayer networks. In the final section of the chapter, we will create a neural network that learns to play Tic Tac Toe.
In this chapter, we'll cover the following recipes:
- Implementing Operational Gates
- Working with Gates and Activation Functions
- Implementing a One-Layer Neural Network
- Implementing Different Layers
- Using Multilayer Networks
- Improving the Predictions of Linear Models
- Learning to Play Tic Tac Toe
The reader can find all the code from this chapter online at https://github.com/nfmcclure/tensorflow_cookbook.
Introduction
Neural networks are currently breaking records in tasks such as image and speech recognition, reading handwriting, understanding text, image segmentation, dialog systems, autonomous driving, and much more. While some of these tasks will be covered in later chapters, it is important to introduce neural networks here as an easy-to-implement machine learning algorithm, so that we can expand on it later.
The concept of a neural network has been around for decades. However, it has only recently gained traction, because advances in processing power, algorithmic efficiency, and data sizes now give us the computational capacity to train large networks.
A neural network is basically a sequence of operations applied to a matrix of input data. These operations are usually collections of additions and multiplications followed by the application of non-linear functions. One example we have already seen is logistic regression, the last recipe of Chapter 3, Linear Regression: logistic regression is the sum of the partial slope-feature products followed by the application of the sigmoid function, which is non-linear. Neural networks generalize this further by allowing any combination of operations and non-linear functions, including absolute value, maximum, minimum, and so on.
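For reference, that logistic regression model can be written as y = sigmoid(b + w1*x1 + ... + wn*xn), where sigmoid(z) = 1 / (1 + e^(-z)) is the function that supplies the non-linearity.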
The important trick with neural networks is called backpropagation. Backpropagation is a procedure that allows us to update the model variables based on the learning rate and the output of the loss function. We already used backpropagation to update our model variables in Chapter 3, Linear Regression, and Chapter 4, Support Vector Machines.
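With the standard gradient descent optimizer used in this chapter, that update takes the simple form w ← w - learning_rate * d(loss)/dw for every model variable w; this is the rule that tf.train.GradientDescentOptimizer applies at each training step.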
Another important feature to take note of in neural networks is the non-linear activation function. Since most neural networks are just combinations of addition and multiplication operations, they cannot model non-linear datasets on their own. To address this, we use non-linear activation functions inside the network. This allows a neural network to adapt to most non-linear situations.
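As a minimal illustration (not part of the original recipes), the snippet below wraps a linear operation in a built-in non-linearity; it uses the same older TensorFlow API (tf.mul, tf.initialize_all_variables) as the rest of this chapter:
[mw_shl_code=python,true]import tensorflow as tf
sess = tf.Session()
x = tf.placeholder(dtype=tf.float32)
a = tf.Variable(tf.constant(1.))
b = tf.Variable(tf.constant(0.))
linear = tf.add(tf.mul(a, x), b)   # only addition and multiplication: linear in x
nonlinear = tf.sigmoid(linear)     # applying sigmoid makes the output non-linear in x
sess.run(tf.initialize_all_variables())
print(sess.run(nonlinear, feed_dict={x: 5.}))  # sigmoid(5.0), approximately 0.9933[/mw_shl_code]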
It is important to remember that, like most of the algorithms we have seen so far, neural networks are sensitive to the hyperparameters we choose. In this chapter, we will see the impact of different learning rates, loss functions, and optimization procedures.
There are more resources for learning about neural networks in greater depth and detail:
The seminal paper describing backpropagation is Efficient BackProp by Yann LeCun and others. The PDF is located at: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
CS231n, Convolutional Neural Networks for Visual Recognition, by Stanford University; class resources are available at: http://cs231n.stanford.edu/
CS224d, Deep Learning for Natural Language Processing, by Stanford University; class resources are available at: http://cs224d.stanford.edu/
Deep Learning, a book from MIT Press by Goodfellow and others, 2016. Located at: http://www.deeplearningbook.org
Michael Nielsen has an online book called Neural Networks and Deep Learning, located at: http://neuralnetworksanddeeplearning.com/
For a more pragmatic approach and introduction to neural networks, Andrej Karpathy has written a great summary with JavaScript examples called A Hacker's Guide to Neural Networks. The write-up is located at: http://karpathy.github.io/neuralnets/
Another site that summarizes some good notes on deep learning, Deep Learning for Beginners, covers the book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. The web page can be found at: http://randomekek.github.io/deep/deeplearning.html
Implementing Operational Gates
One of the most fundamental concepts of neural networks is an operation known as an operational gate. In this section, we will start with a multiplication operation as a gate and then consider nested gate operations.
Getting ready
The first operational gate we will implement looks like f(x) = a * x. To optimize this gate, we declare the a input as a variable and the x input as a placeholder. This means that TensorFlow will try to change the a value and not the x value. We will create the loss function from the difference between the output and the target value, which is 50.
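Since that loss is the squared difference (a * x - 50)^2, its gradient with respect to a is 2 * (a * x - 50) * x, and each gradient descent step subtracts the learning rate times this quantity from a. This is the update we will watch in the printed output below.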
The second, nested operational gate will be f(x) = a * x + b. Again, we declare a and b as variables and x as a placeholder, and we again optimize the output toward the target value of 50. The interesting thing to note is that the solution to this second example is not unique: many combinations of model variables produce an output of 50 (for example, with x = 5, both a = 10, b = 0 and a = 0, b = 50 work). With neural networks, we do not care much about the values of the intermediate model variables; we place more emphasis on the desired output.
Think of the operations as operational gates on our computational graph. Here is a figure depicting the two examples:
Figure 1: The two operational gate examples in this section.
How to do it…
To implement the first operational gate, f(x) = a * x, in TensorFlow and train the output toward the value of 50, follow these steps:
1. We start by loading TensorFlow and creating a graph session:
[mw_shl_code=python,true]import tensorflow as tf
sess = tf.Session() [/mw_shl_code]
2. Now we declare our model variable, input data, and placeholder. We make our input data equal to the value 5, so that the multiplication factor needed to reach 50 will be 10 (that is, 5 x 10 = 50):
[mw_shl_code=python,true]a = tf.Variable(tf.constant(4.))
x_val = 5.
x_data = tf.placeholder(dtype=tf.float32)[/mw_shl_code]
3. Next, we add the operation to our computational graph:
[mw_shl_code=python,true]multiplication = tf.mul(a, x_data)[/mw_shl_code]
4. We declare the loss function as the L2 distance between the output and the desired target value of 50:
[mw_shl_code=python,true]loss = tf.square(tf.sub(multiplication, 50.))[/mw_shl_code]
5. Now we initialize our model variable and declare our optimization algorithm as standard gradient descent:
[mw_shl_code=python,true]init = tf.initialize_all_variables()
sess.run(init)
my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)
[/mw_shl_code]
6. We can now optimize our model output toward the desired value of 50. We do this by repeatedly feeding in the input value of 5 and backpropagating the loss to update the model variable toward the value of 10:
[mw_shl_code=python,true]print('Optimizing a Multiplication Gate Output to 50.')
for i in range(10):
    sess.run(train_step, feed_dict={x_data: x_val})
    a_val = sess.run(a)
    mult_output = sess.run(multiplication, feed_dict={x_data: x_val})
    print(str(a_val) + ' * ' + str(x_val) + ' = ' + str(mult_output))[/mw_shl_code]
7. This results in the following output:
Optimizing a Multiplication Gate Output to 50.
[mw_shl_code=python,true]7.0 * 5.0 = 35.0
8.5 * 5.0 = 42.5
9.25 * 5.0 = 46.25
9.625 * 5.0 = 48.125
9.8125 * 5.0 = 49.0625
9.90625 * 5.0 = 49.5312
9.95312 * 5.0 = 49.7656
9.97656 * 5.0 = 49.8828
9.98828 * 5.0 = 49.9414
9.99414 * 5.0 = 49.9707 [/mw_shl_code]
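As a quick check on the first printed line (an observation added here, not in the book's text): with a = 4 and x = 5 the gate outputs 20, the gradient of the loss with respect to a is 2 * (20 - 50) * 5 = -300, and one gradient descent step gives a = 4 - 0.01 * (-300) = 7.0, which matches 7.0 * 5.0 = 35.0 above. Each step multiplies the remaining error by (1 - 0.01 * 2 * 5^2) = 0.5, which is why the output closes half of its distance to 50 on every iteration.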
8. Next, we will do the same for the two nested operations, f(x) = a * x + b.
9. We start in exactly the same way as the preceding example, except that now we initialize two model variables, a and b:
[mw_shl_code=python,true]from tensorflow.python.framework import ops
ops.reset_default_graph()
sess = tf.Session()
a = tf.Variable(tf.constant(1.))
b = tf.Variable(tf.constant(1.))
x_val = 5.
x_data = tf.placeholder(dtype=tf.float32)
two_gate = tf.add(tf.mul(a, x_data), b)
loss = tf.square(tf.sub(two_gate, 50.))
my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)
init = tf.initialize_all_variables()
sess.run(init) [/mw_shl_code]
10. We now optimize the model variables to train the output toward the target value of 50:
[mw_shl_code=python,true]print('\nOptimizing Two Gate Output to 50.')
for i in range(10):
    # Run the train step
    sess.run(train_step, feed_dict={x_data: x_val})
    # Get the a and b values
    a_val, b_val = (sess.run(a), sess.run(b))
    # Run the two-gate graph output
    two_gate_output = sess.run(two_gate, feed_dict={x_data: x_val})
    print(str(a_val) + ' * ' + str(x_val) + ' + ' + str(b_val) + ' = ' + str(two_gate_output))[/mw_shl_code]
11. This results in the following output:
Optimizing Two Gate Output to 50.
[mw_shl_code=python,true]5.4 * 5.0 + 1.88 = 28.88
7.512 * 5.0 + 2.3024 = 39.8624
8.52576 * 5.0 + 2.50515 = 45.134
9.01236 * 5.0 + 2.60247 = 47.6643
9.24593 * 5.0 + 2.64919 = 48.8789
9.35805 * 5.0 + 2.67161 = 49.4619
9.41186 * 5.0 + 2.68237 = 49.7417
9.43769 * 5.0 + 2.68754 = 49.876
9.45009 * 5.0 + 2.69002 = 49.9405
9.45605 * 5.0 + 2.69121 = 49.9714[/mw_shl_code]
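Again as a quick check (not in the book's text): starting from a = b = 1, the first output is 1 * 5 + 1 = 6, the error is 6 - 50 = -44, the gradients are 2 * (-44) * 5 = -440 for a and 2 * (-44) = -88 for b, so one gradient descent step with a learning rate of 0.01 gives a = 1 + 4.4 = 5.4 and b = 1 + 0.88 = 1.88, matching the first line of the output.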
It is important to note here that the solution to the second example is not unique. This does not matter much in neural networks, as all parameters are adjusted toward reducing the loss. The final solution here will depend on the initial values of a and b. If these were randomly initialized, rather than set to the value 1, we would see different final values for the model variables on each run.
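A minimal way to try this (a hypothetical variation, not part of the recipe) is to replace the two constant initializers in step 9 with random ones, for example:
[mw_shl_code=python,true]a = tf.Variable(tf.random_normal(shape=[1]))
b = tf.Variable(tf.random_normal(shape=[1]))[/mw_shl_code]
Rerunning the optimization loop would then converge to a different (a, b) pair on each run, while the gate output still approaches 50.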
How it works…
We achieved the optimization of a computational gate via TensorFlow's implicit backpropagation. TensorFlow keeps track of our model's operations and variable values and makes adjustments according to our optimization algorithm specification and the output of the loss function.
We can keep expanding the operational gates while keeping track of which inputs are variables and which are data. This is important to keep track of, because TensorFlow will change all variables to minimize the loss, but not the data, which is declared as a placeholder.
The implicit ability to track the computational graph and automatically update the model variables with every training step is one of TensorFlow's great features and part of what makes it so powerful.
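If we want to inspect the gradients that TensorFlow computes implicitly, we can ask for them directly. The following sketch (an illustration, not part of the original recipe) assumes the two-gate graph and session from above are still in scope, and prints the gradient of the loss with respect to a:
[mw_shl_code=python,true]# tf.gradients returns d(loss)/d(a); the optimizer uses this same value internally
grad_a = tf.gradients(loss, [a])[0]
print(sess.run(grad_a, feed_dict={x_data: x_val}))[/mw_shl_code]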