Reading guide:
1. How does the Python script download the CIFAR-10 image data and automatically separate it?
2. How do we train the model with bazel?
3. What is Stylenet / Neural-Style?
4. How do we recreate the network in TensorFlow?
Previous article: TensorFlow ML Cookbook, Chapter 8, Section 2: Implementing an Advanced CNN
Retraining Existing CNN Models
Training a new image recognition model from scratch requires a lot of time and computational power. If we can take a previously trained network and retrain it on our own images, we can save computation time. For this recipe, we will show how to use a pre-trained TensorFlow image recognition model and fine-tune it to work on a different set of images.
Getting ready
Training a new image recognition model from scratch requires a lot of time and computational power. If we can take a previously trained network and retrain it on our own images, we can save computation time. The idea is to reuse the weights and structure of a prior model's convolutional layers and retrain the fully connected layers at the top of the network.
TensorFlow has a tutorial about training on top of existing CNN models (refer to the first bullet point of the See also section). In this recipe, we will illustrate how to use the same methodology for CIFAR-10. The CNN we will employ uses a very popular architecture called Inception. The Inception CNN model was created by Google and has performed very well on many image recognition benchmarks. For details, see the paper referenced in the second bullet point of the See also section.
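To make the fine-tuning idea above concrete, here is a minimal sketch (not the Inception tutorial code) in TF 1.x style: a hypothetical placeholder stands in for features produced by a frozen, pre-trained convolutional stack, and the optimizer's var_list is restricted to the new top-layer variables so the pre-trained weights stay fixed.
[mw_shl_code=python,true]# Minimal fine-tuning sketch (illustration only, not the Inception tutorial code)
import tensorflow as tf

num_classes = 10
# Hypothetical placeholder standing in for features from a frozen, pre-trained conv stack
conv_features = tf.placeholder(tf.float32, shape=[None, 2048])
labels = tf.placeholder(tf.int64, shape=[None])

# New fully connected layer on top; only these variables will be trained
top_weights = tf.Variable(tf.truncated_normal([2048, num_classes], stddev=0.05))
top_bias = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(conv_features, top_weights) + top_bias

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
# Restricting var_list keeps the pre-trained convolutional weights unchanged
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss, var_list=[top_weights, top_bias])[/mw_shl_code]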
The main Python script we will cover shows how to download the CIFAR-10 image data and automatically separate, label, and save the images into the ten classes under each of the train and test folders. After that, we will show how to retrain the network on our images.
How to do it…
1. We'll start by loading the necessary libraries for downloading, unzipping, and saving the CIFAR-10 images:
[mw_shl_code=python,true]import os
import tarfile
import _pickle as cPickle
import numpy as np
import urllib.request
import scipy.misc [/mw_shl_code]
2. We now declare the CIFAR-10 data link and create the temporary directory we will store the data in. We'll also declare the ten categories to reference when saving the images later on:
[mw_shl_code=python,true]cifar_link = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'
data_dir = 'temp'
if not os.path.isdir(data_dir):
    os.makedirs(data_dir)
objects = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'] [/mw_shl_code]
3. Now we'll download the CIFAR-10 .tar data file and un-tar it:
[mw_shl_code=python,true]target_file = os.path.join(data_dir, 'cifar-10-python.tar.gz')
if not os.path.isfile(target_file):
    print('CIFAR-10 file not found. Downloading CIFAR data (Size = 163MB)')
    print('This may take a few minutes, please wait.')
    filename, headers = urllib.request.urlretrieve(cifar_link, target_file)
# Extract into memory
tar = tarfile.open(target_file)
tar.extractall(path=data_dir)
tar.close() [/mw_shl_code]
4. We now create the necessary folder structure for training. The temporary directory will contain two folders, train_dir and validation_dir. In each of these, we create ten sub-folders, one per category:
[mw_shl_code=python,true]# Create train image folders
train_folder = 'train_dir'
if not os.path.isdir(os.path.join(data_dir, train_folder)):
    for i in range(10):
        folder = os.path.join(data_dir, train_folder, objects[i])
        os.makedirs(folder)
# Create test image folders
test_folder = 'validation_dir'
if not os.path.isdir(os.path.join(data_dir, test_folder)):
    for i in range(10):
        folder = os.path.join(data_dir, test_folder, objects[i])
        os.makedirs(folder) [/mw_shl_code]
5. In order to save the images, we will create a function that loads a batch file and returns it as an image dictionary:
[mw_shl_code=python,true]def load_batch_from_file(file):
    file_conn = open(file, 'rb')
    image_dictionary = cPickle.load(file_conn, encoding='latin1')
    file_conn.close()
    return(image_dictionary) [/mw_shl_code]
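As a quick sanity check, each CIFAR-10 batch dictionary returned by this function contains the 'batch_label', 'labels', 'data', and 'filenames' keys that the saving function below relies on. The path in this small usage example is only illustrative:
[mw_shl_code=python,true]# Example usage (illustrative path): inspect one batch before saving images
batch = load_batch_from_file('temp/cifar-10-batches-py/data_batch_1')
print(batch.keys())          # 'batch_label', 'labels', 'data', 'filenames'
print(batch['data'].shape)   # (10000, 3072): 10,000 flattened 32x32x3 images[/mw_shl_code]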
6. With the above dictionary, we will save each file to the correct location with the following function:
[mw_shl_code=python,true]def save_images_from_dict(image_dict, folder='data_dir'):
    for ix, label in enumerate(image_dict['labels']):
        folder_path = os.path.join(data_dir, folder, objects[label])
        filename = image_dict['filenames'][ix]
        # Transform image data
        image_array = image_dict['data'][ix]
        image_array.resize([3, 32, 32])
        # Save image
        output_location = os.path.join(folder_path, filename)
        scipy.misc.imsave(output_location, image_array.transpose()) [/mw_shl_code]
7. With the preceding functions, we can loop through the downloaded data files and save each image to the correct location:
[mw_shl_code=python,true]data_location = os.path.join(data_dir, 'cifar-10-batches-py')
train_names = ['data_batch_' + str(x) for x in range(1,6)]
test_names = ['test_batch']
# Sort train images
for file in train_names:
    print('Saving images from file: {}'.format(file))
    file_location = os.path.join(data_dir, 'cifar-10-batches-py', file)
    image_dict = load_batch_from_file(file_location)
    save_images_from_dict(image_dict, folder=train_folder)
# Sort test images
for file in test_names:
    print('Saving images from file: {}'.format(file))
    file_location = os.path.join(data_dir, 'cifar-10-batches-py', file)
    image_dict = load_batch_from_file(file_location)
    save_images_from_dict(image_dict, folder=test_folder) [/mw_shl_code]
8. The last part of our script creates the labels file, which is the last piece of information we need. This file lets us interpret the outputs as labels instead of numerical indices:
[mw_shl_code=python,true]cifar_labels_file = os.path.join(data_dir,'cifar10_labels.txt')
print('Writing labels file, {}'.format(cifar_labels_file))
with open(cifar_labels_file, 'w') as labels_file:
    for item in objects:
        labels_file.write("{}\n".format(item)) [/mw_shl_code]
9. When the above script is run, it downloads the images and sorts them into the folder structure that the TensorFlow retraining tutorial expects. Once that is done, we just follow the tutorial accordingly. First we should clone the tutorial repository:
[mw_shl_code=shell,true]git clone https://github.com/tensorflow/models/tree/master/inception/inception [/mw_shl_code]
10. In order to use a previously trained model, we must download the network weights and apply them to our model:
[mw_shl_code=shell,true]me@computer:~$ curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz
me@computer:~$ tar xzf inception-v3-2016-03-01.tar.gz [/mw_shl_code]
11. Now that we have the images in the correct folder structure, we have to turn them into TFRecords objects. We do this by running the following commands:
[mw_shl_code=shell,true]me@computer:~$ python3 data/build_image_data.py
--train_directory="temp/train_dir/"
--validation_directory="temp/validation_dir"
--output_directory="temp/" --labels_file="temp/cifar10_labels.txt" [/mw_shl_code]
12. Now we'll train the model using bazel, setting the fine_tune parameter to true (as in the command below). The script outputs the loss every 10 generations. We can kill this process at any time, and the model output will be in the temp/training_results folder. We can load the model from this folder for evaluation:
[mw_shl_code=shell,true]me@computer:~$ bazel-bin/inception/flowers_train
--train_dir="temp/training_results" --data_dir="temp/data_dir"
--pretrained_model_checkpoint_path="model.ckpt-157585"
--fine_tune=True --initial_learning_rate=0.001
--input_queue_memory_factor=1 [/mw_shl_code]
13. This should result in output similar to the following:
2016-09-18 12:16:32.563577: step 1290, loss = 2.02 (1.2 examples/sec; 26.965 sec/batch)
2016-09-18 12:25:41.316540: step 1300, loss = 2.01 (1.2 examples/sec; 26.357 sec/batch)
How it works…
The official TensorFlow tutorial for training on top of a pre-trained CNN requires the folder setup that we created from the CIFAR-10 data. We then converted the data into the required TFRecords format and started training the model. Remember that we are fine-tuning the model and retraining the fully connected layers at the top to fit our 10-category data.
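If the retraining does not start as expected, a quick optional check like the following sketch can confirm that the folder structure created above was actually populated; the paths mirror the temp/train_dir and temp/validation_dir folders built earlier.
[mw_shl_code=python,true]# Optional sanity check: count the saved images per class folder
import os

data_dir = 'temp'
for split in ['train_dir', 'validation_dir']:
    for category in sorted(os.listdir(os.path.join(data_dir, split))):
        num_images = len(os.listdir(os.path.join(data_dir, split, category)))
        print('{}/{}: {} images'.format(split, category, num_images))[/mw_shl_code]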
See also
Official TensorFlow Inception-v3 tutorial: https://github.com/tensorflow/models/tree/master/inception
GoogLeNet Inception-v3 paper: https://arxiv.org/abs/1512.00567
Applying Stylenet/Neural-Style
Once we have a CNN trained for image recognition, we can use the network itself for some interesting data and image processing. Stylenet is a procedure that attempts to learn the style of one image and apply it to a second image while keeping the second image's structure (or content). This is possible if we can find intermediate CNN nodes that correlate strongly with style, separately from the content of the image.
Getting ready
Stylenet is a procedure that takes two images and applies the style of one image to the content of the other. It is based on a famous 2015 paper, A Neural Algorithm of Artistic Style (refer to the first bullet point under the See also section). The authors found that some CNNs have intermediate layers that seem to encode the style of a picture and others that encode its content. To this end, if we train the style layers on the style picture and the content layers on the original image, and back-propagate those calculated losses, we can change the original image to be more like the style image.
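The style representation used later in this recipe is the Gram matrix of a layer's channel activations. A minimal numpy sketch of that idea, using a made-up feature map in place of the real VGG layer output computed below, looks like this:
[mw_shl_code=python,true]import numpy as np

# Made-up feature map from one CNN layer: height x width x channels
features = np.random.rand(32, 32, 64)
# Each row holds the channel activations at one spatial position
flat = features.reshape(-1, features.shape[-1])    # shape (1024, 64)
# Channel-by-channel correlations summarize "style" independently of layout
gram = np.matmul(flat.T, flat) / flat.size         # shape (64, 64)[/mw_shl_code]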
In order to accomplish this, we will download the network recommended by the paper, called imagenet-vgg-19. There is also an imagenet-vgg-16 network that works as well, but the paper recommends imagenet-vgg-19.
How to do it…
1. First, we'll download the pre-trained network in *.mat format. The mat format is a MATLAB object, and the scipy package in Python has a method that can read it. The link to download the mat object is below. We save this model in the same folder as our Python script for reference:
http://www.vlfeat.org/matconvnet ... vgg-verydeep-19.mat
2. We'll start our Python script by loading the necessary libraries (scipy.io is needed later to read the .mat file):
[mw_shl_code=python,true]import os
import scipy.io
import scipy.misc
import numpy as np
import tensorflow as tf [/mw_shl_code]
3. Then we can start a graph session and declare the locations of our two images: the original image and the style image. For our purposes, we will use the cover image of this book as the original image; for the style image, we will use Starry Night by Vincent van Gogh. Feel free to use any two pictures you want here. If you choose to use these pictures, they are available on the book's github site, https://github.com/nfmcclure/tensorflow_cookbook (navigate to the stylenet section):
[mw_shl_code=python,true]sess = tf.Session()
original_image_file = 'temp/book_cover.jpg'
style_image_file = 'temp/starry_night.jpg' [/mw_shl_code]
4. We'll set some parameters for our model: the location of the mat file, the weights, the learning rate, the number of generations, and how frequently we should output the intermediate image. For the weights, it helps to weight the style image much more heavily than the original image. These hyperparameters should be tuned to get the desired result:
[mw_shl_code=python,true]vgg_path ='imagenet-vgg-verydeep-19.mat'
original_image_weight = 5.0
style_image_weight = 200.0
regularization_weight = 50.0
learning_rate = 0.1
generations = 10000
output_generations = 500[/mw_shl_code]
5. Now we'll load the two images with scipy and resize the style image to fit the original image dimensions:
[mw_shl_code=python,true]original_image = scipy.misc.imread(original_image_file)
style_image = scipy.misc.imread(style_image_file)
# Get shape of target and make the style image the same
target_shape = original_image.shape
style_image = scipy.misc.imresize(style_image, target_shape[1] / style_image.shape[1])
[/mw_shl_code]
6. From the paper, we can define the layers in the order in which they appear. We'll use the author's naming convention:
[mw_shl_code=python,true]vgg_layers = ['conv1_1', 'relu1_1',
'conv1_2', 'relu1_2', 'pool1',
'conv2_1', 'relu2_1',
'conv2_2', 'relu2_2', 'pool2',
'conv3_1', 'relu3_1',
'conv3_2', 'relu3_2',
'conv3_3', 'relu3_3',
'conv3_4', 'relu3_4', 'pool3',
'conv4_1', 'relu4_1',
'conv4_2', 'relu4_2',
'conv4_3', 'relu4_3',
'conv4_4', 'relu4_4', 'pool4',
'conv5_1', 'relu5_1',
'conv5_2', 'relu5_2',
'conv5_3', 'relu5_3',
'conv5_4', 'relu5_4'] [/mw_shl_code]
7. Now we'll define a function that extracts the parameters from the mat file:
[mw_shl_code=python,true]def extract_net_info(path_to_params):
    vgg_data = scipy.io.loadmat(path_to_params)
    normalization_matrix = vgg_data['normalization'][0][0][0]
    mat_mean = np.mean(normalization_matrix, axis=(0,1))
    network_weights = vgg_data['layers'][0]
    return(mat_mean, network_weights) [/mw_shl_code]
8. From the loaded weights and the layer definitions, we can recreate the network in TensorFlow with the following function. We loop through each layer and, where applicable, assign the corresponding operation with the appropriate weights and biases (note that we key off the first character of the layer name and index the weights by the layer position):
[mw_shl_code=python,true]def vgg_network(network_weights, init_image):
    network = {}
    image = init_image
    for i, layer in enumerate(vgg_layers):
        if layer[0] == 'c':
            weights, bias = network_weights[i][0][0][0][0]
            weights = np.transpose(weights, (1, 0, 2, 3))
            bias = bias.reshape(-1)
            conv_layer = tf.nn.conv2d(image, tf.constant(weights), (1, 1, 1, 1), 'SAME')
            image = tf.nn.bias_add(conv_layer, bias)
        elif layer[0] == 'r':
            image = tf.nn.relu(image)
        else:
            image = tf.nn.max_pool(image, (1, 2, 2, 1), (1, 2, 2, 1), 'SAME')
        network[layer] = image
    return(network) [/mw_shl_code]
9. The paper recommends a few strategies for assigning intermediate layers to the original and style images. While we should keep relu4_2 for the original image, we can try different combinations of the other reluX_1 layer outputs for the style image:
[mw_shl_code=python,true]original_layer = 'relu4_2'
style_layers = ['relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1'] [/mw_shl_code]
10. Next, we'll run the above function to get the weights and the mean. We'll also change the image shapes to have four dimensions by adding a dimension of size one at the beginning. TensorFlow's image operations act on four dimensions, so we must add the batch-size dimension:
[mw_shl_code=python,true]normalization_mean, network_weights = extract_net_info(vgg_path)
shape = (1,) + original_image.shape
style_shape = (1,) + style_image.shape
original_features = {}
style_features = {} [/mw_shl_code]
11. Next, we declare the image placeholder and create the network with that placeholder:
[mw_shl_code=python,true]image = tf.placeholder('float', shape=shape)
vgg_net = vgg_network(network_weights, image) [/mw_shl_code]
12. We now normalize the original image matrix and run it through the network:
[mw_shl_code=python,true]original_minus_mean = original_image - normalization_mean
original_norm = np.array([original_minus_mean])
original_features[original_layer] = sess.run(vgg_net[original_layer],
                                             feed_dict={image: original_norm})[/mw_shl_code]
13. We repeat the same procedure for each of the style layers that we chose in step 9:
[mw_shl_code=python,true]image = tf.placeholder('float', shape=style_shape)
vgg_net = vgg_network(network_weights, image)
style_minus_mean = style_image - normalization_mean
style_norm = np.array([style_minus_mean])
for layer in style_layers:
    layer_output = sess.run(vgg_net[layer], feed_dict={image: style_norm})
    layer_output = np.reshape(layer_output, (-1, layer_output.shape[3]))
    style_gram_matrix = np.matmul(layer_output.T, layer_output) / layer_output.size
    style_features[layer] = style_gram_matrix [/mw_shl_code]
14. In order to create the combined image, we'll start with random noise and run it through the network:
[mw_shl_code=python,true]initial = tf.random_normal(shape) * 0.05
image = tf.Variable(initial)
vgg_net = vgg_network(network_weights, image)
[/mw_shl_code]
15. We now declare the first loss, the loss on the original image. We use the size-normalized L2 loss between the output of the normalized original image from step 12 and the output of the layer designated in step 9 to represent the original content:
[mw_shl_code=python,true]original_loss = original_image_weight * (2 * tf.nn.l2_loss(vgg_net[original_layer] - original_features[original_layer]) / original_features[original_layer].size)
[/mw_shl_code]
16. Now we calculate the same type of loss for each style layer:
[mw_shl_code=python,true]style_loss = 0
style_losses = []
for style_layer in style_layers:
    layer = vgg_net[style_layer]
    feats, height, width, channels = [x.value for x in layer.get_shape()]
    size = height * width * channels
    features = tf.reshape(layer, (-1, channels))
    style_gram_matrix = tf.matmul(tf.transpose(features), features) / size
    style_expected = style_features[style_layer]
    style_losses.append(2 * tf.nn.l2_loss(style_gram_matrix - style_expected) / style_expected.size)
style_loss += style_image_weight * tf.reduce_sum(style_losses)[/mw_shl_code]
17. The third loss term is called the total variation loss. This comes from calculating the total variation. It is similar to total variation denoising, in that true images have very low local variation, whereas noisy images have high local variation. The key term in the following code is second_term_numerator, which subtracts off nearby pixels. Images with high noise will have large differences, and we can treat this as a loss function to minimize:
[mw_shl_code=python,true]total_var_x = sess.run(tf.reduce_prod(image[:,1:,:,:].get_shape()))
total_var_y = sess.run(tf.reduce_prod(image[:,:,1:,:].get_shape()))
first_term = regularization_weight * 2
second_term_numerator = tf.nn.l2_loss(image[:,1:,:,:] - image[:,:shape[1]-1,:,:])
second_term = second_term_numerator / total_var_y
third_term = (tf.nn.l2_loss(image[:,:,1:,:] - image[:,:,:shape[2]-1,:]) / total_var_x)
total_variation_loss = first_term * (second_term + third_term) [/mw_shl_code]
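For reference, and ignoring the total_var_x and total_var_y divisors computed above, the quantity being minimized is the standard discrete total-variation penalty. Written in LaTeX, with x denoting the generated image and w_reg the regularization_weight:
[mw_shl_code=latex,true]\mathcal{L}_{tv}(x) \propto w_{reg}\Big(\sum_{i,j,c}\big(x_{i+1,j,c}-x_{i,j,c}\big)^{2} + \sum_{i,j,c}\big(x_{i,j+1,c}-x_{i,j,c}\big)^{2}\Big)[/mw_shl_code]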
18. The total loss we want to minimize is the combined original, style, and total variation loss:
[mw_shl_code=python,true]loss = original_loss + style_loss + total_variation_loss [/mw_shl_code]
19. Next we declare our optimizer and training step, and initialize all the variables in the model:
[mw_shl_code=python,true]optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_step = optimizer.minimize(loss)
sess.run(tf.initialize_all_variables()) [/mw_shl_code]
20. We now loop through our training generations, printing a status update every so often and saving a temporary image. We save temporary images because it is hard to determine how long to run this algorithm, as it can vary depending on the images chosen. It is best to err on the side of a larger number of generations and stop when a temporary image looks like a good stopping point:
[mw_shl_code=python,true]for i in range(generations):
    sess.run(train_step)
    # Print update and save temporary output
    if (i+1) % output_generations == 0:
        print('Generation {} out of {}'.format(i + 1, generations))
        image_eval = sess.run(image)
        best_image_add_mean = image_eval.reshape(shape[1:]) + normalization_mean
        output_file = 'temp_output_{}.jpg'.format(i)
        scipy.misc.imsave(output_file, best_image_add_mean)[/mw_shl_code]
21. At the end of the algorithm, we save the final output:
[mw_shl_code=python,true]image_eval = sess.run(image)
best_image_add_mean = image_eval.reshape(shape[1:]) + normalization_mean
output_file = 'final_output.jpg'
scipy.misc.imsave(output_file, best_image_add_mean)[/mw_shl_code]
Figure 6: Using the stylenet algorithm to combine the book cover image with Starry Night. Note that different style emphases can be achieved by changing the weights at the beginning of the script.
How it works…
We first loaded the two images, then loaded the pre-trained network weights and assigned layers to the original and style images. We calculated three loss functions: an original image loss, a style loss, and a total variation loss. Then we trained a random-noise image to take on the style of the style image and the content of the original image.
See also
A Neural Algorithm of Artistic Style by Gatys, Ecker, and Bethge, 2015: https://arxiv.org/abs/1508.06576