Retraining Existing CNN Models
Training a new image recognition model from scratch requires a lot of time and computational power. If we can take a previously trained network and retrain it with our images, we can save much of that computation. For this recipe, we will show how to take a pretrained TensorFlow image recognition model and fine-tune it to work on a different set of images.

Getting ready
The idea is to reuse the structure and weights of the convolutional layers of a prior model and retrain only the fully connected layers at the top of the network.
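To make the idea concrete, here is a minimal NumPy sketch. It is not the Inception tutorial's code. A fixed random projection followed by ReLU stands in (hypothetically) for the pretrained convolutional layers, and only a new softmax classifier on top of it is trained with gradient descent on toy data. The names frozen_w, head_w, and extract_features are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for pretrained convolutional layers: a fixed random
# projection followed by ReLU. These weights are never updated below.
frozen_w = rng.normal(size=(20, 64))
frozen_before = frozen_w.copy()

def extract_features(x):
    return np.maximum(x @ frozen_w, 0.0)

# Toy data: 300 samples of 20-dimensional "images" drawn around 3 class centers
n_classes = 3
centers = rng.normal(scale=3.0, size=(n_classes, 20))
labels = rng.integers(0, n_classes, size=300)
inputs = centers[labels] + rng.normal(size=(300, 20))
one_hot = np.eye(n_classes)[labels]

# Features are computed once and standardized so a fixed learning rate is stable
features = extract_features(inputs)
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

# Only the new top layer (a softmax classifier) is trained
head_w = np.zeros((features.shape[1], n_classes))
head_b = np.zeros(n_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(w, b):
    probs = softmax(features @ w + b)
    return -np.mean(np.sum(one_hot * np.log(probs + 1e-12), axis=1))

initial_loss = cross_entropy(head_w, head_b)
learning_rate = 0.01
for _ in range(300):
    probs = softmax(features @ head_w + head_b)
    grad_logits = (probs - one_hot) / len(labels)  # gradient of the mean cross-entropy
    head_w -= learning_rate * features.T @ grad_logits
    head_b -= learning_rate * grad_logits.sum(axis=0)

final_loss = cross_entropy(head_w, head_b)
print('loss: {:.3f} -> {:.3f}'.format(initial_loss, final_loss))
```

In this sketch the frozen features never change, so they are computed once and reused at every step. Only the small top layer is optimized.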

TensorFlow provides a tutorial about training on top of existing CNN models (refer to the first bullet point of the See also section). In this recipe, we will illustrate how to apply the same methodology to CIFAR-10. The CNN we are going to employ uses a popular architecture called Inception. The Inception CNN model was created by Google and has performed very well on many image recognition benchmarks. For details, see the paper referenced in the second bullet point of the See also section.

The main Python script we will cover shows how to download the CIFAR-10 image data and automatically separate, label, and save the images into ten class folders inside each of the train and test folders. After that, we will follow the tutorial's steps to train the network on our images.

How to do it…
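1. We'll start by loading the necessary libraries for downloading, unzipping, and saving the CIFAR-10 images (scipy.misc.imsave, used later to write the images, requires an older SciPy release; it has been removed from current versions):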
import os
import tarfile
import _pickle as cPickle
import numpy as np
import urllib.request
import scipy.misc

2. We now declare the CIFAR-10 data link and create the temporary directory we will store the data in. We'll also declare the ten categories to reference when saving the images later on:
cifar_link = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'
data_dir = 'temp'
if not os.path.isdir(data_dir):
    os.makedirs(data_dir)
objects = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

3. Now we'll download the CIFAR-10 .tar.gz data file and extract it:
target_file = os.path.join(data_dir, 'cifar-10-python.tar.gz')
if not os.path.isfile(target_file):
    print('CIFAR-10 file not found. Downloading CIFAR data (Size = 163MB)')
    print('This may take a few minutes, please wait.')
    filename, headers = urllib.request.urlretrieve(cifar_link, target_file)
# Extract the archive into data_dir (this creates data_dir/cifar-10-batches-py)
tar = tarfile.open(target_file)
tar.extractall(path=data_dir)
tar.close()

4. We now create the necessary folder structure for training. The temporary directory will have two folders, train_dir and validation_dir. In each of these folders, we will create ten sub-folders, one per category:
# Create train image folders
train_folder = 'train_dir'
if not os.path.isdir(os.path.join(data_dir, train_folder)):
    for i in range(10):
        folder = os.path.join(data_dir, train_folder, objects[i])
        os.makedirs(folder)

# Create test image folders
test_folder = 'validation_dir'
if not os.path.isdir(os.path.join(data_dir, test_folder)):
    for i in range(10):
        folder = os.path.join(data_dir, test_folder, objects[i])
        os.makedirs(folder)
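The same folder layout can also be produced with a small reusable helper. This is a sketch with several substitutions: make_class_folders is a hypothetical name, the tree is built in a throwaway temporary directory rather than temp/, and os.makedirs(..., exist_ok=True) replaces the isdir checks so that rerunning it is harmless.

```python
import os
import tempfile

objects = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

def make_class_folders(root, split_folders, class_names):
    """Create root/<split>/<class> for every split and class; return the paths."""
    created = []
    for split in split_folders:
        for name in class_names:
            path = os.path.join(root, split, name)
            os.makedirs(path, exist_ok=True)  # no error if it already exists
            created.append(path)
    return created

root = tempfile.mkdtemp()
paths = make_class_folders(root, ['train_dir', 'validation_dir'], objects)
# A second call is a no-op instead of raising FileExistsError
make_class_folders(root, ['train_dir', 'validation_dir'], objects)
print(len(paths), 'folders created under', root)
```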

5. In order to save the images, we will create a function that loads a pickled batch file into an image dictionary:
def load_batch_from_file(file):
    with open(file, 'rb') as file_conn:
        # The batches were pickled under Python 2, hence encoding='latin1'
        image_dictionary = cPickle.load(file_conn, encoding='latin1')
    return image_dictionary
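To see what load_batch_from_file returns without downloading anything, the sketch below pickles a hypothetical three-image batch with the same keys as a CIFAR-10 Python batch ('batch_label', 'labels', 'data', 'filenames') and loads it back. Here the pickle is written by Python 3, so encoding='latin1' has no visible effect. It only matters for the real files, which were pickled under Python 2.

```python
import io
import pickle
import numpy as np

# Hypothetical miniature batch mirroring the layout of a CIFAR-10 python batch:
# 'data' is a uint8 array with one 3,072-byte row per 32x32 RGB image.
fake_batch = {
    'batch_label': 'toy batch',
    'labels': [0, 3, 9],
    'data': np.zeros((3, 3072), dtype=np.uint8),
    'filenames': ['plane_01.png', 'cat_01.png', 'truck_01.png'],
}

buffer = io.BytesIO()
pickle.dump(fake_batch, buffer)
buffer.seek(0)

# Same call pattern as load_batch_from_file, but reading from memory
image_dictionary = pickle.load(buffer, encoding='latin1')
for label, name in zip(image_dictionary['labels'], image_dictionary['filenames']):
    print(label, name)
```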

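6. With the above dictionary, we will save each image in the correct location with the following function. Each row of image_dict['data'] holds 3,072 bytes in channel-major order (all red values, then all green, then all blue). We therefore reshape it to (3, 32, 32) and move the channel axis last before saving:
def save_images_from_dict(image_dict, folder):
    for ix, label in enumerate(image_dict['labels']):
        folder_path = os.path.join(data_dir, folder, objects[label])
        filename = image_dict['filenames'][ix]
        # Transform image data from (channels, height, width) to (height, width, channels)
        image_array = image_dict['data'][ix].reshape(3, 32, 32).transpose(1, 2, 0)
        # Save image
        output_location = os.path.join(folder_path, filename)
        scipy.misc.imsave(output_location, image_array)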
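Turning a CIFAR-10 row into a savable image takes a reshape to (3, 32, 32) followed by a transpose, and the axis order is easy to get wrong. The small NumPy check below uses a synthetic row with arbitrary values. It confirms that transpose(1, 2, 0) puts each pixel's red, green, and blue bytes together. A bare .transpose() also yields shape (32, 32, 3), but it swaps the height and width axes, which would save an image flipped across its diagonal.

```python
import numpy as np

# Synthetic CIFAR-10 row: 1,024 red values, then 1,024 green, then 1,024 blue,
# each channel stored row by row for a 32x32 image. Values are arbitrary.
row = (np.arange(3072) % 251).astype(np.uint8)

chw = row.reshape(3, 32, 32)      # (channels, height, width)
hwc = chw.transpose(1, 2, 0)      # (height, width, channels), what image writers expect

# Pixel at height 0, width 1: its red, green and blue values
print(hwc[0, 1])                  # row[1], row[1025], row[2049]

# A bare transpose() reverses every axis: (width, height, channels)
wrong = chw.transpose()
```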

7. With the preceding functions, we can loop through the downloaded data files and save each image to the correct location:
data_location = os.path.join(data_dir, 'cifar-10-batches-py')
train_names = ['data_batch_' + str(x) for x in range(1, 6)]
test_names = ['test_batch']
# Sort train images
for file in train_names:
    print('Saving images from file: {}'.format(file))
    file_location = os.path.join(data_location, file)
    image_dict = load_batch_from_file(file_location)
    save_images_from_dict(image_dict, folder=train_folder)

# Sort test images
for file in test_names:
    print('Saving images from file: {}'.format(file))
    file_location = os.path.join(data_location, file)
    image_dict = load_batch_from_file(file_location)
    save_images_from_dict(image_dict, folder=test_folder)

8. The last part of our script creates the labels file, which is the last piece of information we will need. This file lets us interpret the model outputs as labels instead of numerical indices:
cifar_labels_file = os.path.join(data_dir, 'cifar10_labels.txt')
print('Writing labels file, {}'.format(cifar_labels_file))
with open(cifar_labels_file, 'w') as labels_file:
    for item in objects:
        labels_file.write('{}\n'.format(item))
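A quick round trip shows how such a file maps indices to names: a name's line position gives its index. This sketch writes to a temporary directory and uses 0-based positions, which match CIFAR-10's own label numbers. The TensorFlow script that consumes the file may number classes differently (for example, by reserving an index for a background class), so check its conventions before decoding model outputs.

```python
import os
import tempfile

objects = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

labels_path = os.path.join(tempfile.mkdtemp(), 'cifar10_labels.txt')
with open(labels_path, 'w') as labels_file:
    for item in objects:
        labels_file.write('{}\n'.format(item))

# Read it back: the (0-based) line number is the class index
with open(labels_path) as labels_file:
    index_to_label = [line.strip() for line in labels_file if line.strip()]

predicted_index = 3  # e.g. the argmax of a model's output
print(index_to_label[predicted_index])
```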

9. When the above script is run, it downloads the images and sorts them into the folder structure that the TensorFlow retraining tutorial expects. Once this is done, we simply follow the tutorial. First, we clone the TensorFlow models repository, which contains the Inception tutorial code:
git clone https://github.com/tensorflow/models.git

10. In order to use a pretrained model, we must download the network weights and extract them:
me@computer:~$ curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz
me@computer:~$ tar xzf inception-v3-2016-03-01.tar.gz

11. Now that we have the images in the correct folder structure, we have to convert them into TFRecords files. We do this by running the following command:
me@computer:~$ python3 data/build_image_data.py \
--train_directory="temp/train_dir/" \
--validation_directory="temp/validation_dir/" \
--output_directory="temp/" --labels_file="temp/cifar10_labels.txt"

12. Now we'll train the model using bazel, setting the fine_tune parameter to True. This script outputs the loss every 10 generations. We can kill this process at any time, and the model output will be in the folder temp/training_results. We can load the model from this folder for evaluation:
me@computer:~$ bazel-bin/inception/flowers_train \
--train_dir="temp/training_results" --data_dir="temp/data_dir" \
--fine_tune=True --initial_learning_rate=0.001
13. This should result in output similar to the following:
2016-09-18 12:16:32.563577: step 1290, loss = 2.02 (1.2 examples/sec; 26.965 sec/batch)
2016-09-18 12:25:41.316540: step 1300, loss = 2.01 (1.2 examples/sec; 26.357 sec/batch)

How it works…
The official TensorFlow tutorial for training on top of a pretrained CNN requires the folder setup that we created from the CIFAR-10 data. We then converted the data into the required TFRecords format and started training the model. Remember that we are fine-tuning the model, retraining the fully connected layers at the top to fit our 10-category data.

See also
Official TensorFlow Inception-v3 tutorial: https://github.com/tensorflow/models/tree/master/inception
GoogLeNet Inception-v3 paper: https://arxiv.org/abs/1512.00567

Applying Stylenet/Neural-Style
Once we have trained an image recognition CNN, we can use the network itself for some interesting data and image processing. Stylenet is a procedure that attempts to learn an image style from one picture and apply it to a second picture while keeping the second image's structure (or content). This may be possible if we can find intermediate CNN nodes that correlate strongly with the style of an image, separately from its content.

Getting ready
Stylenet is a procedure that takes two images and applies the style of one image to the content of the other. It is based on a well-known 2015 paper, A Neural Algorithm of Artistic Style (refer to the first bullet point under the See also section). The authors found that some CNNs have intermediate layers that seem to encode the style of a picture and other layers that seem to encode its content. Therefore, we compute two losses. The first measures the difference between the style layers' outputs for our image and for the style picture. The second measures the difference between the content layers' outputs for our image and for the original picture. By back-propagating those losses into the image, we can change it to look more like the style picture while keeping the original content.

In order to accomplish this, we will download the recommended network from the paper, called the imagenet-vgg-19. There is also an imagenet-vgg-16 network that works as well, but the paper recommends imagenet-vgg-19.

How to do it…
1. First, we'll download the pretrained network in *.mat format. The .mat format stores MATLAB objects, and the scipy package in Python has a method (scipy.io.loadmat) that can read it. The mat object can be downloaded from the following link. We save this model in the same folder as our Python script:
http://www.vlfeat.org/matconvnet ... vgg-verydeep-19.mat

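2. We'll start our Python script by loading the necessary libraries. scipy.io provides loadmat for the network file. The image helpers in scipy.misc (imread, imresize, imsave) require an older SciPy release, and the TensorFlow code uses the 1.x API (tf.Session, tf.placeholder):
import os
import scipy.io
import scipy.misc
import numpy as np
import tensorflow as tf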

3. Then we can start a graph session and declare the locations of our two images: the original image and the style image. For our purposes, we will use the cover image of this book for the original image, and Starry Night by Vincent van Gogh for the style image. Feel free to use any two pictures you want here. If you choose to use these pictures, they are available on the book's GitHub site, https://github.com/nfmcclure/tensorflow_cookbook (navigate to the Stylenet section):
sess = tf.Session()
original_image_file = 'temp/book_cover.jpg'
style_image_file = 'temp/starry_night.jpg'

4. We'll set some parameters for our model: the location of the mat file, the loss weights, the learning rate, the number of generations, and how frequently we should output the intermediate image. For the weights, it helps to weight the style image much more heavily than the original image. The regularization weight scales the total variation loss introduced in step 17. Tune these hyperparameters to change the character of the result:
vgg_path ='imagenet-vgg-verydeep-19.mat'
original_image_weight = 5.0
style_image_weight = 200.0
regularization_weight = 50.0
learning_rate = 0.1
generations = 10000
output_generations = 500

5. Now we'll load the two images with scipy and rescale the style image so that its width matches the original image's width. Its aspect ratio is preserved, so the heights may differ:
original_image = scipy.misc.imread(original_image_file)
style_image = scipy.misc.imread(style_image_file)
# Get shape of target and scale the style image to the same width
target_shape = original_image.shape
style_image = scipy.misc.imresize(style_image, target_shape[1] / style_image.shape[1])
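Because a single float factor is passed to imresize, both sides of the style image are scaled by (target width / style width). The widths therefore match, but the heights generally do not. This is why step 10 builds a separate style_shape. The mismatch is harmless because the Gram matrices compared later are channels-by-channels, independent of spatial size. The hypothetical helper below reproduces the arithmetic, assuming the resized dimensions are truncated to integers.

```python
def scaled_shape(style_shape, target_shape):
    """Height and width after scaling the style image by target width / style width.

    Assumes fractional sizes are truncated to integers.
    """
    scale = target_shape[1] / style_shape[1]
    return int(style_shape[0] * scale), int(style_shape[1] * scale)

original_shape = (400, 500, 3)   # hypothetical (height, width, channels)
style_shape = (900, 1000, 3)
new_height, new_width = scaled_shape(style_shape, original_shape)
print(new_height, new_width)     # widths agree; heights differ
```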

6. Next, we list the VGG-19 layers in the order in which they appear in the network, using the paper's layer-naming convention:
vgg_layers = ['conv1_1', 'relu1_1',
'conv1_2', 'relu1_2', 'pool1',
'conv2_1', 'relu2_1',
'conv2_2', 'relu2_2', 'pool2',
'conv3_1', 'relu3_1',
'conv3_2', 'relu3_2',
'conv3_3', 'relu3_3',
'conv3_4', 'relu3_4', 'pool3',
'conv4_1', 'relu4_1',
'conv4_2', 'relu4_2',
'conv4_3', 'relu4_3',
'conv4_4', 'relu4_4', 'pool4',
'conv5_1', 'relu5_1',
'conv5_2', 'relu5_2',
'conv5_3', 'relu5_3',
'conv5_4', 'relu5_4']

7. Now we'll define a function that extracts the mean pixel values and the layer parameters from the mat file:
def extract_net_info(path_to_params):
    vgg_data = scipy.io.loadmat(path_to_params)
    normalization_matrix = vgg_data['normalization'][0][0][0]
    mat_mean = np.mean(normalization_matrix, axis=(0, 1))
    network_weights = vgg_data['layers'][0]
    return mat_mean, network_weights

8. From the loaded weights and the layer definitions, we can recreate the network in TensorFlow with the following function. We loop through the layers, dispatching on the first letter of each name ('c' for conv, 'r' for relu, 'p' for pool). Each layer gets the corresponding operation, with the appropriate weights and biases where applicable:
def vgg_network(network_weights, init_image):
    network = {}
    image = init_image
    for i, layer in enumerate(vgg_layers):
        if layer[0] == 'c':
            # Convolution: use the kernel and bias stored for layer i
            weights, bias = network_weights[i][0][0][0][0]
            # Swap the two spatial axes of the kernel for TensorFlow
            weights = np.transpose(weights, (1, 0, 2, 3))
            bias = bias.reshape(-1)
            conv_layer = tf.nn.conv2d(image, tf.constant(weights), (1, 1, 1, 1), 'SAME')
            image = tf.nn.bias_add(conv_layer, bias)
        elif layer[0] == 'r':
            image = tf.nn.relu(image)
        else:
            # Pooling: 2x2 max pool with stride 2
            image = tf.nn.max_pool(image, (1, 2, 2, 1), (1, 2, 2, 1), 'SAME')
        network[layer] = image
    return network
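To see what shapes the network produces, the pure-Python sketch below walks the vgg_layers list. It dispatches on the first letter of each layer name ('c', 'r', 'p') and tracks only the output shape of each layer. It assumes the standard VGG-19 channel counts (64, 128, 256, 512, 512 for blocks 1 to 5). It also relies on the fact that a stride-2 'SAME' pool produces ceil(n / 2) rows and columns. trace_shapes is a hypothetical helper, not part of the recipe.

```python
import math

vgg_layers = ['conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
              'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
              'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2',
              'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4', 'pool3',
              'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2',
              'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
              'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2',
              'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4']

# Output channels of every convolution in each VGG-19 block
block_channels = {'1': 64, '2': 128, '3': 256, '4': 512, '5': 512}

def trace_shapes(height, width, layers):
    """Return {layer_name: (height, width, channels)} for an RGB input."""
    shapes = {}
    channels = 3
    for layer in layers:
        kind = layer[0]                              # 'c', 'r' or 'p'
        if kind == 'c':
            channels = block_channels[layer[4]]      # 'conv3_2'[4] == '3'
        elif kind == 'p':
            height = math.ceil(height / 2)           # stride-2 'SAME' pooling
            width = math.ceil(width / 2)
        # ReLU and 'SAME' convolutions keep the spatial size
        shapes[layer] = (height, width, channels)
    return shapes

shapes = trace_shapes(224, 224, vgg_layers)
for name in ['relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu4_2', 'relu5_1']:
    print(name, shapes[name])
```

With a 224x224 input, for example, the content layer relu4_2 comes out at 28x28 with 512 channels.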

9. The paper recommends a few strategies for assigning intermediate layers to the original and style images. While we should keep relu4_2 for the original image, we can try different combinations of the other reluX_1 layer outputs for the style image:
original_layer = 'relu4_2'
style_layers = ['relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1']

10. Next, we'll run the above function to get the weights and mean. We'll also change the image shapes to have four dimensions by adding a dimension of size one to the beginning. TensorFlow's image operations act on four-dimensional tensors, so we must add the batch-size dimension:
normalization_mean, network_weights = extract_net_info(vgg_path)
shape = (1,) + original_image.shape
style_shape = (1,) + style_image.shape
original_features = {}
style_features = {}

11. Next, we declare the image placeholder and create the network with that placeholder:
image = tf.placeholder('float', shape=shape)
vgg_net = vgg_network(network_weights, image)

12. We now normalize the original image matrix by subtracting the mean pixel value and run it through the network:
original_minus_mean = original_image - normalization_mean
original_norm = np.array([original_minus_mean])
original_features[original_layer] = sess.run(vgg_net[original_layer],
                                             feed_dict={image: original_norm})
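The preprocessing in steps 10 and 12 is plain NumPy broadcasting. A length-3 mean vector is subtracted from every pixel, and wrapping the result in a list adds the leading batch axis. The sketch uses a random toy image and hypothetical mean values (the recipe takes its mean from the .mat file). It also checks that the inverse used when saving (reshape(shape[1:]) plus the mean) restores the original pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_image = rng.integers(0, 256, size=(4, 5, 3)).astype(np.float64)  # H x W x RGB
normalization_mean = np.array([123.68, 116.78, 103.94])  # hypothetical values

shape = (1,) + toy_image.shape                   # (1, 4, 5, 3), as in step 10
normalized = toy_image - normalization_mean      # broadcasts over height and width
batch = np.array([normalized])                   # adds the batch axis
restored = batch.reshape(shape[1:]) + normalization_mean  # inverse, as in step 21
```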

13. We repeat the same procedure for the style image with each of the style layers chosen in step 9. This time we store the Gram matrix of each layer's output, that is, the channel-by-channel inner products normalized by the layer size:
image = tf.placeholder('float', shape=style_shape)
vgg_net = vgg_network(network_weights, image)
style_minus_mean = style_image - normalization_mean
style_norm = np.array([style_minus_mean])
for layer in style_layers:
    layer_output = sess.run(vgg_net[layer], feed_dict={image: style_norm})
    layer_output = np.reshape(layer_output, (-1, layer_output.shape[3]))
    style_gram_matrix = np.matmul(layer_output.T, layer_output) / layer_output.size
    style_features[layer] = style_gram_matrix
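The Gram matrix is what turns these features into a description of style. It records which channels fire together but discards where in the image they fire. The NumPy sketch below uses toy random activations and a hypothetical gram_matrix helper with the same normalization as above. It shows that shuffling the spatial positions leaves the Gram matrix unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
layer_output = rng.normal(size=(1, 6, 7, 4))   # toy activations: (batch, H, W, C)

def gram_matrix(activations):
    """Channel-by-channel inner products, normalized by the number of elements."""
    features = activations.reshape(-1, activations.shape[3])   # (H*W, C)
    return features.T @ features / activations.size

gram = gram_matrix(layer_output)

# Move every spatial position somewhere else; channel values stay together
positions = layer_output.reshape(-1, 4)
shuffled = positions[rng.permutation(positions.shape[0])].reshape(1, 6, 7, 4)
print(np.allclose(gram, gram_matrix(shuffled)))
```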

14. In order to create the combined image, we'll start with random noise and run it through the network:
initial = tf.random_normal(shape) * 0.05
image = tf.Variable(initial)
vgg_net = vgg_network(network_weights, image)

15. We now declare the first loss, the loss on the original image. We use the size-normalized L2 loss between two outputs: the output of the normalized original image from step 12, and the output of the layer designated to represent the original content in step 9:
original_loss = original_image_weight * (2 * tf.nn.l2_loss(vgg_net[original_layer] - original_features[original_layer]) / original_features[original_layer].size)
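tf.nn.l2_loss(t) computes sum(t ** 2) / 2. Multiplying by 2 and dividing by the number of elements therefore turns the content loss into a weighted mean squared error. The same normalization is applied to each style layer's Gram-matrix difference in step 16. The NumPy check below re-implements the formula on small hand-picked arrays. The l2_loss function here is a stand-in, not TensorFlow.

```python
import numpy as np

def l2_loss(t):
    """NumPy version of the formula tf.nn.l2_loss uses: sum(t ** 2) / 2."""
    return np.sum(t ** 2) / 2

generated = np.array([[1.0, 2.0], [3.0, 4.0]])   # toy layer output for the generated image
target = np.array([[1.0, 0.0], [3.0, 2.0]])      # toy layer output for the original image
original_image_weight = 5.0

original_loss = original_image_weight * (2 * l2_loss(generated - target) / target.size)
mse = np.mean((generated - target) ** 2)
print(original_loss, original_image_weight * mse)
```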

16. Now we calculate the same type of loss for the Gram matrix of each style layer:
style_loss = 0
style_losses = []
for style_layer in style_layers:
    layer = vgg_net[style_layer]
    feats, height, width, channels = [x.value for x in layer.get_shape()]
    size = height * width * channels
    features = tf.reshape(layer, (-1, channels))
    style_gram_matrix = tf.matmul(tf.transpose(features), features) / size
    style_expected = style_features[style_layer]
    style_losses.append(2 * tf.nn.l2_loss(style_gram_matrix - style_expected) / style_expected.size)
style_loss += style_image_weight * tf.reduce_sum(style_losses)

17. The third loss term is the total variation loss. It works like total variation denoising: natural images have low local variation, while noisy images have high local variation. The key term in the following code is second_term_numerator, which subtracts each pixel from its vertical neighbor (third_term does the same horizontally). Noisy images produce large differences, so we can treat this as a loss to minimize. Each sum of squared differences is normalized by the number of differences it contains:

total_var_y = np.prod(image[:,1:,:,:].get_shape().as_list())
total_var_x = np.prod(image[:,:,1:,:].get_shape().as_list())
first_term = regularization_weight * 2
second_term_numerator = tf.nn.l2_loss(image[:,1:,:,:] - image[:,:shape[1]-1,:,:])
second_term = second_term_numerator / total_var_y
third_term = (tf.nn.l2_loss(image[:,:,1:,:] - image[:,:,:shape[2]-1,:]) / total_var_x)
total_variation_loss = first_term * (second_term + third_term)
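The sketch below restates the total variation loss in NumPy for an image of shape (1, height, width, channels). Each sum of squared neighbor differences is divided by the number of differences it contains. It is a re-implementation for checking intuition (total_variation_loss is a hypothetical name), not the TensorFlow graph. A constant image scores zero, a ramp that rises by 1 per row scores exactly the weight, and noise scores much higher.

```python
import numpy as np

def total_variation_loss(image, weight):
    """Total variation loss for an array of shape (1, H, W, C)."""
    vertical = image[:, 1:, :, :] - image[:, :-1, :, :]     # differences along height
    horizontal = image[:, :, 1:, :] - image[:, :, :-1, :]   # differences along width
    vertical_term = (np.sum(vertical ** 2) / 2) / vertical.size
    horizontal_term = (np.sum(horizontal ** 2) / 2) / horizontal.size
    return weight * 2 * (vertical_term + horizontal_term)

flat = np.full((1, 5, 4, 3), 0.7)
ramp = np.arange(5.0).reshape(1, 5, 1, 1) * np.ones((1, 5, 4, 3))  # +1 per row
noise = np.random.default_rng(0).normal(size=(1, 5, 4, 3))

print(total_variation_loss(flat, 1.0), total_variation_loss(ramp, 1.0),
      total_variation_loss(noise, 1.0))
```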

18. The total loss we want to minimize is the sum of the original, style, and total variation losses:
loss = original_loss + style_loss + total_variation_loss
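Written out, the loss assembled in steps 15 to 18 is the following, using tf.nn.l2_loss(t) = (1/2) Σ t². Here x is the generated image, F^l is the output of layer l for x (reshaped to (H_l W_l) × C_l when forming Gram matrices), c = relu4_2 is the content layer with stored features P, S is the set of style layers with stored Gram matrices A^l, and |·| counts elements. N_y and N_x are the numbers of vertical and horizontal neighbor differences, and w_c, w_s, w_tv correspond to original_image_weight, style_image_weight, and regularization_weight.

```latex
\begin{aligned}
G^{l} &= \frac{(F^{l})^{\top} F^{l}}{H_l W_l C_l}, \\
\mathcal{L}_{\text{content}} &= \frac{w_c}{|P|} \sum_{k} \left(F^{c}_{k} - P_{k}\right)^{2}, \\
\mathcal{L}_{\text{style}} &= w_s \sum_{l \in \mathcal{S}} \frac{1}{|A^{l}|} \sum_{i,j} \left(G^{l}_{ij} - A^{l}_{ij}\right)^{2}, \\
\mathcal{L}_{\text{tv}} &= w_{tv} \left( \frac{1}{N_y} \sum_{i,j,k} \left(x_{i+1,j,k} - x_{i,j,k}\right)^{2} + \frac{1}{N_x} \sum_{i,j,k} \left(x_{i,j+1,k} - x_{i,j,k}\right)^{2} \right), \\
\mathcal{L} &= \mathcal{L}_{\text{content}} + \mathcal{L}_{\text{style}} + \mathcal{L}_{\text{tv}}.
\end{aligned}
```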

19. We next declare our optimizer and training step and initialize all the variables in the model:
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_step = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)

20. We now loop through our training generations. Each generation runs one optimization step, and every output_generations steps we print a status update and save the intermediate image. We save the intermediate images because it is hard to determine how long to run this algorithm; the right stopping point varies with the images chosen. It is best to err on the side of more generations and stop when an intermediate image looks good:

for i in range(generations):
    sess.run(train_step)
    # Print update and save temporary output
    if (i+1) % output_generations == 0:
        print('Generation {} out of {}'.format(i + 1, generations))
        image_eval = sess.run(image)
        best_image_add_mean = image_eval.reshape(shape[1:]) + normalization_mean
        output_file = 'temp_output_{}.jpg'.format(i)
        scipy.misc.imsave(output_file, best_image_add_mean)
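The loop's structure (one update per generation, a snapshot every output_generations) can be tried without TensorFlow. In the NumPy sketch below, the optimized variable is the image itself, as in the recipe. The loss, however, is a hypothetical stand-in: half the squared distance to a toy target, whose gradient is simply image - target. The snapshots dictionary plays the role of the temp_output_*.jpg files.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.uniform(0.0, 1.0, size=(8, 8))   # toy "content" the image should match
image = rng.normal(size=(8, 8)) * 0.05        # start from small random noise
learning_rate = 0.1
generations = 100
output_generations = 25
snapshots = {}

def loss_fn(x):
    return 0.5 * np.sum((x - target) ** 2)

initial_loss = loss_fn(image)
for i in range(generations):
    gradient = image - target                 # d(loss)/d(image) for this toy loss
    image -= learning_rate * gradient         # the pixels are what gets updated
    if (i + 1) % output_generations == 0:
        snapshots[i + 1] = image.copy()       # the recipe writes a JPEG here instead
        print('Generation {} out of {}, loss {:.6f}'.format(i + 1, generations, loss_fn(image)))
```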

21. At the end of the algorithm, we'll save the final output:
image_eval = sess.run(image)
best_image_add_mean = image_eval.reshape(shape[1:]) + normalization_mean
output_file = 'final_output.jpg'
scipy.misc.imsave(output_file, best_image_add_mean)
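The optimized pixels are unconstrained floats, so after the mean is added back, some may fall outside the range 0 to 255. One option is to round, clip, and convert to uint8 before saving, so the file keeps the image's actual intensities instead of depending on how the image writer handles float input. The NumPy sketch below shows this with a hypothetical mean and a 1x1 toy output.

```python
import numpy as np

normalization_mean = np.array([123.68, 116.78, 103.94])  # hypothetical values
image_eval = np.array([[[[10.0, -130.0, 200.0]]]])        # toy (1, 1, 1, 3) output
shape = image_eval.shape

best_image_add_mean = image_eval.reshape(shape[1:]) + normalization_mean
# Round to the nearest integer, clamp to the valid range, then convert
image_uint8 = np.clip(np.rint(best_image_add_mean), 0, 255).astype(np.uint8)
print(image_uint8)
```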

Figure 6: Using the Stylenet algorithm to combine the book cover image with Starry Night. Note that different style emphases can be obtained by changing the weights at the beginning of the script.

How it works…
We first loaded the two images, then loaded the pretrained network weights and assigned layers to the original and style images. We calculated three loss functions: an original image loss, a style loss, and a total variation loss. Then we optimized a random-noise image so that it takes on the style of the style image and the content of the original image.

See also
A Neural Algorithm of Artistic Style by Gatys, Ecker, Bethge. 2015. https://arxiv.org/abs/1508.06576.


