Tutorial 1
Revision as of 05:53, 7 March 2018
This is a tutorial for developing a basic autoencoder using python, and keras. The tutorial mentioned on this page follows from: https://blog.keras.io/building-autoencoders-in-keras.html
Overview
(From the link mentioned above:) "Autoencoding" is a data compression algorithm where the compression and decompression functions are:
- Data-specific
- Lossy, and
- Learned automatically from examples rather than engineered by a human
Additionally, in almost all contexts where the term "autoencoder" is used, the compression and decompression functions are implemented with neural networks. This tutorial will focus on building two types of autoencoders:
- A deep autoencoder
- A convolutional autoencoder
Deep autoencoder for grayscale images
In order to build a deep autoencoder for grayscale images, here are the steps we should be following:
- Import required packages
- Create the structure of the autoencoder
- Import data and fit it to our newly built autoencoder
- Plot our test data
Importing the required packages
This tutorial requires only one package, and that is keras. From keras, we will import the Input, Dense and Model functions.
from keras.layers import Input, Dense
from keras.models import Model
The Input function takes a shape tuple as its argument and creates a tensor of that shape. The Dense function is used to specify a layer of neurons and the activation function they possess. The Model function combines different layers and allows us to address several layers together as a single model.
Creating the structure of the Autoencoder
We will be using the aforementioned Dense and Model functions to create the structure of the neural net. Let us create an autoencoder for the MNIST dataset (http://yann.lecun.com/exdb/mnist/). This is a dataset of 70,000 28x28 grayscale pictures of handwritten digits. Since each image has 28x28 = 784 pixels, our autoencoder must have 784 neurons in its input layer.
input_img = Input(shape=(784,))
We now have an input layer with 784 neurons, and can proceed to construct the hidden layers.
Let us choose an architecture where, from the input layer, the number of neurons gradually decreases through 128 and 64 until, at the "waist" of the autoencoder, we have 32 neurons. Since we also need to decode the image from 32 values back to 784, let it progress back through the same numbers of neurons. Thus, the architecture is 784 ---> 128 ---> 64 ---> 32 (waist) ---> 64 ---> 128 ---> 784. The Dense function takes as input the number of neurons in the new layer and the activation function for all its neurons, and is applied to the previous layer.
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(encoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
Finally, the Model function creates a container that lets us collectively address the neural network beginning at the input_img layer and ending at the decoded layer. The compile function sets the optimizer (here adadelta) and designates binary_crossentropy as the loss function to be used.
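To see what this architecture computes, here is a numpy-only sketch of the forward pass, independent of keras: with random (untrained) placeholder weights, one 784-vector is squeezed down to the 32-neuron waist and expanded back to 784 sigmoid outputs. The layer sizes mirror the Dense calls above; the weight values themselves are made up, so the output is meaningless until training.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes of the 784 -> 128 -> 64 -> 32 -> 64 -> 128 -> 784 autoencoder
sizes = [784, 128, 64, 32, 64, 128, 784]

# Random placeholder weights; training would learn these
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.random((1, 784))            # one fake "image" as a flat vector
h = x
for w in weights[:-1]:
    h = relu(h @ w)                 # hidden layers use ReLU
out = sigmoid(h @ weights[-1])      # output layer uses sigmoid, like the model above

print(out.shape)                    # (1, 784): same size as the input
```

Because the final activation is a sigmoid, every output value lies in [0, 1], matching the normalized pixel range we feed in below.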
Importing data and fitting it to our autoencoder
Within the datasets module of keras, we have the option to import the MNIST dataset. For this, all we have to do is:
from keras.datasets import mnist
Next we have to split the dataset into a training and a test set. We will make use of the load_data() function present within the mnist object.
(x_train, _), (x_test, _) = mnist.load_data()
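load_data() returns two (images, labels) pairs, and since the autoencoder only needs the images, the labels are discarded into `_`. The unpacking pattern can be illustrated with small fake arrays standing in for the real dataset (the array sizes here are made up; real MNIST has 60,000 training and 10,000 test images):

```python
import numpy as np

# Fake stand-in for what mnist.load_data() returns:
# ((train_images, train_labels), (test_images, test_labels))
fake_data = ((np.zeros((60, 28, 28)), np.zeros(60)),
             (np.zeros((10, 28, 28)), np.zeros(10)))

(x_train, _), (x_test, _) = fake_data   # keep images, drop labels

print(x_train.shape, x_test.shape)      # (60, 28, 28) (10, 28, 28)
```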
In order to provide the data to the neural network in the right fashion, it is important for us to perform three main steps:
- First, we must convert the datatype of the data to float.
- Second, it is better to normalize the inputs to the neural network, so we must map values from the domain [0, 255] to the range [0, 1].
- Third, the data, which is currently of shape (x, 28, 28) (i.e. x instances of 28x28 images), must be reshaped into (x, 784).
Fortunately, for the three aforementioned steps, we have predefined functions that make our lives easier.
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
We have now converted the datatype to float and divided all the values by the maximum, which is 255.
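As a quick sanity check, with some made-up pixel values, the cast-and-divide step maps uint8 pixels into [0, 1]:

```python
import numpy as np

pixels = np.array([0, 51, 255], dtype='uint8')   # made-up pixel values
scaled = pixels.astype('float32') / 255.

print(scaled)   # 0 -> 0.0, 51 -> 0.2, 255 -> 1.0
```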
import numpy as np

x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
The np.prod function computes 28 * 28 = 784 from the trailing dimensions, and the reshape function converts the (x, 28, 28) array to the shape (x, 784).
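The same reshape can be checked on a small dummy batch: np.prod collapses the trailing (28, 28) dimensions into 784 columns.

```python
import numpy as np

batch = np.zeros((5, 28, 28))   # 5 dummy 28x28 "images"
flat = batch.reshape((len(batch), np.prod(batch.shape[1:])))

print(flat.shape)   # (5, 784)
```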
In order to fit the data, we call the fit function of the Model class. Optional: if needed, we can also log the training error versus iterations as a graph using the TensorBoard callback from keras.callbacks.
from keras.callbacks import TensorBoard

autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir=<path to where the logs must be stored>)])
The target values are the same as our input, because we wish for our autoencoder to reproduce images as accurately as possible. Here we train our autoencoder for 100 epochs and shuffle the data. The logs are stored in the specified path; TensorBoard takes this log data and plots it.
Plotting test data
In order for us to obtain the predictions from the autoencoder, we must use the predict function in the Model class.
decoded_imgs = autoencoder.predict(x_test)
The variable decoded_imgs will have the shape (length of test data, 784). We now seek to plot 10 images from the test data. For this we require the matplotlib package, which is very versatile and can plot various kinds of graphs in an elegant manner.
import matplotlib.pyplot as plt
Since we seek to plot ten figures and each figure will have a corresponding prediction, there will be 20 images. Hence we need to set our overall figure size.
n = 10
plt.figure(figsize=(20, 4))
Now we shall plot all the images. We run a loop to print each image and its corresponding prediction by the autoencoder. Some steps which have to be followed while plotting:
- The relative position of the input and the prediction must be aligned.
- The (784,) shape has to be reshaped into (28, 28).
- The colorscheme has to be set to grayscale.
- Since they are images, the x-axis and y-axis should be invisible.
The corresponding python code to do the above steps is as follows:
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)               # plot at position (i + 1) in a 2 x n grid
    plt.imshow(x_test[i].reshape(28, 28))       # reshaping the flat vector into an image
    plt.gray()                                  # setting the colorscheme to grayscale
    ax.get_xaxis().set_visible(False)           # making the axes invisible
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)           # plot at position (i + 1 + n), the bottom row
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.savefig('deep_autoenc.png')
The plt.savefig function saves the resulting figure to a file.
Deep autoencoder for RGB images
With a few changes, we can have our marvelous AE work even for RGB images. For this example we will be considering the CIFAR-10 image dataset which can be found here: https://www.cs.toronto.edu/~kriz/cifar.html
- Change 1: The input layer first has to be modified, because initially, when we considered only grayscale images, every point in our image matrix had only one value. In this case, however, every point in the image grid has 3 values, one per color channel. Since the resolution of each image in the CIFAR-10 dataset is 32x32 and there are three colors, our input layer will now have to take 32x32x3 = 3072 values. Accordingly, the output layer will also have 3072 neurons.
input_img = Input(shape=(3072,))
...
decoded = Dense(3072, activation='sigmoid')(decoded)
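The new input size can be verified on a dummy CIFAR-shaped batch: the reshape step from the grayscale pipeline works unchanged, because np.prod multiplies out all trailing dimensions (32 * 32 * 3 = 3072).

```python
import numpy as np

batch = np.zeros((4, 32, 32, 3))   # 4 dummy 32x32 RGB "images"
flat = batch.reshape((len(batch), np.prod(batch.shape[1:])))

print(flat.shape)   # (4, 3072)
```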
- Change 2: When we import the data from the keras.datasets module, we import the CIFAR10 dataset instead of the MNIST dataset. Accordingly, we must replace all occurrences of the mnist object with the cifar10 object.
from keras.datasets import cifar10
- Change 3: While plotting the images, every occurrence of the shape (28, 28) must be replaced with the shape (32, 32, 3). Furthermore, we must remove every occurrence of plt.gray(), since the images are no longer grayscale.
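Putting these plotting changes together, the loop becomes the following sketch. Random arrays stand in here for real CIFAR-10 test images and predictions, and the non-interactive Agg backend and the output file name col_deep_autoenc.png are assumptions for this illustration.

```python
import matplotlib
matplotlib.use('Agg')                   # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

n = 10
x_test = np.random.rand(n, 3072)        # stand-ins for real test images...
decoded_imgs = np.random.rand(n, 3072)  # ...and for the autoencoder's predictions

plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(32, 32, 3))        # (32, 32, 3) instead of (28, 28)
    ax.get_xaxis().set_visible(False)               # no plt.gray() for RGB images
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(32, 32, 3))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.savefig('col_deep_autoenc.png')
```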
And voila! Our autoencoder has learnt to encode and decode images! Until the next tutorial, it's goodbye from the Knights who say 'ni'!
Sample Image
The following is a sample from the MNIST dataset. The top row is the actual image and the bottom row is the predicted image:
[[File:deep_autoencoder.jpg|frameless]]
[[File:col_deep_autoenc.png]]