This is a tutorial for developing a basic autoencoder using Python and Keras. It follows the tutorial at: https://blog.keras.io/building-autoencoders-in-keras.html
- 1 Overview
- 2 Deep autoencoder for grayscale images
- 3 Deep autoencoder for RGB images
- 4 Convolutional autoencoder for grayscale & RGB images
- 5 Sample Images
(From the link mentioned above:) "Autoencoding" is a data compression algorithm where the compression and decompression functions are:
- Lossy, and
- Learned automatically from examples rather than engineered by a human
Additionally, in almost all contexts where the term "autoencoder" is used, the compression and decompression functions are implemented with neural networks. This tutorial will focus on building two types of autoencoders:
- A deep autoencoder
- A convolutional autoencoder
Deep autoencoder for grayscale images
To build a deep autoencoder for grayscale images, we follow these steps:
- Import required packages
- Create the structure of the autoencoder
- Import data and fit it to our newly built autoencoder
- Plot our test data
Importing the required packages
Building the model only requires one package, and that is keras (NumPy and matplotlib are used later for preparing the data and plotting). From keras, we will be importing the Input function and the Dense and Model classes.
from keras.layers import Input, Dense
from keras.models import Model
The Input function takes a shape tuple as its argument and creates a tensor based on that. The Dense class is used to specify a fully connected layer of neurons and the activation function they possess. The Model class combines different layers and allows us to address them together, as a model.
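To make the role of a Dense layer more concrete, here is a minimal NumPy sketch of the computation such a layer performs, activation(x · W + b). The helper name dense_forward, the small layer sizes and the random weights are assumptions made purely for illustration; Keras creates and trains the real weights for us.

```python
import numpy as np

def dense_forward(x, weights, bias, activation):
    """Compute activation(x @ weights + bias), the operation of a fully connected layer."""
    z = x @ weights + bias
    if activation == "relu":
        return np.maximum(z, 0.0)
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-z))
    raise ValueError(f"unsupported activation: {activation}")

rng = np.random.default_rng(0)
batch = rng.random((4, 8))               # 4 samples with 8 input values each
weights = rng.normal(0.0, 0.5, (8, 3))   # hypothetical random weights for 3 neurons
bias = np.zeros(3)

hidden = dense_forward(batch, weights, bias, "relu")
output = dense_forward(batch, weights, bias, "sigmoid")
print(hidden.shape)  # (4, 3)
```

The relu output is never negative, and the sigmoid output always lies strictly between 0 and 1, which is why sigmoid suits a final layer that reproduces pixel intensities scaled to [0, 1].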
Creating the structure of the Autoencoder
We will be using the aforementioned Dense and Model classes to create the structure of the neural net. Let us create an autoencoder for the MNIST dataset (http://yann.lecun.com/exdb/mnist/). This is a dataset of thousands of 28x28 grayscale pictures of handwritten digits. Since each image has 28x28 pixels, our autoencoder must have 784 neurons in its input layer.
input_img = Input(shape=(784,))
We now have an input layer with 784 neurons and can proceed to construct the hidden layers.
Let us choose an architecture where, starting from the input layer, the autoencoder gradually decreases in neurons through 128 and 64, until at the "waist" of the autoencoder we have 32 neurons. Since we also need to decode the image from these 32 values back to 784, let the decoder progress through the same numbers of neurons in reverse. Thus, the architecture is 784 ---> 128 ---> 64 ---> 32(Waist) ---> 64 ---> 128 ---> 784. Dense takes as input the number of neurons in the new layer and the activation function for all its neurons; the resulting layer is then called on the output of the previous layer, which connects the two.
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(encoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
Finally, the Model class creates a container that lets us collectively address the neural network that begins at the input_img layer and ends at the decoded layer. The compile function configures the model for training: it sets the optimizer (adadelta here) and designates binary_crossentropy as the loss function to be used.
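As a quick sanity check on the chosen architecture, the following sketch counts the trainable parameters of each Dense layer (one weight per input-output pair plus one bias per neuron) and the compression factor at the waist. The helper name dense_param_count is only illustrative; it is plain arithmetic rather than a Keras call.

```python
layer_sizes = [784, 128, 64, 32, 64, 128, 784]

def dense_param_count(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

params_per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)
compression_factor = layer_sizes[0] / min(layer_sizes)

print(params_per_layer)    # [100480, 8256, 2080, 2112, 8320, 101136]
print(total_params)        # 222384
print(compression_factor)  # 24.5
```

Most of the parameters sit in the first and last layers, since those are the ones connected to the 784 pixel values.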
Importing data and fitting it to our autoencoder
Within the datasets module of keras, we have the option to import the MNIST dataset. For this, all we have to do is:
from keras.datasets import mnist
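Next we load the dataset, which comes already split into a training set and a test set. We will make use of the load_data() function present within the mnist module. The labels are discarded (assigned to _), because the autoencoder does not need them.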
(x_train, _), (x_test, _) = mnist.load_data()
In order to provide the data to the neural network in the right fashion, it is important for us to perform three main steps:
- First, we must convert the datatype of the data to float.
- Second, it is better to normalize our inputs to the neural network. Hence we must map the values from [0, 255] to the range [0, 1].
- Third, the data, which is currently present in the shape (x, 28, 28) (which is basically x instances of 28x28 images), must be reshaped into (x, 784).
Fortunately, for the three aforementioned steps, NumPy has predefined functions that make our lives easier.
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
We have now converted the datatype to float and divided all the values by the maximum, which is 255.
import numpy as np

x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
The np.prod function multiplies the per-image dimensions (28 x 28 = 784), and the reshape function converts the data from the (x, 28, 28) shape to (x, 784).
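The three preprocessing steps can be tried out without downloading MNIST. The sketch below applies them to a small array of random values that merely mimics the shape and datatype of the real data; the array fake_images is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real data: 5 "images" of 28x28 pixels with values in [0, 255].
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Steps 1 and 2: convert to float and scale into [0, 1].
scaled = fake_images.astype('float32') / 255.

# Step 3: flatten every 28x28 image into a vector of 784 values.
flat = scaled.reshape((len(scaled), np.prod(scaled.shape[1:])))

print(flat.shape, flat.dtype)  # (5, 784) float32
```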
In order to fit the data, we call the fit function in the Model class. Optional: If needed, we can also log the training error vs iterations as a graph using the TensorBoard callback from keras.callbacks.
from keras.callbacks import TensorBoard

autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir=<path to where the logs must be stored>)])
The target values are the same as our input, because we wish for our autoencoder to reproduce images as accurately as possible. Here we train our autoencoder for 100 epochs with a batch size of 256 and shuffle the data. The logs are stored in the specified path. TensorBoard takes this log data and plots it.
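To see what the loss measures when the target equals the input, here is a minimal NumPy sketch of binary cross-entropy averaged over all pixels. The function below and its clipping constant eps are assumptions for illustration and are not the Keras implementation itself; the tiny arrays stand in for a flattened image and two candidate reconstructions.

```python
import numpy as np

def binary_crossentropy(target, prediction, eps=1e-7):
    """Mean binary cross-entropy between target and prediction values in [0, 1]."""
    prediction = np.clip(prediction, eps, 1.0 - eps)  # avoid log(0)
    losses = -(target * np.log(prediction) + (1.0 - target) * np.log(1.0 - prediction))
    return float(np.mean(losses))

x = np.array([[0.0, 1.0, 1.0, 0.0]])                    # the input, which is also the target
good_reconstruction = np.array([[0.1, 0.9, 0.8, 0.2]])
bad_reconstruction = np.array([[0.9, 0.1, 0.2, 0.8]])

loss_good = binary_crossentropy(x, good_reconstruction)
loss_bad = binary_crossentropy(x, bad_reconstruction)
print(loss_good < loss_bad)  # True
```

The closer the reconstruction is to the input, the smaller the loss, which is exactly what training tries to achieve.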
Plotting test data
In order for us to obtain the predictions from the autoencoder, we must use the predict function in the Model class.
decoded_imgs = autoencoder.predict(x_test)
The variable decoded_imgs will now have the shape (length of test data, 784). Next, we seek to plot 10 images out of the test data. For doing this, we require the matplotlib package. It is very versatile and can plot various graphs in a very elegant manner.
import matplotlib.pyplot as plt
Since we seek to plot ten images and each image will have a corresponding prediction, there will be 20 images in total. Hence we need to set our overall figure size.
n = 10
plt.figure(figsize=(20, 4))
Now we shall plot all the images. We run a loop to plot each image and its corresponding prediction by the autoencoder. Some steps which have to be followed while plotting:
- The relative position of the input and the prediction must be aligned.
- The (784,) shape has to be reshaped into (28,28).
- The color scheme has to be set to grayscale.
- Since they are images, the x-axis and y-axis should be invisible.
The corresponding Python code to do the above steps is as follows:
for i in range(n):
    # display original (top row, column i + 1)
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))  # reshaping into image
    plt.gray()  # setting color scheme
    ax.get_xaxis().set_visible(False)  # making axes invisible
    ax.get_yaxis().set_visible(False)

    # display reconstruction (bottom row, column i + 1)
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.savefig('deep_autoenc.png')
The plt.savefig function saves the finished figure to the given file.
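The alignment of each input with its prediction comes from how subplot indices are counted: they start at 1 and run row by row. The small sketch below converts an index into its (row, column) position; the helper subplot_position is a hypothetical name used only to illustrate the arithmetic and is not part of matplotlib.

```python
def subplot_position(index, n_columns):
    """Convert a 1-based, row-by-row subplot index into a 1-based (row, column) pair."""
    row, column = divmod(index - 1, n_columns)
    return row + 1, column + 1

n = 10
for i in range(3):
    original = subplot_position(i + 1, n)            # index used for the input image
    reconstruction = subplot_position(i + 1 + n, n)  # index used for the prediction
    print(i, original, reconstruction)
```

Adding n to the index moves exactly one row down while keeping the column, so every prediction lands directly below its input.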
Deep autoencoder for RGB images
With a few changes, we can have our marvelous autoencoder work even for RGB images. For this example we will be considering the CIFAR-10 image dataset which can be found here: https://www.cs.toronto.edu/~kriz/cifar.html
- Change 1: The input layer first has to be modified, because initially when we considered only grayscale images, every point in our image matrix had only one value. However, in this case, every point in the image grid will have 3 values. Since the resolution of each image in the CIFAR-10 dataset is 32x32 and there are three color channels, our input layer will now have to take 32x32x3 = 3072 values. Accordingly, the output layer will also change.
input_img = Input(shape=(3072,))
...
decoded = Dense(3072, activation='sigmoid')(decoded)
- Change 2: When we import the data from the keras.datasets module, we import the CIFAR10 dataset instead of the MNIST dataset. Accordingly, we must replace all occurrences of the mnist object with the cifar10 object.
from keras.datasets import cifar10
- Change 3: While plotting the image, every occurrence of the shape (28, 28) must be replaced with the shape (32, 32, 3). Furthermore, we must remove every occurrence of plt.gray().
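The effect of these changes on the data can be checked on a small synthetic array. The sketch below assumes the images are stored with the color channel last, i.e. in the shape (x, 32, 32, 3); the random array fake_rgb is a stand-in for the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for CIFAR-10: 4 RGB "images" of 32x32 pixels with values in [0, 255].
fake_rgb = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

scaled = fake_rgb.astype('float32') / 255.
flat = scaled.reshape((len(scaled), np.prod(scaled.shape[1:])))  # (4, 3072)

# For plotting, a flat vector is reshaped back into a 32x32 image with 3 channels.
restored = flat[0].reshape(32, 32, 3)

print(flat.shape, restored.shape)  # (4, 3072) (32, 32, 3)
```

Flattening and reshaping only rearrange the values, so the restored image is identical to the scaled original.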
Convolutional autoencoder for grayscale & RGB images
Having seen how fully connected neural nets can be used as autoencoders, we will now experiment with a very popular model, mainly used for image recognition (and other related applications): the Convolutional Neural Network (CNN).
A brief overview
While we now have a grasp on what has to be done, it might be hazy to understand how exactly we will employ a CNN to get our job done. This section will elaborate on that.
- Until now, we had the neural network reducing a flattened 1D representation of the image into tighter and smaller representations.
- Every layer of the neural net could be represented using a 1D array which progressively became shorter until the waist. From thereon, it grew back to its native dimensions.
- CNNs have this wonderful feature of preserving spatial information. In other words, they are made in such a way that the effect of surrounding pixels on a single pixel is preserved. Since spatiality is crucial, images are best suited for input as 3D arrays.
- Once the input has been fed, a window of a given dimension (usually odd) moves through the image performing the convolution operation. There are two subtle points to note here:
  - First, this poses a problem when the window is centered on the first pixel. That is, if a window of size (3,3) is centered on the first pixel (0,0), we must have some values for the window to convolve on at (-1,-1), etc. Thus, we need to pad the image before using it.
  - By doing so, the image's dimensions are preserved, because the window is centered on every pixel exactly once and produces one result per pixel. In order to abstract features and shrink the representation, the convolution layer (with its non-linear activation) is usually followed by a pooling layer, which takes multiple pixels as input, applies a non-linear function such as the maximum to them and outputs fewer pixels.
- The aforementioned process is thus repeated until the waist of the network.
- Once the waist has been reached, we begin to do the opposite of the aforementioned steps. We convolve and we upsample existing values to increase the dimensions.
- Various layers in Keras are used for the steps listed above:
  - Convolutions are executed using the Conv2D layer
  - Maximum finding is performed through the MaxPooling2D layer
  - Upsampling is done using the UpSampling2D layer
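The three operations described in this overview can be sketched in plain NumPy for a single-channel image. This is only an illustration of the ideas under simplifying assumptions (one channel, one fixed averaging kernel, even image dimensions); the function names are hypothetical, and the Keras layers additionally handle many channels and learn their kernels.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide an odd-sized kernel over a zero-padded 2D image, one result per pixel."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool_2x2(image):
    """Keep the maximum of every non-overlapping 2x2 block (even dimensions assumed)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x2(image):
    """Repeat every row and every column twice."""
    return image.repeat(2, axis=0).repeat(2, axis=1)

image = np.arange(64, dtype=float).reshape(8, 8)
kernel = np.ones((3, 3)) / 9.0   # hypothetical averaging kernel; real kernels are learned

convolved = conv2d_same(image, kernel)
pooled = max_pool_2x2(convolved)
upsampled = upsample_2x2(pooled)
print(convolved.shape, pooled.shape, upsampled.shape)  # (8, 8) (4, 4) (8, 8)
```

Padding keeps the convolution output at the input size, pooling halves each dimension, and upsampling doubles it again, which is the pattern the encoder and decoder rely on.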
Coding it Up!
The only changes we have to make to our existing boilerplate code from the deep autoencoder for grayscale images pertain to building the model. Let's see how to go about it:
- Unlike a 1D input, in this case, we will have a 3D input representing the original image
input_img = Input(shape=(28, 28, 1))
- In order to construct a convolution layer, we will be calling Conv2D, which takes in the number of filters (output channels), the size of the window, the activation function type to be applied by each cell and how the padding must be performed.
Immediately after this, we will have our MaxPooling2D layer to implement the maximum computation. By default, the stride length (or number of pixels the window moves by) equals the pool size, which is 2 here.
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
- Thus, we proceed to reduce the dimensions accordingly. One can even use intermediate print statements to print 'x' to see how the Tensor changes.
# At this point, print x will yield:
# Tensor("max_pooling2d_1/MaxPool:0", shape=(?, 14, 14, 16), dtype=float32)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
# print x here will yield: Tensor("conv2d_2/Relu:0", shape=(?, 14, 14, 8), dtype=float32)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8), i.e. 128 values
- Now that we've reduced a 784 pixel image into a representation of 128 values, we must build it back up. For this purpose, we use the UpSampling2D layer. Providing an argument of (2,2) doubles the height and the width.
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# no padding here: 16x16 shrinks to 14x14, so the last upsampling yields 28x28
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
- It might even be surprising to note that only a minor modification is required for RGB images, and that is to change the first argument of the last layer (the decoded layer) to 3. Note that for 32x32 inputs such as CIFAR-10, the Conv2D(16, ...) layer in the decoder also needs padding='same', otherwise the output is 28x28 rather than 32x32. Everything from here on follows from the corresponding deep autoencoder code.
- Just as a final tip: Don't forget to import Conv2D, MaxPooling2D and UpSampling2D from keras.layers, and to change the input_img shape for RGB images!
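To double-check the layer stack built above, the following sketch traces how the height (and, equally, the width) of the image changes from layer to layer. It is plain arithmetic under the usual rules (a padded convolution keeps the size, a 3x3 convolution without padding removes 2 pixels, padded 2x2 pooling halves and rounds up, 2x2 upsampling doubles); the function trace_shapes is a hypothetical helper and not a Keras call.

```python
import math

def trace_shapes(size):
    """Follow the image height/width through the encoder and decoder built above."""
    shapes = [size]
    # Encoder: three pairs of Conv2D(padding='same') and MaxPooling2D((2, 2), padding='same').
    for _ in range(3):
        size = math.ceil(size / 2)  # the convolution keeps the size, pooling halves it
        shapes.append(size)
    # Decoder: two pairs of Conv2D(padding='same') and UpSampling2D((2, 2)).
    for _ in range(2):
        size = size * 2
        shapes.append(size)
    # Third decoder pair: Conv2D without padding trims 2 pixels, then upsampling doubles.
    size = (size - 2) * 2
    shapes.append(size)
    return shapes

print(trace_shapes(28))  # [28, 14, 7, 4, 8, 16, 28]
print(trace_shapes(32))  # [32, 16, 8, 4, 8, 16, 28]
```

For 28x28 inputs the unpadded convolution is what brings the output back to 28x28. For 32x32 inputs the same stack ends at 28x28, so that layer would have to keep its size (16 doubled gives 32) for the output to match the input.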