Another class of neural networks that exploits low-dimensional structure in high-dimensional data is the autoencoder. Autoencoders can be seen as a generalization of the linear subspace embedding of the singular value decomposition/principal component analysis to a nonlinear manifold embedding.
The main idea, as stated in (Goodfellow, Bengio, and Courville 2016, chap. 14), is to train a NN to copy its input \(x\) via the latent space \(y\) to the output \(z\). Crucially, \(y\) is usually low dimensional and represents the output of the encoding part and the input of the decoding part of the NN. This forces the latent space to reveal useful properties that we can study and exploit, just as with PCA.
Figure 9.1: General structure of an autoencoder with the encoder network on the left and the decoder network on the right. Note: they are not necessarily identical or just the reverse.
During training, the weights of the (two) neural networks are learned by minimizing a loss function that measures the difference between the input \(x\) and the reconstruction \(z\), e.g. \(\operatorname{argmin}_\Theta\|z - x\|^2\).
To connect this concept back to PCA and to what we discussed in the Eigenfaces Example, where we used the truncated SVD to project into a low-dimensional space and back again to reconstruct the image, we formulate the idea in a more mathematical fashion.
We call the encoder \(y=f_\Theta(x)\) and the decoder \(z=g_\Theta(y)\). Therefore, \(z=(g \circ f)(x) = g(f(x))\) and \[
\begin{aligned}
f : \mathcal{X} \to \mathcal{Y}\\
g : \mathcal{Y} \to \mathcal{Z}\\
\end{aligned}
\] and the loss is computed as \[
\mathscr{L}(x, g(f(x))),
\] with an appropriate loss function \(\mathscr{L}\).
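To make this composition concrete, the following is a minimal sketch assuming a PyTorch-style implementation; the framework, the layer sizes, and the dummy batch are illustrative choices and not taken from the accompanying code.

```python
import torch
import torch.nn as nn

# Encoder f: X -> Y and decoder g: Y -> Z as small fully connected networks.
# The sizes (784 inputs, 32-dimensional latent space) are illustrative.
f = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
g = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

loss_fn = nn.MSELoss()          # plays the role of L(x, g(f(x)))
x = torch.rand(16, 784)         # a dummy batch of flattened inputs
y = f(x)                        # latent representation y = f(x)
z = g(y)                        # reconstruction z = g(f(x))
loss = loss_fn(z, x)
loss.backward()                 # gradients with respect to all parameters Theta
```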
The loss function \(\mathscr{L}\) often includes additional, more general constraints, e.g. to enforce sparsity, similar to LASSO or ridge regression; such models are often referred to as regularized autoencoders.
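As a sketch of such a regularized loss, a sparsity penalty on the latent code \(y\) can simply be added to the reconstruction term. The function below is illustrative; the penalty weight lam is a hypothetical hyperparameter, not a value from the text.

```python
def sparse_ae_loss(x, f, g, lam=1e-3):
    """Reconstruction loss plus an L1 sparsity penalty on the latent code,
    analogous in spirit to the LASSO penalty on regression coefficients."""
    y = f(x)                                    # latent code
    z = g(y)                                    # reconstruction
    return ((z - x) ** 2).mean() + lam * y.abs().mean()
```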
Note
We can use the concept of an autoencoder to span the same space as PCA. For a linear encoder/decoder, a latent space smaller than the input, and the mean squared error as loss function, the autoencoder learns to span the same subspace as the leading principal components of the data. For nonlinear functions \(f\) and \(g\), a more general form is learned.
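To make this note concrete, here is a small sketch (again assuming PyTorch, with toy data and illustrative sizes) of a linear autoencoder trained with the mean squared error; after training, the decoder columns span approximately the same subspace as the leading singular vectors of the data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 20) @ torch.randn(20, 20)    # correlated toy data
X = X - X.mean(dim=0)                              # centre the data, as in PCA

k = 3                                              # latent dimension
enc = nn.Linear(20, k, bias=False)                 # linear encoder f
dec = nn.Linear(k, 20, bias=False)                 # linear decoder g
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    loss = ((dec(enc(X)) - X) ** 2).mean()         # mean squared reconstruction error
    loss.backward()
    opt.step()

# The columns of dec.weight span (approximately) the same subspace as the
# first k right singular vectors of X, i.e. the leading principal components.
U, S, Vt = torch.linalg.svd(X, full_matrices=False)
```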
9.1 Applications
We can use autoencoders for a variety of applications:
Image denoising: The autoencoder is trained with noisy images and learns to reconstruct clean images from them.
Anomaly detection: An autoencoder trained on normal images can detect anomalies by identifying inputs that result in a high reconstruction loss, which indicates a deviation from the learned images.
Image generation: After training an autoencoder, the decoder part can be used to generate images from manipulated values in the latent space.
Feature extraction: The latent representation of a trained autoencoder provides a compact, informative feature for image classification and retrieval.
Image compression: By storing only the latent space representation, an image can be compressed and later reconstructed by passing this representation through the decoder.
Image enhancement: By providing low resolution images together with their high resolution counterparts, an autoencoder can learn to enhance certain features of images.
To illustrate this, we construct an autoencoder for denoising the MNIST dataset we have seen before (Section 1.5).
First, we need to prepare and augment the data with some noise.
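A minimal sketch of how the data preparation and training could look, assuming a PyTorch-style implementation (the noise level sigma, the layer sizes, and the number of epochs below are illustrative choices, not the original settings), is the following.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Load MNIST and scale the images to [0, 1].
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# A small fully connected autoencoder: 28*28 = 784 inputs, 32-dim latent space.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                        nn.Linear(128, 32), nn.ReLU())
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(),
                        nn.Linear(128, 784), nn.Sigmoid(),
                        nn.Unflatten(1, (1, 28, 28)))
model = nn.Sequential(encoder, decoder)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
sigma = 0.3                                    # illustrative noise level

for epoch in range(5):
    for x, _ in loader:                        # labels are not needed
        noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
        opt.zero_grad()
        loss = loss_fn(model(noisy), x)        # reconstruct the *clean* image
        loss.backward()
        opt.step()
```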
Figure 9.2: Some of the test images denoised via the model. The top row shows the original images, the second row the noisy inputs, and the last row the reconstructions after calling the model.
Exercise 9.1 (Add more transformations) Rework the transform function to include the following additions.
Shuffle the pixels of the image with a fixed permutation.
Rotate the image by a fixed angle; extend the image first, rotate it, and then crop it again.
Flip the image, i.e. mirror it with respect to the middle.
Retrain the autoencoder with these training sets and see how it performs (can it handle all of them at the same time?). A sketch of the first transformation is given below as a starting point.
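One possible way (an assumption, not taken from the original code) to implement a fixed pixel permutation as a torchvision-style transform:

```python
import torch

class FixedPermutation:
    """Shuffle the pixels of a (C, H, W) tensor image with a permutation
    that is fixed at construction time."""
    def __init__(self, h=28, w=28, seed=0):
        g = torch.Generator().manual_seed(seed)
        self.perm = torch.randperm(h * w, generator=g)

    def __call__(self, img):
        c, h, w = img.shape
        return img.reshape(c, h * w)[:, self.perm].reshape(c, h, w)

# Usage, e.g. appended to the existing pipeline:
# transform = transforms.Compose([transforms.ToTensor(), FixedPermutation()])
```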
Exercise 9.2 (Explore the latent space and generate numbers) Rework the above autoencoder in such a fashion that the encoder and decoder can be accessed individually. If it is beneficial for the later tasks to rework the model so that the latent space is smaller, feel free to do so, e.g. use only 2 dimensions.
Explore the latent space (e.g. similar to what was done in the introduction to Clustering and Classification for the cats and dogs dataset) and see how well the clusters of the individual numbers can be distinguished.
Use the knowledge from the previous task to create a number generator by feeding self-generated elements of the latent space to the decoder part of our autoencoder (a sketch of this step is given below).
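As a hint for the last item, and assuming the decoder is accessible as a separate module mapping a 2-dimensional latent space back to images (an assumption about the reworked model), generation amounts to decoding hand-picked latent vectors:

```python
import torch

def generate_numbers(decoder, grid=5, radius=2.0):
    """Decode a grid of hand-picked 2D latent vectors into images.
    `decoder` is the decoder part of the (reworked) trained autoencoder."""
    images = []
    with torch.no_grad():
        for a in torch.linspace(-radius, radius, grid):
            for b in torch.linspace(-radius, radius, grid):
                y = torch.tensor([[a, b]])      # one point in the latent space
                images.append(decoder(y))       # decode it into an image
    return images
```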
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.