As we have seen before, and as we may know from our own experience, most image and audio signals are highly compressible. Here, compressible means that we can find a basis that allows for a sparse representation of our signal. Let us put this into a small definition.
Definition 12.1 (\(K\)-sparse data) A compressible signal \(x\in \mathbb{R}^n\) may be written as a sparse vector \(s\in \mathbb{R}^n\) together with a basis transformation (see Definition 1.8 for the formal definition) expressed by the matrix \(B\in\mathbb{R}^{n\times n}\) and \[
x = B s.
\] The vector \(s\) is called \(K\)-sparse if it contains exactly \(K\) non-zero elements.
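To make the definition concrete, here is a minimal numpy sketch (the basis, sizes, and variable names are illustrative choices, not from the text): we build an orthonormal basis \(B\), a vector \(s\) with exactly \(K\) non-zero entries, and recover \(s\) from \(x\), since \(B^{-1}=B^T\) for an orthogonal matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 5

# A generic orthonormal basis B (via QR); in practice B would be Fourier/wavelet.
B, _ = np.linalg.qr(rng.standard_normal((n, n)))

# K-sparse coefficient vector s: exactly K non-zero entries.
s = np.zeros(n)
s[rng.choice(n, size=K, replace=False)] = rng.standard_normal(K)

x = B @ s            # the dense signal x = B s
s_rec = B.T @ x      # B is orthogonal, so B^{-1} = B^T
print(np.count_nonzero(np.abs(s_rec) > 1e-12))  # -> 5
```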
Important
It is important to note that this does not imply that \(B\) is sparse.
If the basis is generic, such as the Fourier or wavelet basis, i.e. we do not need to store the matrix \(B\) but can generate it on the fly, we only need to store the \(K\) non-zero elements of \(s\) (their values and positions) to reconstruct the original signal.
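A sketch of this storage argument (the FFT stands in here for any basis we can generate on the fly; the sizes below are arbitrary): we keep only the \(K\) (index, value) pairs of \(s\), i.e. \(2K\) numbers instead of \(n\), and regenerate the signal with an inverse FFT.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 4096, 8

# A signal that is exactly K-sparse in the Fourier basis.
idx = rng.choice(n, size=K, replace=False)
vals = rng.standard_normal(K) + 1j * rng.standard_normal(K)
s = np.zeros(n, dtype=complex)
s[idx] = vals
x = np.fft.ifft(s)          # x = B s, with B the (generated) inverse DFT

# Storage: only the K (index, value) pairs, not B and not x.
stored = list(zip(idx.tolist(), vals.tolist()))

# Reconstruction from the stored pairs alone.
s_hat = np.zeros(n, dtype=complex)
for i, v in stored:
    s_hat[i] = v
x_hat = np.fft.ifft(s_hat)
print(np.allclose(x, x_hat))  # True
```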
Note
As we have seen in Chapter 11, image (and audio) signals are highly compressible in the Fourier and wavelet bases, with most entries small or zero. By setting the small values to zero we increase the sparsity further without a significant loss of quality.
We see this in JPEG compression for images and MP3 compression for audio. If we stream an audio signal or view an image on the web, we only need to transfer \(s\) and not \(x\), saving bandwidth and storage along the way.
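Here is a minimal sketch of the thresholding idea behind such codecs (real JPEG/MP3 pipelines use block DCT/MDCT transforms, quantization, and entropy coding; the toy signal, tone frequencies, and the 1% cut-off are made up for illustration):

```python
import numpy as np

# Toy "audio" signal: two tones plus a little noise.
n = 4096
t = np.linspace(0.0, 1.0, n, endpoint=False)
x = np.sin(2 * np.pi * 97 * t) + 0.5 * np.sin(2 * np.pi * 777 * t)
x += 0.05 * np.random.default_rng(0).standard_normal(n)

# Transform, keep only the largest 1% of the coefficients, zero the rest.
X = np.fft.fft(x)
keep = int(0.01 * n)
threshold = np.sort(np.abs(X))[-keep]
X_compressed = np.where(np.abs(X) >= threshold, X, 0.0)

# Back-transform and compare: most of the signal survives.
x_compressed = np.real(np.fft.ifft(X_compressed))
rel_err = np.linalg.norm(x - x_compressed) / np.linalg.norm(x)
print(f"kept {keep}/{n} coefficients, relative error {rel_err:.3f}")
```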
We have seen in Figure 4.3 that we can use the SVD to reduce the size as well. The downside is that we need to store \(U\) and \(V\) (Definition 4.2) even if we reduce the rank, which is rather inefficient. On the other hand, the Eigenfaces example in Section 4.2.2 showed how the SVD can be used to create a basis that describes an entire class of images - human faces. Storing the basis matrices in this case is comparably cheap, and it allows us to use certain aspects of the downsampling for learning purposes.
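A quick back-of-the-envelope sketch of this storage trade-off (the matrix size and rank are illustrative assumptions): a rank-\(r\) approximation needs \(U_r\), the singular values, and \(V_r\), i.e. roughly \(r(m+n+1)\) numbers, and unlike Fourier these factors cannot be generated on the fly.

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 512
A = rng.standard_normal((m, 10)) @ rng.standard_normal((10, n))  # low-rank "image"

U, S, Vt = np.linalg.svd(A, full_matrices=False)
r = 10
A_r = (U[:, :r] * S[:r]) @ Vt[:r]   # rank-r reconstruction
print(np.allclose(A, A_r))          # True, A has rank 10

# Unlike a Fourier basis, U and V must be stored explicitly.
print(r * (m + n + 1), "numbers vs", m * n, "for the full matrix")
```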
We also need to stress that the SVD and Fourier transforms are unitary transformations, which makes moving into and out of the basis cheap. This is the foundation for a lot of the computation seen in the field of compressed sensing and in compression in general.
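As a small sanity check of the unitarity claim (numpy's `norm="ortho"` scaling is used so that the DFT matrix is truly unitary; the signal is arbitrary): the transform preserves the 2-norm (Parseval) and inverts exactly, while applying it costs only \(O(n \log n)\).

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(1024)

# With norm="ortho" the DFT is unitary: norms are preserved (Parseval)
# and the inverse transform recovers x exactly, in O(n log n).
X = np.fft.fft(x, norm="ortho")
print(np.isclose(np.linalg.norm(x), np.linalg.norm(X)))  # True
x_back = np.real(np.fft.ifft(X, norm="ortho"))
print(np.allclose(x, x_back))                            # True
```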
Note
The driving factors for compression are audio, image, and video, but also raw data compression, as seen with zip, 7z, and the many other available algorithms.
It would be wrong to assume that we do not see this in engineering applications. High-dimensional differential equations usually have solutions on low-dimensional manifolds, which implies that sparsity can be exploited here too.
Figure 12.2: Upper left quarter of the MCI I image as a 3D surface.
As can be seen in Figure 12.2, we can express the clouds with few modes, and even the raster of the building seems to fit this model nicely.
It is not very surprising to find such structure in a so-called natural image. The image or pixel space is big, very big. For an \(n \times n\) black and white image there are \(2^{n^2}\) possible images. So for a \(20 \times 20\) image we already have \(2^{400}\) possible images, a number with 121 digits, while it is estimated that there are (only) about \(10^{82}\) atoms in the universe.
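The counting argument can be checked in two lines (pure arithmetic, nothing assumed beyond the text):

```python
# Number of 20 x 20 black-and-white images: 2^(20*20) = 2^400.
n_images = 2 ** (20 * 20)
print(len(str(n_images)))  # 121 decimal digits, versus ~10^82 atoms in the universe
```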
Figure 12.3: Illustration to show the vastness of pixel (image) space in comparison to images we can make sense of, a.k.a. natural images.
So finding structure in images, especially in high-resolution images, is not surprising; the rest of pixel space is essentially random noise. Most of the dimensions of pixel space are only needed to encode such random-noise images, while a cloud with some mountains and a building requires only a few dimensions and is therefore highly compressible.