12  Sparsity and Compression

As we have seen before and maybe know from our own experience, most image and audio signals are highly compressible. Here compressible means we can find a basis that allows for a sparse reprehension of our signal. Let us put this into a small definition.

Definition 12.1 (\(K\)-sparse data) A compressible signal \(x\in \mathbb{R}^n\) may be written in a sparse vector \(s\in \mathbb{R}^n\) with a basis transformation (see Definition 1.8 for the formal definition) expressed by the matrix \(B\in\mathbb{R}^{n\times n}\) and \[ x = B s. \] The vector \(s\) is called \(K\)-sparse if it contains exactly \(K\) non-zero elements.

Important

It is important to note that this does not imply that \(B\) is sparse.

If the basis is generic such as the Fourier or Wavelet basis, i.e. we do not need to store the matrix \(B\) but can generate it on the spot, we only need to store the \(K\) non-zero elements of \(s\) to reconstruct the original signal.

Note

As we have seen in Chapter 11 images (and audio) signals are highly compressible in the Fourier and Wavelet basis with most entries small or zero. By setting the small values to zero we reduce the density further without a high loss of quality.

We see this in JPEG compression for images and MP3 compression for audio signals. If we stream an audio signal or view an image on the web we only need to transfer \(s\) and not \(x\), saving bandwidth and storage as we go.

We have seen in Figure 4.3 that we can use the SVD to reduce the size as well. The downside here is that we need to store \(U\) and \(V\) (Definition 4.2) even if we reduce the rank. This is rather inefficient. On the other hand, we have used SVD in Section 4.2.2 with the Eigenfaces example how we can create a basis with SVD that can be used to classify an entire class of images - human faces. Storing the basis matrices in this case is comparable cheap and it allows us to use certain aspects of the downsampling for learning purposes.

We also need to stress that SVD and Fourier are unitary transformations which make the move into and from the basis cheep. This is the basis for a lot of computation seen in the field of compressed sensing and compression in general.

Note

The driving factors for compression are audio, image and video, but also raw data compression as seen with zip, 7z and all the other available algorithms.

It is wrong to assume that we do not see this in engineering applications. High dimensional differential equations usually have a solution on a low dimensional manifolds and therefore imply that sparsity can be seen here to.

Let us return to image compression and follow along with the examples given in (Brunton and Kutz 2022, chaps. 3, pp98–101).

We can use the code provided earlier. We move from Figure 12.1 (a) to Figure 12.1 (b) via \(\mathcal{F}\). From Figure 12.1 (b) to Figure 12.1 (d) we only keep the highest 5% of our values and move to Figure 12.1 (c) via \(\mathcal{F}^{-1}\).

Show the code for the figure
import matplotlib.pyplot as plt
import imageio.v3 as iio
import numpy as np
%config InlineBackend.figure_formats = ['svg']

im = np.asarray(iio.imread(
    "https://www.mci.edu/en/download/27-logos-bilder?"
    "download=618:mci-eu-web"))

def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.2989, 0.5870, 0.1140])

def myplot(A):
    plt.figure()
    plt.imshow(A, cmap="gray")
    plt.axis("off")
    plt.gca().set_aspect(1)

A = rgb2gray(im)
A_fft = np.fft.fft2(A)
A_fft_sort = np.sort(np.abs(A_fft.reshape(-1)))
myplot(A)

myplot(np.log(np.abs(np.fft.fftshift(A_fft)) + 1))
c = 0.05
thresh = A_fft_sort[int(np.floor((1 - c) * len(A_fft_sort)))]
A_fft_th = A_fft * (np.abs(A_fft) > thresh)
A_th = np.fft.ifft2(A_fft_th).real
myplot(A_th)
myplot(np.log(np.abs(np.fft.fftshift(A_fft_th)) + 1))
(a) Original image.
(b) Fourier coefficients
(c) Compressed image
(d) Truncated FFT coefficients (5%)
Figure 12.1: Image of MCI I and the reconstruction with various amounts of FFT coefficients left.

In order to get an idea why the Fourier transform is useful in this scenario we look at the image as a surface.

Note

In order to make this feasible for interactive rendering we use only the upper left quarter of the image.

Show the code for the figure
import plotly.graph_objects as go
import numpy as np
%config InlineBackend.figure_formats = ['svg']

B = np.transpose(A[:int(A.shape[0]/2):5, :int(A.shape[1]/2):5])
y = np.arange(B.shape[0])
x = np.arange(B.shape[1])

X,Y = np.meshgrid(x,y)
fig = go.Figure()
fig.add_trace(go.Surface(z=B, x=X, y=Y))
fig.show()
Figure 12.2: Upper left quarter of the MCI I image as a 3D surface.

As can be seen in Figure 12.2 we can express the clouds with view modes and even the raster of the building seams to fit this model nicely.

It is not very surprising to have such structure in a so called natural image. The image or pixel space is big, very big. For an \(n \times n\) black and white image there are \(2^{n^2}\) possible images. So for a \(20 \times 20\) image we already have \(2^{400}\) possible images, a number with 121 digits and it is assumed that there are (only) about \(10^{82}\) atoms in the universe.

Figure 12.3: Illustration to show the fastness of pixel (image) space in comparison to images we can make sense of aka natural images.

So finding structure in images, especially in images with high resolution is not surprising. The rest is basically random noise. Most of the dimensions of pixel space are only necessary if we want to encode this random noise images but for a cloud with some mountains and a building we only need some dimensions and are therefore highly compressible.