4  Singular Value Decomposition

The Singular Value Decomposition (SVD) is an orthogonal matrix reduction. In contrast to the previously discussed Eigendecomposition it can be applied to rectangular matrices. As we will see, the interpretation of the SVD can be linked to the eigendecomposition, but first, let's provide a proper definition.

Theorem 4.1 (Singular Value Decomposition) If \(A \in \mathbb{R}^{m\times n}\) (a real \(m \times n\) matrix), there exist orthogonal matrices \[ U = \left[u_1 | \cdots | u_m \right] \in \mathbb{R}^{m\times m} \quad \text{and} \quad V = \left[v_1 | \cdots | v_n \right] \in \mathbb{R}^{n\times n} \] such that \[ U^{\mathsf{T}} A V = \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m\times n}, \quad p=\min\{m, n\}, \tag{4.1}\] where \(\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_p \geq 0\).

(Compare Golub and Van Loan 2013, 76–80)

Note

We call \(\sigma_i\) a singular value of \(A\), the \(u_i\) are called the left singular vectors of \(A\) and \(v_i\) the right singular vectors of \(A\). Furthermore, \(\sigma_{max} (A)\) is the largest singular value of \(A\) and \(\sigma_{min} (A)\) is the smallest singular value of \(A\).

Instead of providing a concise proof (if you are interested, see Golub and Van Loan 2013, 76), we show a possible motivation for the definition.

Let us compute the eigendecompositions of \(AA^{\mathsf{T}}\) and \(A^{\mathsf{T}}A\). The two matrices share the same positive eigenvalues - the squares of the singular values of \(A\) - and we get

\[ \begin{array}{lll} (A A^{\mathsf{T}})U &= & U(\Lambda\Lambda^{\mathsf{T}}),\\ (A^{\mathsf{T}} A)V &= & V(\Lambda^{\mathsf{T}}\Lambda). \end{array} \] For \(A \in \mathbb{R}^{m\times n}\) with \(m>n\) we get \[ \Lambda = \left[ \begin{array}{c} \tilde{\Lambda}\\0\end{array} \right] \] with the diagonal matrix \(\tilde\Lambda\in\mathbb{R}^{n\times n}\) and \[ \begin{array}{llclc} \Lambda\Lambda^{\mathsf{T}} & = & \left[ \begin{array}{cc}\tilde{\Lambda}^2 & 0 \\0 & 0 \end{array} \right] & = & \left[ \begin{array}{cc}\tilde{\Sigma} & 0 \\0 & 0 \end{array} \right], \\ \Lambda^{\mathsf{T}}\Lambda & = & \tilde{\Lambda}^2 &= &\tilde{\Sigma}. \end{array} \] If we expand the matrices with zeros to match the correct dimensions this corresponds to our singular value decomposition \[ A = U \Sigma V^{\mathsf{T}}. \]
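We can check this connection numerically. The following is a minimal sketch (the small random matrix and the seed are only illustrative assumptions): the eigenvalues of \(A^{\mathsf{T}}A\) are the squares of the singular values of \(A\).

import numpy as np
import numpy.linalg as LA

rng = np.random.default_rng(6020)
A = rng.standard_normal((5, 3))          # small rectangular example matrix (m > n)

s = LA.svd(A, compute_uv=False)          # singular values of A (descending)
w = np.sort(LA.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A (descending)

print(np.allclose(s**2, w))              # the eigenvalues are the squared singular values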

Important

The singular value decomposition always exists, is real, and all singular values are non-negative.

Again, visualizing the decomposition helps to better understand what is happening.

Figure 4.1: Singular Value Decomposition

In the case that \(m\geq n\), we can save storage by reducing the matrices \(U\) and \(\Sigma\) to their counterparts \(U_1\) and \(\Sigma_1\) by removing the zeros.

Definition 4.1 (Thin SVD) If \(A \in \mathbb{R}^{m\times n}\) for \(m\geq n\), then \[ A = U_1 \Sigma_1 V^{\mathsf{T}} \] where \[ U_1 = U(:, 1:n) = \left[u_1 | \cdots | u_n \right] \in \mathbb{R}^{m\times n} \] and \[ \Sigma_1 = \Sigma(1:n, 1:n) = \operatorname{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{n\times n}. \]

(Compare Golub and Van Loan 2013, 80)

Figure 4.2: Thin SVD

In Python we can compute the (thin) SVD as follows:

import numpy as np
import numpy.linalg as LA

A = np.array([[3, 2, 2], 
              [2, 3, -2]
              ])

U, s, Vh = LA.svd(A, full_matrices=True)
print(f"{U.shape=}, {s.shape=}, {Vh.shape=}")
print(f"{U=}")
print(f"{U @ U.transpose()=}\n")

print(f"{np.diag(s)=}\n")

print(f"{Vh=}")
print(f"{Vh.transpose() @ Vh=}\n")

A1 = U[:, :len(s)] @ np.diag(s) @ Vh[:len(s),:]
print(f"{np.allclose(A, A1)=}")
print(f"{A-A1=}\n")

S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)
A2 = U @ S @ Vh
print(f"{np.allclose(A, A2)=}")
print(f"{A-A2=}\n")

print("\nThin SVD:")
U, s, Vh = LA.svd(A, full_matrices=False)
print(f"{U.shape=}, {s.shape=}, {Vh.shape=}")
print(f"{np.allclose(A, U @ np.diag(s) @ Vh)=}")

print("\nTransposed matrix:")
B = A.transpose()
U, s, Vh = LA.svd(B, full_matrices=False)
print(f"{U.shape=}, {s.shape=}, {Vh.shape=}")
print(f"{np.allclose(B, U @ np.diag(s) @ Vh)=}")
U.shape=(2, 2), s.shape=(2,), Vh.shape=(3, 3)
U=array([[-0.70710678, -0.70710678],
       [-0.70710678,  0.70710678]])
U @ U.transpose()=array([[1.00000000e+00, 1.53686518e-16],
       [1.53686518e-16, 1.00000000e+00]])

np.diag(s)=array([[5., 0.],
       [0., 3.]])

Vh=array([[-7.07106781e-01, -7.07106781e-01, -6.47932334e-17],
       [-2.35702260e-01,  2.35702260e-01, -9.42809042e-01],
       [-6.66666667e-01,  6.66666667e-01,  3.33333333e-01]])
Vh.transpose() @ Vh=array([[ 1.00000000e+00,  1.35693925e-16,  1.32609972e-16],
       [ 1.35693925e-16,  1.00000000e+00, -4.93432455e-17],
       [ 1.32609972e-16, -4.93432455e-17,  1.00000000e+00]])

np.allclose(A, A1)=True
A-A1=array([[-8.88178420e-16, -4.44089210e-16, -8.88178420e-16],
       [-4.44089210e-16,  0.00000000e+00, -6.66133815e-16]])

A-A2=array([[-8.88178420e-16, -4.44089210e-16, -8.88178420e-16],
       [-4.44089210e-16,  0.00000000e+00, -6.66133815e-16]])


Thin SVD:
U.shape=(2, 2), s.shape=(2,), Vh.shape=(2, 3)
np.allclose(A, U @ np.diag(s) @ Vh)=True

Transposed matrix:
U.shape=(3, 2), s.shape=(2,), Vh.shape=(2, 2)
np.allclose(B, U @ np.diag(s) @ Vh)=True

4.1 Low rank approximation

Again, we can cut off the reconstruction at a certain point and create an approximation. More formally, this is stated in the following definition.

Definition 4.2 (Low-Rank Approximation) If \(A \in \mathbb{R}^{m\times n}\) has the SVD \(A = U\Sigma V^{\mathsf{T}}\), then \[ A_k = U(:, 1:k)\, \Sigma(1:k, 1:k)\, V^{\mathsf{T}}(1:k, :) \] is the optimal low-rank approximation of \(A\) with rank \(k\). This is often called the truncated SVD.

(See Golub and Van Loan 2013, Corollary 2.4.7 p. 79)
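As a quick numerical illustration of this optimality (a minimal sketch with an arbitrary random matrix; it checks the well-known fact that the 2-norm error of the rank-\(k\) truncation equals \(\sigma_{k+1}\)):

import numpy as np
import numpy.linalg as LA

rng = np.random.default_rng(6020)
A = rng.standard_normal((8, 6))

U, s, Vh = LA.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]   # truncated SVD of rank k

# The 2-norm error of the truncation is the first discarded singular value
print(np.allclose(LA.norm(A - A_k, 2), s[k]))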

Truncated SVD

We can use this for image compression.

Show the code for the figure
import matplotlib.pyplot as plt
import imageio.v3 as iio
import numpy as np
import numpy.linalg as LA
%config InlineBackend.figure_formats = ['svg']

im = np.asarray(iio.imread(
    "https://www.mci.edu/en/download/27-logos-bilder?"
    "download=618:mci-eu-web"))

def rgb2gray(rgb):
    return np.dot(rgb[..., :3], [0.2989, 0.5870, 0.1140])

im_gray = rgb2gray(im)
im_scale = im_gray[1500:3001, 1500:3001] / 255

U, s, Vh = LA.svd(im_scale, full_matrices=False)

rec = [1/1000, 10/100, 25/100, 50/100, 75/100]

plt.figure()
plt.plot(s)
plt.yscale("log")
plt.ylabel(r"$|\sigma_i|$")
plt.xlabel("index")
plt.gca().set_aspect(5e1)

for p in rec:
    plt.figure()
    r = int(np.ceil(len(s) * p))
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]
    plt.imshow(A_r, cmap=plt.get_cmap("gray"))
    plt.gca().set_axis_off()

plt.figure()
plt.imshow(im_scale, cmap=plt.get_cmap("gray"))
plt.gca().set_axis_off()
plt.show()
(a) Singular values from largest to smallest
(b) 0.1% of the singular values
(c) 10% of the singular values
(d) 25% of the singular values
(e) 50% of the singular values
(f) 75% of the singular values
(g) Original image
Figure 4.3: Image of MCI I and the reconstruction with reduced-rank matrices.
Note

If we compare this to Figure 3.4 we can see that we get a much better result for smaller \(r\). Let us have a look at why that is.

As the matrices \(U\) and \(V\) are orthogonal, their columns also define bases of the corresponding (sub) vector spaces. As mentioned before, the SVD selects these bases automatically, and they are optimal.

Consequently, the matrices \(U\) and \(V\) can be understood as reflecting patterns in the image. We can think of the columns of \(U\) and \(V\) as the vertical and horizontal patterns of \(A\), respectively.

We can illustrate this by looking at the modes of our decomposition \[ M_k = U(:, k) V^{\mathsf{T}}(k, :). \]

Show the code for the figure
import matplotlib.pyplot as plt
import numpy as np
import numpy.linalg as LA
%config InlineBackend.figure_formats = ['svg']

rec = [0, 1, 2, 3, 4, 5]

for r in rec:
    plt.figure()
    M_r = np.outer(U[:, r], Vh[r, :])
    plt.imshow(M_r, cmap=plt.get_cmap("gray"))
    plt.gca().set_axis_off()

plt.show()
(a) First mode
(b) Second mode
(c) Third mode
(d) Fourth mode
(e) Fifth mode
(f) Sixth mode
Figure 4.4: Modes of the SVD decomposition of the MCI I image.
Note

The big advantage here is that the selection is optimal. A disadvantage is that we need to store the basis separately, which increases the necessary storage. We will see in later sections about wavelets and Fourier decomposition how a common basis can be used to reduce the storage while still keeping good reconstructive properties.

4.2 Principal Component Analysis

One of the most important applications of the SVD is the stable computation of the so-called principal component analysis (PCA). It is a common technique in data exploration, analysis, visualization, and preprocessing.

The main idea of PCA is to transform the data in such a way that the main directions (principal components) capture the largest variation. In short, we perform a change of basis, see Definition 1.8.

Let us investigate this in terms of an (artificial) data set.

Important

This example is adapted from (Brunton and Kutz 2022, Example: Noisy Gaussian Data, pp. 25-27).

We generate a noisy cloud (see Figure 4.5) that consists of \(10000\) points in 2D, generated from a normal distribution with zero mean and unit variance. The data is then:

  1. scaled by \(2\) in the first direction and by \(\frac12\) in the second,
  2. rotated by \(\frac\pi3\), and
  3. translated in the direction \(\left[2\ 1\right]^{\mathsf{T}}\).

The resulting matrix \(X\) stores the measurements (or experiments) stacked next to each other as columns. This means each column represents a new sample, e.g. a time step, and each row corresponds to the same sensor.

Important

The following code is a slight adaptation (for nicer presentation in these notes) of (Brunton and Kutz 2022, Code 1.4); also see the notebook on GitHub.

Show the code for the figure
import matplotlib.pyplot as plt
import numpy as np
%config InlineBackend.figure_formats = ["svg"]
np.random.seed(6020)       # Make sure to stay reproducible

xC = np.array([2, 1])      # Center of data (mean)
sig = np.array([2, 0.5])   # Principal axes

theta = np.pi / 3            # Rotate cloud by pi/3

R = np.array([[np.cos(theta), -np.sin(theta)],     # Rotation matrix
              [np.sin(theta), np.cos(theta)]])

nPoints = 10000            # Create 10,000 points
X = R @ np.diag(sig) @ np.random.randn(2, nPoints) + \
        np.diag(xC) @ np.ones((2, nPoints))

fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.plot(X[0, :], X[1, :], ".", color="k")
ax1.grid()
ax1.set_aspect("equal")
plt.xlim((-6, 8))
plt.ylim((-6, 8))

## f_ch01_ex03_1b

Xavg = np.mean(X, axis=1)                  # Compute mean
B = X - np.tile(Xavg, (nPoints, 1)).T       # Mean-subtracted data

# Find principal components (SVD)
U, S, VT = np.linalg.svd(B, full_matrices=False)
S = S / np.sqrt(nPoints - 1)

ax2 = fig.add_subplot(122)
ax2.plot(X[0, :], X[1, :], ".", color="k")   # Plot data to overlay PCA
ax2.grid()
ax2.set_aspect("equal")
plt.xlim((-6, 8))
plt.ylim((-6, 8))

theta = 2 * np.pi * np.arange(0, 1, 0.01)

# 1-std confidence interval
Xstd = U @ np.diag(S) @ np.array([np.cos(theta), np.sin(theta)])

ax2.plot(Xavg[0] + Xstd[0, :], Xavg[1] + Xstd[1, :],
        "-", color="r", linewidth=3)
ax2.plot(Xavg[0] + 2 * Xstd[0, :], Xavg[1] + 2 * Xstd[1, :],
        "-", color="r", linewidth=3)
ax2.plot(Xavg[0] + 3 * Xstd[0, :], Xavg[1] + 3 * Xstd[1, :],
        "-", color="r", linewidth=3)

# Plot principal components U[:,0]S[0] and U[:,1]S[1]
ax2.plot(np.array([Xavg[0], Xavg[0] + U[0,0] * S[0]]),
         np.array([Xavg[1], Xavg[1] + U[1,0] * S[0]]),
         "-", color="cyan", linewidth=5)
ax2.plot(np.array([Xavg[0], Xavg[0] + U[0,1] * S[1]]),
         np.array([Xavg[1], Xavg[1] + U[1,1] * S[1]]),
         "-", color="cyan", linewidth=5)

plt.show()
Figure 4.5: Principal components of the mean-subtracted Gaussian data on the left, as well as the first three standard deviation ellipsoids and the two scaled left singular vectors on the right.

4.2.1 Computation

For the computation we follow the outline given in (Brunton and Kutz 2022, chap. 1.5). First we need to centre our matrix \(X\) according to the mean per feature. In the convention of the following formulas each row of \(X\) is a measurement and each column a feature (note that in the example code above the data is stored transposed, with one sensor per row). The mean of feature \(j\) is \[ \overline{x}_j = \frac1n \sum_{i=1}^n X_{ij} \] and our mean matrix is the outer product with the ones vector \[ \overline{X} = \left[\begin{array}{c}1\\\vdots\\1\end{array}\right] \overline{x}, \] which can be used to compute the centred matrix \(B = X - \overline{X}\).
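In code the centring step might look as follows (a minimal sketch in the convention of the formula above, i.e. one measurement per row, with an arbitrary random data matrix):

import numpy as np

rng = np.random.default_rng(6020)
X = rng.standard_normal((100, 3))           # 100 measurements (rows) of 3 features (columns)

xbar = X.mean(axis=0)                       # mean per feature
Xbar = np.outer(np.ones(X.shape[0]), xbar)  # mean matrix: outer product with the ones vector
B = X - Xbar                                # centred data matrix

print(np.allclose(B.mean(axis=0), 0))       # every feature now has zero mean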

The PCA is the eigendecomposition of the covariance matrix \[ C = \frac{1}{n-1} B^{\mathsf{T}} B \tag{4.2}\]

Note

The normalization factor of \(n-1\) in Equation 4.2 (and not \(n\)) is called Bessel’s correction and compensates for the bias in the estimation of the population variance.

As \(C\) is symmetric and positive semi-definite, it has non-negative real eigenvalues and the matrix \(V\) of the eigendecomposition satisfies \(V^{-1} = V^{\mathsf{T}}\) (i.e. it is orthogonal, Definition 1.10). The principal components are the eigenvectors and the eigenvalues are the variances along these components.

If we instead compute the SVD of \(B = U\Sigma V^{\mathsf{T}}\) we get \[ C = \frac{1}{n-1} B^{\mathsf{T}}B = \frac{1}{n-1} V \Sigma^{\mathsf{T}}\Sigma V^{\mathsf{T}} = V \Lambda V^{\mathsf{T}}, \quad \Lambda = \frac{1}{n-1} \Sigma^{\mathsf{T}}\Sigma, \] leading to a robust way of computing the variances along the principal components as \[ \lambda_k = \frac{\sigma_k^2}{n-1}. \]
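A small numerical check of this relation (a sketch continuing the convention above, with an arbitrary random data matrix):

import numpy as np
import numpy.linalg as LA

rng = np.random.default_rng(6020)
X = rng.standard_normal((100, 3))
n = X.shape[0]

B = X - X.mean(axis=0)                    # centred data, one measurement per row
C = B.T @ B / (n - 1)                     # covariance matrix, Equation 4.2

lam = np.sort(LA.eigvalsh(C))[::-1]       # eigenvalues of C (descending)
s = LA.svd(B, compute_uv=False)           # singular values of B (descending)

print(np.allclose(lam, s**2 / (n - 1)))   # lambda_k = sigma_k^2 / (n - 1)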

Tip

If the sensor ranges of our data matrix differ greatly in magnitude, the rows of \(B\) are additionally scaled by their standard deviation (analogous to the mean subtraction); the covariance matrix then becomes the correlation matrix.
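A possible way to do this in code (a minimal sketch with two artificial sensors of very different magnitude; whether to standardise like this depends on the application):

import numpy as np

rng = np.random.default_rng(6020)
X = rng.standard_normal((2, 1000)) * np.array([[100.0], [0.01]])  # two sensors, very different ranges

B = X - X.mean(axis=1, keepdims=True)           # mean-subtracted data (one sensor per row)
B = B / B.std(axis=1, ddof=1, keepdims=True)    # additionally scale each row by its standard deviation

print(B.std(axis=1, ddof=1))                    # every sensor now has unit variance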

In our example we get the scaled singular values \(\sigma_1=1.988\approx 2\) and \(\sigma_2=0.496\approx \frac12\). These results recover our given parameters very well. Additionally, we can see that our rotation matrix is closely matched by \(U\) (up to signs) from our SVD: \[ R_{\frac\pi3} = \left[ \begin{array}{cc} 0.5&-0.866\\0.866&0.5\end{array} \right], \quad U = \left[ \begin{array}{cc}-0.501&-0.865\\-0.865&0.501\end{array} \right] \]
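This can be verified directly with the variables of the PCA code above (a small consistency check; U, S, and R are reused from that code block):

print(S)                                              # scaled singular values, approximately [2, 0.5]
print(U)                                              # left singular vectors
print(np.allclose(np.abs(U), np.abs(R), atol=1e-2))   # U matches R up to the signs of the columns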

4.2.2 Example Eigenfaces

We combine SVD/PCA in an illustrative example called eigenfaces, as introduced in (Brunton and Kutz 2022, Sec 1.6, pp. 28-34).

The idea is to apply the PCA techniques to a large set of faces to extract the dominant correlations between the images and create a face basis that can be used to represent an image in these coordinates. For example, we can reconstruct a face in this space by projecting onto the eigenvectors, or we can use it for face recognition, as similar faces usually cluster under this projection.

The images are taken from the Yale Face Database B; in our case we use a GitHub repository that provides Julia Pluto notebooks for Chapters 1 to 4 of Brunton and Kutz (2022).

Our training set, so to speak, consists of the first 36 people in the dataset. We compute the average face and subtract it from our dataset to get our matrix \(B\). From here an SVD provides us with our basis \(U\). To test our basis we use individual 37 and a portion of the image of the MCI Headquarters (to see how well it performs on objects). For this we use the projection \[ \tilde{x} = U_r U_r^{\mathsf{T}} x. \] If we split this up, we first project onto the found patterns (encode) and then reconstruct from them (decode).

Note

We can understand this as encoding and decoding our test image, which is the general setup of an autoencoder (a topic for another lecture).

The correlation coefficients \(x_r = U_r^{\mathsf{T}} x\) might reveal patterns for different \(x\). In the case of faces, we can use this for face recognition: if the coefficients \(x_r\) of an image fall into the same cluster as those of other images, they are probably from the same person (see the sketch after the code below).

Important

The following code is an adaptation of (Brunton and Kutz 2022, Codes 1.7 and 1.9).

Show the code for the figure
import numpy as np
import numpy.linalg as LA
import scipy
import requests
import io
import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = ["svg"]

response = requests.get(
    "https://github.com/frankhuettner/Data_Driven_Science_Julia_Demos"
    "/raw/refs/heads/main/DATA/allFaces.mat")

data = scipy.io.loadmat(io.BytesIO(response.content))
faces = data["faces"]
m = int(data["m"][0,0])
n = int(data["n"][0,0])
nfaces = np.ndarray.flatten(data['nfaces'])

trainingFaces = faces[:, : np.sum(nfaces[:36])]
avgFace = np.mean(trainingFaces, axis=1)

B = trainingFaces - np.tile(avgFace, (trainingFaces.shape[1], 1)).T
U, _, _ = LA.svd(B, full_matrices=False)   # economy-size SVD (the MATLAB 'econ' option)

testFace = faces[:, np.sum(nfaces[:36])]
testFaceMS = testFace - avgFace
rec = [25, 100, 400]

fig = plt.figure()
axs = [] 
axs.append(fig.add_subplot(2, 4, 1))
axs.append(fig.add_subplot(2, 4, 2))
axs.append(fig.add_subplot(2, 4, 3))
axs.append(fig.add_subplot(2, 4, 4))
axs.append(fig.add_subplot(2, 4, 5))
axs.append(fig.add_subplot(2, 4, 6))
axs.append(fig.add_subplot(2, 4, 7))
axs.append(fig.add_subplot(2, 4, 8))

for i, p in enumerate(rec):
    r = p
    A_r = avgFace + U[:, :r] @ U[:, :r].T @ testFaceMS
    axs[i].imshow(np.reshape(A_r, (m, n)).T, cmap=plt.get_cmap("gray"))
    axs[i].set_axis_off()
    axs[i].set_title(f"${r=}$")

axs[3].imshow(np.reshape(testFace, (m, n)).T, cmap=plt.get_cmap("gray"))
axs[3].set_axis_off()
axs[3].set_title(f"Original image")

shift = 1500
testFaceMS = np.reshape(im_gray[shift:shift+n, shift:shift+m].T, n*m) - \
             avgFace
rec = [100, 400, 1600]
for i, p in enumerate(rec):
    r = p
    A_r = avgFace + U[:, :r] @ U[:, :r].T @ testFaceMS
    axs[4 + i].imshow(np.reshape(A_r, (m, n)).T, cmap=plt.get_cmap("gray"))
    axs[4 + i].set_axis_off()
    axs[4 + i].set_title(f"${r=}$")
axs[7].imshow(im_gray[shift:shift+n, shift:shift+m], cmap=plt.get_cmap("gray"))
axs[7].set_axis_off()
axs[7].set_title(f"Original image")
plt.show()
Figure 4.6: Approximate reconstruction of a test face and an object using the eigenfaces basis for different orders r.
Note

Due to resource limitations the above computation cannot be run for every build. We try to make sure that the code matches the figure, but if something differs when you try it yourself, we apologise for that.
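The recognition idea mentioned above could be sketched as follows. This is a hypothetical illustration, not part of the original example; it reuses U, avgFace, faces, and nfaces from the code above and simply compares Euclidean distances between coefficient vectors.

import numpy as np

r = 400                                   # number of eigenfaces for the coefficient vectors
U_r = U[:, :r]

def coeffs(x):
    """Encode a face: project the mean-subtracted image onto the first r eigenfaces."""
    return U_r.T @ (x - avgFace)

first = np.cumsum(nfaces) - nfaces        # index of the first image of each person
a1 = coeffs(faces[:, int(first[1])])      # first image of person 2
a2 = coeffs(faces[:, int(first[1]) + 1])  # second image of person 2
b1 = coeffs(faces[:, int(first[6])])      # first image of person 7

# Images of the same person are usually closer in the coefficient space
print(np.linalg.norm(a1 - a2) < np.linalg.norm(a1 - b1))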

4.3 Further applications of the SVD

There are many more applications of the SVD, but we want to highlight some regarding systems of linear equations, \[ A x = b \tag{4.3}\] where the matrix \(A\), as well as the vector \(b\), is known and \(x\) is unknown.

Depending on the structure of \(A\) and the specific \(b\) we have no, one, or infinitely many solutions. For now the interesting case is where \(A\) is rectangular and therefore we have either an

  • under-determined system \(m\ll n\), so more unknowns than equations,
  • over-determined system \(m\gg n\), so more equations than unknowns.

For the second case (more equations than unknowns) we often switch to solving the optimization problem that minimizes \[ \|Ax-b\|_2^2. \tag{4.4}\] This is called the least squares solution. The least squares solution also minimizes \(\|Ax-b\|_2\). For an under-determined system we might instead seek the solution which minimizes \(\|x\|_2\), called the minimum norm solution. Both situations are illustrated in the short sketch below.
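Both situations can be handled directly in NumPy with numpy.linalg.lstsq, which returns the least squares solution for over-determined systems and the minimum norm solution for consistent under-determined systems. The following is a minimal sketch with random systems:

import numpy as np
import numpy.linalg as LA

rng = np.random.default_rng(6020)

# Over-determined system (m >> n): least squares solution minimising ||Ax - b||_2
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)
x_ls = LA.lstsq(A, b, rcond=None)[0]
print(LA.norm(A @ x_ls - b))                     # residual of the least squares solution

# Under-determined system (m << n): minimum norm solution among all exact solutions
A = rng.standard_normal((3, 20))
b = rng.standard_normal(3)
x_mn = LA.lstsq(A, b, rcond=None)[0]
print(np.allclose(A @ x_mn, b), LA.norm(x_mn))   # exact solution with minimal ||x||_2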

If we use the SVD decomposition \(A = U \Sigma V^{\mathsf{T}}\), we can define the following.

Definition 4.3 (Pseudo-inverse) We define the matrix \(A^\dagger \in \mathbb{R}^{n\times m}\) by \(A^\dagger = V\Sigma^\dagger U^{\mathsf{T}}\) where \[ \Sigma^\dagger = \operatorname{diag}\left(\frac{1}{\sigma_1}, \frac{1}{\sigma_2}, \ldots, \frac{1}{\sigma_r}, 0, \ldots, 0\right) \in \mathbb{R}^{n\times m}, \quad r=\operatorname{rank}(A). \]

The matrix \(A^\dagger\) is often called the Moore-Penrose left pseudo-inverse as it fulfils the Moore-Penrose conditions. It is also the matrix that attains the minimal Frobenius norm solution to \[ \min_{X \in \mathbb{R}^{n\times m}}\| A X - I_m\|_F. \]

(Compare Golub and Van Loan 2013, 290)

If we only use the truncated version, i.e. where we only use non-zero singular values, we can use it to find good solutions to Equation 4.4.

In numpy it can be computed by numpy.linalg.pinv.
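A short check that this construction agrees with numpy.linalg.pinv (a sketch with a random full-rank matrix, so all singular values are non-zero):

import numpy as np
import numpy.linalg as LA

rng = np.random.default_rng(6020)
A = rng.standard_normal((5, 3))

U, s, Vh = LA.svd(A, full_matrices=False)
A_dag = Vh.T @ np.diag(1 / s) @ U.T      # V Sigma^dagger U^T (all singular values non-zero here)

print(np.allclose(A_dag, LA.pinv(A)))    # matches numpy's pseudo-inverse
print(np.allclose(A @ A_dag @ A, A))     # one of the Moore-Penrose conditions: A A^dagger A = A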

Definition 4.4 (Condition number) The condition number of a matrix provides a measure of how sensitive the solution of Equation 4.3 is to perturbations in \(A\) and \(b\). For a square matrix \(A\) the condition number is defined as \[ \kappa(A) = \|A\| \left\|A^{-1}\right\|, \] for an appropriate underlying norm. For the 2-norm, \(\kappa_2\) is \[ \kappa_2(A) = \|A\|_2 \left\|A^{-1}\right\|_2 = \frac{\sigma_{max}}{\sigma_{min}}. \]

To get a better idea of what this means, think of it in the following way. For the perturbed linear system \[ A(x + \epsilon_x) = b + \epsilon_b, \] we can outline the worst case, where \(\epsilon_x\) aligns with the singular vector of the largest singular value and \(x\) with the singular vector of the smallest singular value, i.e. \[ A(x + \epsilon_x) = \sigma_{min}x + \sigma_{max}\epsilon_x. \] Consequently, the output signal-to-noise ratio \(\|b\|/\|\epsilon_b\|\) and the input signal-to-noise ratio \(\|x\|/\|\epsilon_x\|\) differ in the worst case by the factor \(\kappa_2(A)\).

In this sense, i.e. via the ratio \(\sigma_{max}/\sigma_{min}\), \(\kappa_2\) can be extended to more general (rectangular) matrices.

(Compare Golub and Van Loan 2013, 87; and Brunton and Kutz 2022, 18–19)
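In NumPy the condition number is available as numpy.linalg.cond; the relation to the singular values can be checked directly (a minimal sketch with a random square matrix):

import numpy as np
import numpy.linalg as LA

rng = np.random.default_rng(6020)
A = rng.standard_normal((4, 4))

s = LA.svd(A, compute_uv=False)                  # singular values of A
print(np.allclose(LA.cond(A, 2), s[0] / s[-1]))  # kappa_2 = sigma_max / sigma_min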

4.3.1 Linear regression with SVD

Before we go into more detail about regression in the next section, we give a brief outlook on how to solve such a problem with the SVD.

Important

This example is adapted from (Brunton and Kutz 2022, Example: One-Dimensional Linear Regression, Example: Cement Heat Generation Data, pp. 19-22).

4.3.1.1 Linear Regression (see Brunton and Kutz 2022, 19–21)

First we just take a linear relation that we augment with some Gaussian noise. So our matrix \(A\) is simply a vector with our \(x\)-coordinates and \(b\) is the noise-augmented image under our linear relation. \[ \left[ \begin{array}{c} a_1 \\ a_2 \\ \vdots \\ a_n \end{array} \right]x = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_n \end{array} \right] \quad \Leftrightarrow \quad U\Sigma V^{\mathsf{T}} x = b \quad \Leftrightarrow \quad x = A^\dagger b \]

For this example \(\Sigma = \|a\|_2\), \(V=1\), and \(U=\tfrac{a}{\|a\|_2}\). This is basically just the projection of \(b\) onto our basis vector \(a\), which gives \[ x = \frac{a^{\mathsf{T}} b}{a^{\mathsf{T}} a}. \]

Show the code for the figure
import matplotlib.pyplot as plt
import numpy as np
import numpy.linalg as LA
%config InlineBackend.figure_formats = ['svg']
np.random.seed(6020)       # Make sure to stay reproducible

k = 3
A = np.arange(-2, 2, 0.25).reshape(-1, 1)
b = k*A + np.random.randn(*A.shape) * 0.5

U, s, VT = LA.svd(A, full_matrices=False)

x = VT.T @ np.diag(1/s) @ U.T @ b

plt.plot(A, k*A, color="k", label="Target")
plt.plot(A, b, 'x', color="r", label="Noisy data")
plt.plot(A, A*x, '--', color="b", label="Regression line")
plt.legend()
plt.xlabel("$a$")
plt.ylabel("$b$")
plt.show()
Figure 4.7: Linear regression with SVD.

Our reconstructed unknown is \(x=\) 2.855, which is a reasonably good match for \(k=\) 3.0.

4.3.1.2 Multi-Linear Regression (see Brunton and Kutz 2022, 21–23)

The second example is based on the Portland Cement Data built into MATLAB. In Python we again use the dataset provided on GitHub. The data set contains the heat generation during the hardening of 13 cement mixtures composed of 4 basic ingredients, i.e. \(A\in \mathbb{R}^{13\times 4}\). The aim is to determine the weights \(x\) that relate the proportions of the ingredients to the heat generation in the mixture.

Show the code for the figure
import matplotlib.pyplot as plt
import numpy as np
import numpy.linalg as LA
import requests
import io
%config InlineBackend.figure_formats = ['svg']

# Transform the content of the file into a numpy.ndarray
response = requests.get(
    "https://github.com/frankhuettner/Data_Driven_Science_Julia_Demos"
    "/raw/refs/heads/main/DATA/hald_ingredients.csv")
# Transform the content of the file into a numpy.ndarray
A = np.genfromtxt(io.BytesIO(response.content), delimiter=",")

response = requests.get(
    "https://github.com/frankhuettner/Data_Driven_Science_Julia_Demos"
    "/raw/refs/heads/main/DATA/hald_heat.csv")
b = np.genfromtxt(io.BytesIO(response.content), delimiter=",")

U, s, VT = LA.svd(A, full_matrices=False)

x = VT.T @ np.diag(1/s) @ U.T @ b

plt.plot(b, color="k", label="Target - Heat data")
plt.plot(A@x, '--', color="b", label="Regression")
plt.legend()
plt.xlabel("mixture")
plt.ylabel("Heat [cal/g]")
plt.show()
Figure 4.8: Estimate for hardening in cement mixtures.

This concludes our investigation of matrix decompositions. We will investigate further decompositions of signals later, but for now we dive deeper into regression.