3  Eigendecomposition

We start off with the so-called eigendecomposition. The main idea is to compute a decomposition (or factorization) of a square matrix into a canonical form: we represent the matrix by its eigenvalues and eigenvectors. In order to do so, we need to recall some more definitions from our favourite linear algebra book, such as Golub and Van Loan (2013).

Definition 3.1 (Determinant) If \(A = (a) \in \mathbb{R}^{1\times1}\) (a single number), then its determinant is given by \(\det(A) = a\). Now, the determinant of \(A \in \mathbb{R}^{n\times n}\) is defined in terms of order-\((n-1)\) determinants: \[ \det(A) = \sum_{j=1}^n (-1)^{j+1} a_{1j}\det(A_{1j}), \] where \(A_{1j}\) is the \((n-1) \times (n-1)\) matrix obtained by deleting the first row and the \(j\)th column of \(A\).

(Compare Golub and Van Loan 2013, 66–67)

Some important properties of the determinant are:

  1. for two matrices \(A\) and \(B\) in \(\mathbb{R}^{n\times n}\) \[ \det(AB) = \det(A)\det(B), \]
  2. for the transpose \(A^{\mathsf{T}}\) of a matrix \(A\) \[ \det(A^{\mathsf{T}}) = \det(A), \]
  3. for a scalar \(\alpha\in\mathbb{R}\) and a matrix \(A\) \[ \det(\alpha A) = \alpha^n\det(A), \]
  4. we call a matrix \(A\) nonsingular (i.e. invertible) if and only if \[ \det(A) \not= 0, \]
  5. for the inverse \(A^{-1}\) of an invertible matrix \(A\) \[ \det\left(A^{-1}\right) = \frac{1}{\det(A)}. \]

So let us check in numpy:

import numpy as np
from numpy import linalg as LA

A = np.array([[2, 0, 0],
              [0, 3, 4],
              [0, 4, 9]])
det_A = LA.det(A)
print(f"{det_A=}")

# property 2: det(A^T) = det(A)
A_T = np.transpose(A)
det_A_T = LA.det(A_T)
print(f"{det_A_T=}")

# property 5: det(A^{-1}) = 1 / det(A)
X = LA.inv(A)
det_X = LA.det(X)
print(f"{det_X=}")

# properties 1 and 5: det(A^{-1}) det(A) = det(A^{-1} A) = det(I) = 1
print(f"{det_X*det_A=}")

# property 3 with alpha = -1: det(-A) = (-1)^3 det(A) for a 3x3 matrix
det_Am = LA.det(-A)
print(f"{det_Am=}")
det_A=np.float64(21.999999999999996)
det_A_T=np.float64(21.999999999999996)
det_X=np.float64(0.04545454545454546)
det_X*det_A=np.float64(1.0)
det_Am=np.float64(-21.999999999999996)

Definition 3.2 (Eigenvalues) For a matrix \(A \in \mathbb{C}^{n\times n}\) the eigenvalues are the roots of the characteristic polynomial \[ p(\lambda) = \det(A - \lambda I). \] Consequently, every \(n\times n\) matrix has exactly \(n\) eigenvalues (counted with multiplicity). Note that for a real matrix the eigenvalues might still be in \(\mathbb{C}\).

The set of all eigenvalues of \(A\) is denoted by \[ \lambda(A) = \left\{\lambda : \det(A-\lambda I)=0\right\}. \] If all of the eigenvalues are real (in \(\mathbb{R}\)) we can sort them from largest to smallest \[ \lambda_n(A) \leq \cdots \leq \lambda_2(A) \leq \lambda_1(A), \] and we denote the largest by \(\lambda_{max}(A) = \lambda_1(A)\) and the smallest by \(\lambda_{min}(A) = \lambda_n(A)\), respectively.

(Compare Golub and Van Loan 2013, 66–67)
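
As a small sketch, we can compute the coefficients of the characteristic polynomial with np.poly (which returns the monic characteristic polynomial of a square matrix) and recover the eigenvalues as its roots; A is the matrix from above:

p = np.poly(A)                     # coefficients of the characteristic polynomial, highest power first
print(f"{p=}")
print(f"{np.sort(np.roots(p))=}")  # its roots are the eigenvalues of A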

From the above properties of the determinant we can conclude the following: if \(X \in \mathbb{C}^{n\times n}\) is nonsingular and \(B = X^{-1} A X\), then \(A\) and \(B\) are called similar, and two similar matrices have exactly the same eigenvalues, since \[ \det(B-\lambda I) = \det\left(X^{-1}(A-\lambda I)X\right) = \det\left(X^{-1}\right)\det(A-\lambda I)\det(X) = \det(A-\lambda I). \]

lam = LA.eigvals(A)
lam_sort = np.sort(lam)

print(f"{lam_sort=}")
lam_sort=array([ 1.,  2., 11.])
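
As a quick numeric check of this statement we can apply a similarity transformation with a randomly generated matrix X (assumed to be invertible, which holds almost surely for a random matrix) and compare the eigenvalues:

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))   # random matrix, assumed invertible
B = LA.inv(X) @ A @ X             # B is similar to A
lam_B = LA.eigvals(B).real        # imaginary parts vanish up to round-off here
print(f"{np.sort(lam_B).round(12)=}")
print(f"{np.sort(LA.eigvals(A))=}")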

Definition 3.3 (Eigenvector) If \(\lambda \in \lambda(A)\), then there exists a nonzero vector \(v\) so that \[ Av = \lambda v \iff (A - \lambda I)v = 0. \tag{3.1}\] Such a vector is called an eigenvector of \(A\) associated with \(\lambda\).

Definition 3.4 (Eigendecomposition) If \(A \in \mathbb{C}^{n\times n}\) has \(n\) linearly independent eigenvectors \(v_1, \ldots, v_n\) and \(Av_i=\lambda_i v_i\), for \(i=1:n\), then \(A\) is diagonalizable.

If we combine the eigenvectors into a matrix \[ V = \left[v_1 | \cdots | v_n\right], \] then \[ V^{-1} A V = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) = \Lambda = \left[ \begin{array}{cccc} \lambda_1 & 0 & \dots & 0\\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0\\ 0 & \dots & 0 & \lambda_n \\ \end{array} \right]. \] This is called the eigendecomposition of the matrix \(A\).

(Compare Golub and Van Loan 2013, 66–67)

Not all matrices \(A \in \mathbb{R}^{n\times n}\) are diagonalizable.
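
A minimal illustration of such a matrix is a Jordan block: it has the eigenvalue 1 with multiplicity two, but numpy returns two (numerically) parallel eigenvectors, so the matrix of eigenvectors is singular and cannot be inverted:

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])
lam_J, V_J = LA.eig(J)
print(f"{lam_J=}")                  # eigenvalue 1 with multiplicity 2
print(f"{V_J=}")                    # the two columns are (numerically) parallel
print(f"{LA.matrix_rank(V_J)=}")    # rank 1 < 2, so V_J is not invertible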

lam, V = LA.eig(A)            # eigenvalues and eigenvectors of A
A_eigen = np.diag(lam)        # the diagonal matrix Lambda
print(f"{A_eigen=}")
print(f"{V=}")

# check A v = lambda v for the first eigenpair
v_1 = (A @ V[:, 0]) / lam[0]
print(f"{v_1=}")

# reconstruct Lambda as V^{-1} A V
A_eigen2 = LA.inv(V) @ A @ V
print(f"{A_eigen2=}")

print(f"{np.allclose(A_eigen, A_eigen2)=}")
A_eigen=array([[11.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  2.]])
V=array([[ 0.        ,  0.        ,  1.        ],
       [ 0.4472136 ,  0.89442719,  0.        ],
       [ 0.89442719, -0.4472136 ,  0.        ]])
v_1=array([0.        , 0.4472136 , 0.89442719])
A_eigen2=array([[1.10000000e+01, 0.00000000e+00, 0.00000000e+00],
       [5.55111512e-17, 1.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 2.00000000e+00]])
np.allclose(A_eigen, A_eigen2)=True

We can use what we have learned about the basis of a vector space and the transformation between two bases (see Section 1.3) to get a different interpretation of the eigendecomposition.

Recall that if \(x\) is represented in the standard basis (with basis matrix \(I\)), we can change to the basis represented by \(V\) via \(\hat{x} = V^{-1}x\).

Consequently, the equation \[ A = V \Lambda V^{-1} \] means that in the basis created by \(V\) the matrix \(A\) is represented by a diagonal matrix.
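
We can verify this interpretation numerically with the decomposition computed above; x is an arbitrary test vector, and applying A and then changing to the basis V is the same as scaling the coordinates \(\hat{x}\) by the eigenvalues:

x = np.array([1.0, 2.0, 3.0])   # arbitrary test vector
x_hat = LA.inv(V) @ x           # coordinates of x in the basis V
lhs = LA.inv(V) @ (A @ x)       # apply A, then change basis
rhs = np.diag(lam) @ x_hat      # scale the coordinates by the eigenvalues
print(f"{np.allclose(lhs, rhs)=}")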

3.1 Examples of applications

To get a better idea of what the eigendecomposition can do, we look at some examples.

3.1.1 Solving systems of linear equations

For a system of linear equations \(Ax=b\) we get \[ \begin{array}{lll} A x & = b & \iff \\ V \Lambda V^{-1} x & = b & \iff \\ \Lambda V^{-1} x & = V^{-1} b & \iff \\ V^{-1} x & = \Lambda^{-1}V^{-1} b & \iff \\ x & = V\Lambda^{-1}V^{-1} b \end{array} \] As \(\Lambda^{-1} = \operatorname{diag}\left(\lambda_1^{-1}, \ldots, \lambda_n^{-1}\right)\) this is easy to compute once we have the eigenvalue decomposition.
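
As a small sketch (reusing A, lam, and V from above, with an arbitrarily chosen right-hand side b), we can compare this with NumPy's direct solver:

b = np.array([1.0, 2.0, 3.0])                  # arbitrary right-hand side
x_eig = V @ np.diag(1 / lam) @ LA.inv(V) @ b   # x = V Lambda^{-1} V^{-1} b
x_direct = LA.solve(A, b)                      # direct solve for comparison
print(f"{np.allclose(x_eig, x_direct)=}")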

Note

The computation of the eigendecomposition is not cheap; therefore, this is not always worth the effort, and there are other ways of solving linear systems.

3.1.2 Linear Ordinary Differential Equations

In this example we use the eigendecomposition to efficiently solve a system of differential equations \[ \dot{x} = A x,\quad x(0) = x_0. \] By changing to the basis \(V\) and using the notation \(\hat{x}=z\) we have the equivalent formulations \[ z = V^{-1}x \iff x = Vz, \] and it follows \[ \begin{array}{lll} \dot{x} = A x & \iff V \dot{z} &= A V z \\ & \iff \dot{z} &= V^{-1} A V z \\ & \iff \dot{z} &= \Lambda z \end{array}. \] So for the initial value \(z_0 = V^{-1}x_0\) the solution in \(t\) is \[ z(t) = \operatorname{diag}\left(e^{t\lambda_1}, \ldots, e^{t\lambda_n}\right) z_0, \] and transforming back gives \(x(t) = V z(t)\).

We often say that it is now a decoupled differential equation.
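
A minimal sketch of this procedure, with an arbitrarily chosen initial value x0 and evaluation time t, using scipy.linalg.expm only as an independent reference solution:

from scipy import linalg as sLA

x0 = np.array([1.0, 0.0, 1.0])   # arbitrary initial value
t = 0.5                          # arbitrary evaluation time

z0 = LA.inv(V) @ x0              # initial value in the basis V
z_t = np.exp(t * lam) * z0       # decoupled solution z(t)
x_t = V @ z_t                    # transform back to x(t)

x_ref = sLA.expm(t * A) @ x0     # reference via the matrix exponential
print(f"{np.allclose(x_t, x_ref)=}")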

3.1.3 Higher Order Linear Differential Equations

If we have a higher order linear ODE such as \[ x^{(n)} + a_{n-1} x^{(n-1)} + \cdots + a_2 \ddot{x} + a_1 \dot{x} + a_0 x = 0. \tag{3.2}\] we can stack the derivatives into a vector \[ \begin{array}{ccc} x_1 & = & x\\ x_2 & = & \dot{x}\\ x_3 & = & \ddot{x}\\ \vdots & = & \vdots \\ x_{n-1} & = & x^{(n-2)} \\ x_{n} & = & x^{(n-1)} \\ \end{array} \quad \Leftrightarrow \quad \left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{array} \right] = \left[ \begin{array}{c} x \\ \dot{x} \\ \ddot{x} \\ \vdots \\ x^{(n-2)} \\ x^{(n-1)} \end{array} \right], \] and taking the derivative of this vector yields the following system \[ \underbrace{ \frac{d}{d t} \left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{array} \right] }_{\dot{x}} = \underbrace{\left[ \begin{array}{cccccc} 0 & 1 & 0 & \dots & 0 & 0\\ 0 & 0 & 1 & \dots & 0 & 0\\ 0 & 0 & 0 & \dots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \dots & 0 & 1\\ -a_0 & -a_1 & -a_2 & \dots & -a_{n-2} & -a_{n-1}\\ \end{array} \right] }_{A} \underbrace{ \left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{array} \right] }_x. \]

We have transformed it into a system of coupled first-order ODEs \(\dot{x}=Ax\), which we can solve as seen above. More importantly, the characteristic polynomial of Equation 3.2 is equal to the characteristic polynomial of the matrix \(A\) from Definition 3.2, and the eigenvalues are the roots of this polynomial.
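
As a small illustration with arbitrarily chosen coefficients, consider \(\ddot{x} + 3\dot{x} + 2x = 0\); the eigenvalues of the corresponding companion matrix agree with the roots of \(\lambda^2 + 3\lambda + 2\):

a0, a1 = 2.0, 3.0                    # arbitrary example coefficients
A_comp = np.array([[0.0, 1.0],
                   [-a0, -a1]])      # companion matrix of the 2nd order ODE
print(f"{np.sort(LA.eigvals(A_comp))=}")
print(f"{np.sort(np.roots([1.0, a1, a0]))=}")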

3.1.4 Generalized eigenvalue problem

Let us motivate this by the example of modal analysis. If we consider the free vibrations of an undamped system we get the equation \[ M\ddot{u} + K u = 0, \quad u(0) = u_0, \] with the mass matrix \(M\), the stiffness matrix \(K\), and the displacement \(u(t)\). As we know, the solution of this linear differential equation has the form \(u(t)=e^{i\omega t}u_0\) and thus we get \[ (-\omega^2 M + K) u_0 = 0. \tag{3.3}\]

Definition 3.5 (Generalized eigenvalue problem) If \(A, B \in \mathbb{C}^{n\times n}\), then the set of all matrices of the form \(A-\lambda B\) with \(\lambda\in\mathbb{C}\) is a pencil. The generalized eigenvalues of \(A-\lambda B\) are elements of the set \(\lambda(A,B)\) defined by \[ \lambda(A,B) = \{z\in\mathbb{C}: \det(A-zB)=0\}. \] If \(\lambda \in \lambda(A,B)\) and \(0\neq v\in\mathbb{C}^n\) satisfies \[ A v = \lambda B v, \tag{3.4}\] then \(v\) is an eigenvector of \(A-\lambda B\). The problem of finding a nontrivial solution to Equation 3.4 is called the generalized eigenvalue problem.

(Compare Golub and Van Loan 2013, chap. 7.7)

In our example, Equation 3.3, the eigenvalues \(\lambda = \omega^2\) correspond to the squares of the natural frequencies, and the eigenvectors \(v=u_0\) correspond to the modes of vibration.

If \(M\) is invertible we can write \[ M^{-1}(K -\omega^2 M ) u_0 = (M^{-1}K -\omega^2 I ) u_0 = 0. \]

Warning

In most cases inverting the matrix is not recommended; it is better to solve the generalized eigenvalue problem directly, see Golub and Van Loan (2013, chap. 7.7) for details.
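
SciPy can solve the generalized eigenvalue problem directly, without forming \(B^{-1}A\); a minimal sketch with two small, arbitrarily chosen matrices:

from scipy import linalg as sLA

A_g = np.array([[2.0, 1.0],
                [1.0, 3.0]])
B_g = np.array([[1.0, 0.0],
                [0.0, 2.0]])
w, v = sLA.eig(A_g, B_g)            # solves A_g v = w B_g v directly
print(f"{w=}")
# check the defining relation for the first eigenpair
print(f"{np.allclose(A_g @ v[:, 0], w[0] * (B_g @ v[:, 0]))=}")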

Let us walk through this with an example from (Downey and Micheli 2024, sec. 8.3).

Example 3.1 (Two Story Building)  

Figure 3.1: Two story frame where the floors have different dynamic properties.

In Figure 3.1 we can see a two story building consisting of a two column frame with a floor. The first floor columns are fixed at the base and have a height \(h\); the second floor is fixed to the first and has the same height. The columns of the two floors are modelled as beams with flexural rigidity \(EI\) and \(2EI\), respectively, where \(E\) is called Young’s modulus and \(I\) the second moment of area. The mass is centred at the floor level.

This allows us to model such a building as a system with two degrees of freedom, \(x_1\) and \(x_2\), the displacements indicated in blue. The dashed blue line shows such a displacement of the frame.

The resulting equations of motion become \[ \begin{array}{r} m_1 \ddot{x}_1 + k_1 x_1 - k_2 (x_2 - x_1) = 0,\\ m_2 \ddot{x}_2 + k_2 (x_2 - x_1) = 0, \end{array} \] resulting in the matrices \[ M = \left[ \begin{array}{cc} m_1 & 0 \\ 0 & m_2 \end{array} \right], \quad K = \left[\begin{array}{cc} k_1 + k_2 & -k_2 \\ -k_2 & k_2 \end{array} \right]. \]

For the derivation of the stiffness coefficients we refer to (Downey and Micheli 2024, 188) and recall the result here as follows \[ k_1 = \frac{48 EI}{h^3}, \quad k_2 = \frac{24 EI}{h^3}. \] This results in the stiffness matrix \[ K = \underbrace{\frac{24 EI}{h^3}}_{=k} \left[\begin{array}{cc} 3 & -1 \\ -1 & 1 \end{array} \right]. \]

With the masses \(m_1 = 2m\) and \(m_2 = m\) (see Figure 3.1) we can now manually compute \[ \det(-\omega^2 M + K) = 0 \Leftrightarrow 2m^2\omega^4 - 5 m k \omega^2 + 2 k^2 = 0, \] and solving this equation for \(\omega^2\) results in \[ \omega_1^2 = \frac{k}{2m}, \quad \omega_2^2 = \frac{2k}{m}. \]

To compute the eigenvectors \(v_1\) and \(v_2\) we use Equation 3.3, i.e. for \(v_1\) this reads as \[ \left(-\frac{k}{2m} \left[ \begin{array}{cc} 2m & 0 \\ 0 & m \end{array} \right] + k \left[\begin{array}{cc} 3 & -1 \\ -1 & 1 \end{array} \right]\right) \left[\begin{array}{c} v_{11}\\ v_{21} \end{array} \right] = \left[\begin{array}{cc} 2 k & -k \\ -k & \frac{k}{2} \end{array} \right] \left[\begin{array}{c} v_{11}\\ v_{21} \end{array} \right] \overset{!}{=} \left[\begin{array}{c} 0\\ 0 \end{array} \right], \] and results in the relation \(2v_{11} = v_{21}\). We can select a solution as \[ v_1 = \left[\begin{array}{c} \frac12\\ 1 \end{array} \right]. \] For \(v_2\) we proceed similarly and derive a solution as: \[ v_2 = \left[\begin{array}{c} -1\\ 1 \end{array} \right]. \]
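
We can check this hand computation numerically. As a sketch we set \(m = k = 1\) (any positive values would do) and use scipy.linalg.eigh, which handles the symmetric definite pencil \(K - \omega^2 M\) directly:

from scipy import linalg as sLA

m, k = 1.0, 1.0                    # illustrative values
M = np.array([[2 * m, 0.0],
              [0.0, m]])
K = k * np.array([[3.0, -1.0],
                  [-1.0, 1.0]])

omega2, modes = sLA.eigh(K, M)     # solves K v = omega^2 M v
print(f"{omega2=}")                # expected: k/(2m) = 0.5 and 2k/m = 2.0
print(f"{modes / modes[-1, :]=}")  # normalize each mode by its last entry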

The eigenvectors are not just theoretical quantities; they illustrate how the structure displaces. The following figure visualizes the two modes.

(a) First mode
(b) Second mode
Figure 3.2: Model of a two story building with the shape of the modes according to the modal analysis.

To wrap up the example, the overall temporal response consists of the time-invariant part defined by our eigenvectors \(v_1\) and \(v_2\), and the time-dependent part with our eigenfrequencies \(\omega_1\) and \(\omega_2\) and the constants \(A_1\), \(A_2\), \(\phi_1\), \(\phi_2\), which depend on the initial conditions (see Downey and Micheli 2024, chap. 5). \[ \left[\begin{array}{c} x_1(t)\\ x_2(t) \end{array} \right] = \left[ v_1, v_2 \right] \left[\begin{array}{c} A_1 \sin(\omega_1 t + \phi_1)\\ A_2 \sin(\omega_2 t + \phi_2) \end{array} \right] \]

(Compare Downey and Micheli 2024, chap. 8.3, pp. 189-191)

3.1.5 Low-rank approximation of a square matrix

We can use the eigenvalue decomposition to approximate a square matrix.

Let us sort the eigenvalues in \(\Lambda\) by decreasing magnitude and write \(U^{\mathsf{T}}=V^{-1}\); then we can write \[ A = V\Lambda U^{\mathsf{T}} = \sum_{i=1}^n\lambda_i v_i u_i^{\mathsf{T}}, \] where the \(v_i\) are the columns of \(V\) and the \(u_i^{\mathsf{T}}\) are the rows of \(U^{\mathsf{T}}\). Now we can define the rank-\(r\) approximation of \(A\) as \[ A\approx A_r = \sum_{i=1}^r\lambda_i v_i u_i^{\mathsf{T}}. \]

To make this a bit easier to understand, the following illustration is helpful:

Figure 3.3: Low Rank Approximation

This approximation can be used to reduce the storage demand of an image.

Show the code for the figure
import matplotlib.pyplot as plt
import imageio.v3 as iio
import numpy as np
import numpy.linalg as LA
%config InlineBackend.figure_formats = ["svg"]

im = np.asarray(iio.imread(
    "https://www.mci.edu/en/download/27-logos-bilder?"
    "download=618:mci-eu-web"))

def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.2989, 0.5870, 0.1140])

im_gray = rgb2gray(im)
im_cut = im_gray[1500:3001,1500:3001] / 255

lam_, V_ = LA.eig(im_cut)
order = np.argsort(np.abs(lam_))
lam = lam_[order[::-1]]
V = V_[:, order[::-1]]

rec = [1/1000, 10/100, 25/100, 50/100, 75/100]
Vinv = LA.inv(V)

plt.figure()
plt.plot((np.abs(lam)))
plt.yscale("log")
plt.ylabel(r"$|\lambda_i|$")
plt.xlabel("index")
plt.gca().set_aspect(7e1)

for p in rec:
    plt.figure()
    r = int(np.ceil(len(lam) * p))
    A_r = np.real(V[:, 0:r] @ np.diag(lam[0:r], 0) @ Vinv[0:r, :])
    plt.imshow(A_r, cmap=plt.get_cmap("gray"))
    plt.gca().set_axis_off()

plt.figure()
plt.imshow(im_cut, cmap=plt.get_cmap("gray"))
plt.gca().set_axis_off()
plt.show()
(a) Absolute value of the eigenvalues from largest to smallest
(b) 0.1% of the eigenvalues
(c) 10% of the eigenvalues
(d) 25% of the eigenvalues
(e) 50% of the eigenvalues
(f) 75% of the eigenvalues
(g) original image
Figure 3.4: Image of MCI I and the reconstruction with approximated matrix.

3.2 Summary

The eigenvalue decomposition is an important tool, but it has its limitations:

  1. the matrices involved need to be square,
  2. eigenvalues might be complex, even if the problem at hand is real,
  3. we only get a diagonal matrix \(\Lambda\) if all eigenvectors are linearly independent,
  4. the computation of \(V^{-1}\) is non-trivial, unless \(A\) is symmetric and \(V\) therefore orthogonal (\(V^{-1} = V^{\mathsf{T}}\)).

Therefore, we will look into a generalized decomposition called the singular value decomposition in the next section.