1  Linear Algebra

The aim of this section is to discuss the basics of matrices, vectors, and numbers that we need for the later chapters, not to introduce the whole of linear algebra. Nevertheless, we will rely on some basic definitions that can be found in any linear algebra book. For notation we refer to Golub and Van Loan (2013).

In Python the module numpy is used to represent vectors and matrices. We can import it like this (and give it its usual shorthand np):

import numpy as np

1.1 Notation

We will refer to \[ v \in \mathbb{R}^{n} \quad \Leftrightarrow \quad v = \left[ \begin{array}{c} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \end{array} \right], \quad v_i \in \mathbb{R}, \] as a vector \(v\) with \(n\) elements. The set \((\mathbb{R}^n, + ,\cdot)\) forms a so called vector space with the vector addition \(+\) and the scalar multiplication \(\cdot\).

Definition 1.1 (Vector space) For a set \(V\) over a field \(F\), with vectors \(u, v, w \in V\) and scalars \(\alpha, \beta \in F\), the following properties need to hold for \((V, +, \cdot)\) to form a vector space.

For the vector addition we need to have

  1. associativity \[ u + (v + w) = (u + v) +w,\]

  2. commutativity \[u + v = v + u,\]

  3. there needs to exist an identity element \(0\in V\), i.e. the zero vector, such that \[v + 0 = v,\]

  4. for every \(v \in V\) there needs to exist an inverse element \(w \in V\) such that \[v + w = 0,\] and this element is usually denoted by \(-v\).

For the scalar multiplication we need to have

  1. associativity \[\alpha(\beta v) = (\alpha\beta)v,\]
  2. distributivity with respect to the vector addition \[\alpha(u + v) = \alpha u + \alpha v,\]
  3. distributivity of the scalar addition \[(\alpha + \beta)v = \alpha v + \beta v,\]
  4. and there needs to exist a multiplicative identity element \(1\in F\) such that \[1 v = v.\]
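
In numpy a vector is created with np.array. The following snippet shows the shape of such a vector, how to access single elements and slices, and how to scale it with a scalar:
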
v = np.array([1, 2, 3, 4])
# show the shape
print(f"{v.shape=}")
# access a single element
print(f"{v[0]=}")
# use slicing to access multiple elements
print(f"{v[0:3]=}")
print(f"{v[2:]=}")
print(f"{v[:2]=}")
print(f"{v[0::2]=}")

alpha = 0.5
w = alpha * v
print(f"{w=}")
v.shape=(4,)
v[0]=np.int64(1)
v[0:3]=array([1, 2, 3])
v[2:]=array([3, 4])
v[:2]=array([1, 2])
v[0::2]=array([1, 3])
w=array([0.5, 1. , 1.5, 2. ])
Important

While in mathematics indices start at 1, Python indices start at 0.

Exercise 1.1 (Vector space in Python) Create some vectors and scalars with np.array and check the above statements with + and *.
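
A possible starting point for this exercise (a minimal sketch, not a complete solution) is to pick a few concrete vectors and scalars and compare both sides of the properties with np.allclose:

x = np.array([1.0, 2.0, 3.0])
y = np.array([-4.0, 0.5, 2.0])
z = np.array([0.0, 1.0, -1.0])
alpha, beta = 2.0, -0.5

# associativity and commutativity of the vector addition
print(f"{np.allclose(x + (y + z), (x + y) + z)=}")
print(f"{np.allclose(x + y, y + x)=}")
# distributivity of the scalar multiplication
print(f"{np.allclose(alpha * (x + y), alpha * x + alpha * y)=}")
print(f"{np.allclose((alpha + beta) * y, alpha * y + beta * y)=}")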

From vectors we can move to matrices, where \[ A \in \mathbb{R}^{m\times n} \quad \Leftrightarrow \quad A = (a_{ij}) = \left[ \begin{array}{cccc} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} \\ \end{array} \right],\quad a_{ij} \in \mathbb{R}, \] is called an \(m \times n\) (\(m\) times \(n\)) matrix. If its entries are real numbers we say it is an element of \(\mathbb{R}^{m\times n}\).

Important

We will use capital letters for matrices and small letters for vectors.

A = np.array([[1, 2, 3, 4], 
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
# show the shape
print(f"{A.shape=}")
# access a single element
print(f"{A[0, 0]=}")
# use slicing to access multiple elements
print(f"{A[0, :]=}")
print(f"{A[:, 2]=}")
A.shape=(3, 4)
A[0, 0]=np.int64(1)
A[0, :]=array([1, 2, 3, 4])
A[:, 2]=array([ 3,  7, 11])

Consequently, we can say that a vector is an \(n \times 1\) matrix. It is sometimes also referred to as a column vector, and its counterpart, a \(1 \times n\) matrix, as a row vector.
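
In numpy a one-dimensional array such as v above is neither a row nor a column vector. If the distinction matters, it can be made explicit by reshaping (a short sketch):

# turn the 1-D array v into an explicit column and row vector
v_col = v.reshape(-1, 1)
v_row = v.reshape(1, -1)
print(f"{v_col.shape=}")
print(f"{v_row.shape=}")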

Exercise 1.2 (Matrix as vector space?) How do we need to define \(+\) and \(\cdot\) to say that \((\mathbb{R}^{m \times n}, + ,\cdot)\) is forming a vector space?

Does np.array, +, * fulfil the properties of a vector space?

If we want to refer to a row or a column of a matrix \(A\) we will use the following short hands:

  • \(A_{i-}\) for row \(i\),
  • \(A_{-j}\) for column \(j\).

We can multiply a matrix with a vector, as long as the dimensions fit. Note that usually there is no \(\cdot\) used to indicate the multiplication: \[ Av = \left[ \begin{array}{cccc} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} \\ \end{array} \right] \left[ \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \end{array} \right] = A_{-1} v_1 + A_{-2} v_2 + \dots + A_{-n} v_n. \] The result is again a vector, but this time in \(\mathbb{R}^m\).

In Python the * operator usually indicates multiplication. However, for numpy arrays it is interpreted as element-wise multiplication, so we use @ for matrix-vector and matrix-matrix multiplication.

w = A @ v
# show the shape
print(f"{w.shape=}")
# show the result
print(f"{w=}")
# doing the same by hand: w is the linear combination of the columns of A
w_tilde = np.zeros(A.shape[0])
for i, v_i in enumerate(v):
    w_tilde += A[:, i] * v_i
print(f"{w_tilde=}")
w.shape=(3,)
w=array([ 30,  70, 110])
w_tilde=array([ 30.,  70., 110.])

As we can see from the above equation, we can view the matrix \(A\) as a linear mapping or linear function between two vector spaces, namely from \(\mathbb{R}^{n}\) to \(\mathbb{R}^{m}\).

Definition 1.2 (Linear map) A linear map between vector spaces is a mapping or function that preserves the structure of the vector spaces. For two vector spaces \(V\) and \(W\) over a field \(F\) the mapping \[T: V \to W\] is called linear if

  1. for \(v, w \in V\) \[T(v + w) = T(v) + T(w),\]
  2. for \(v \in V\) and \(\alpha \in F\) \[T(\alpha v) = \alpha T(v).\]
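
We can illustrate these two properties numerically for the matrix \(A\) from above, which maps \(\mathbb{R}^4\) to \(\mathbb{R}^3\) (a quick numerical check, not a proof):

v = np.array([1, 2, 3, 4])
w = np.array([4, 3, 2, 1])
alpha = 2.5
# additivity: T(v + w) = T(v) + T(w)
print(f"{np.allclose(A @ (v + w), A @ v + A @ w)=}")
# homogeneity: T(alpha v) = alpha T(v)
print(f"{np.allclose(A @ (alpha * v), alpha * (A @ v))=}")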

A linear mapping of special interest to us is the transpose of a matrix defined by turning rows into columns and vice versa: \[ C = A^{\mathsf{T}}, \quad \Rightarrow \quad c_{ij} = a_{ji}. \] Consequently, the transpose of a (row) vector is a column vector.

print(f"{A=}")
print(f"{A.shape=}")
B = A.transpose()
print(f"{B=}")
print(f"{B.shape=}")
A=array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
A.shape=(3, 4)
B=array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])
B.shape=(4, 3)

With this operation we can define two more mappings.

Definition 1.3 (Dot product) The dot product, inner product, or scalar product of two vectors \(v\) and \(w\) is defined by \[\langle v, w\rangle = v \cdot w = v^{\mathsf{T}} w = \sum_i v_i w_i.\]

v = np.array([1, 2, 3, 4])
w = np.array([1, 1, 1, 1])
# alternatively we can define w with
w = np.ones(v.shape)
alpha = np.vdot(v, w)
print(f"{alpha=}")
alpha=np.float64(10.0)
Note

As \(\mathbb{R}^n\) is a Euclidean vector space, the above function is also called the inner product.

Definition 1.4 (Outer product) We also have the outer product defined as: \[ v w^{\mathsf{T}} = \left[ \begin{array}{cccc} v_1 w_1 & v_1 w_2 & \dots & v_1 w_n \\ v_2 w_1 & v_2 w_2 & \dots &v_2 w_n \\ \vdots & \vdots & \ddots & \vdots\\ v_n w_1 & v_n w_2 & \dots & v_n w_n \\ \end{array} \right] \]

C = np.outer(v, w)
print(f"{C=}")
C=array([[1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.]])

We can also multiply matrices \(A\) and \(B\) by applying the matrix-vector multiplication to each column of \(B\), or, stated a bit more precisely:

For an \({m \times p}\) matrix \(A\) and a \({p \times n}\) matrix \(B\) the matrix-matrix multiplication (\(\mathbb{R}^{m\times p} \times \mathbb{R}^{p\times n} \to \mathbb{R}^{m\times n}\)) \[C=AB \quad \Rightarrow\quad c_{ij} = \sum_{k=1}^p a_{ik}b_{kj}\] forms an \({m \times n}\) matrix.
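
To connect the index formula with the @ operator, the following sketch computes the entries \(c_{ij}\) with explicit loops and compares the result with numpy:

B = A.transpose()  # a 4 x 3 matrix, so the product A B is defined
C = np.zeros((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        for k in range(A.shape[1]):
            C[i, j] += A[i, k] * B[k, j]
print(f"{np.allclose(C, A @ B)=}")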

Exercise 1.3 (Matrix multiplication?) Show that the matrix multiplication is:

  • associative
  • (left and right) distributive
  • but not commutative
C = A @ A.transpose()
print(f"{C=}")
D = A.transpose() @ A
print(f"{D=}")
C=array([[ 30,  70, 110],
       [ 70, 174, 278],
       [110, 278, 446]])
D=array([[107, 122, 137, 152],
       [122, 140, 158, 176],
       [137, 158, 179, 200],
       [152, 176, 200, 224]])

From the above Python snippet we can easily see that matrix-matrix multiplication is not commutative; here even the shapes of \(AA^{\mathsf{T}}\) and \(A^{\mathsf{T}}A\) differ.

1.2 Norms

Definition 1.5 (Norm) A norm on a vector space \(V\) over a field \(F\) is a mapping into the non-negative real numbers

\[\| \cdot \|: V \to \mathbb{R}_0^+, \quad v \mapsto \|v\|,\] that fulfils for \(v, w\in V\) and \(\alpha \in F\) the following properties:

  1. positivity \[ \|v\| \geq 0, \quad \|v\| = 0 \Leftrightarrow v = 0 \in V, \]
  2. absolute homogeneity \[ \| \alpha v \| = |\alpha| \| v \|, \]
  3. subadditivity (often called the triangular inequality) \[ \| v + w\| \leq \| v \| + \| w \|.\]

There are multiple norms that can be useful for vectors. The most common are:

  1. the one norm \[ \| v \|_1 = \sum_i |v_i|,\]
  2. the two norm (Euclidean norm) \[ \| v \| = \| v \|_2 = \sqrt{\sum_i |v_i|^2} = \sqrt{\langle v, v \rangle},\]
  3. more generally the \(p\)-norms (for \(1\leq p < \infty\)) \[ \| v \|_p = \left(\sum_i |v_i|^p\right)^{\frac{1}{p}},\]
  4. the \(\infty\) norm \[ \| v \|_\infty = \max_i |v_i|.\]

And for matrices:

  1. the one norm (column sum norm) \[ \| A \|_1 = \max_j \sum_i |a_{ij}|,\]
  2. the Frobenius norm \[ \| A \| = \| A \|_F = \sqrt{\sum_i \sum_j |a_{ij}|^2},\]
  3. more generally the entry-wise \(p\)-norms \[ \| A \|_p = \left(\sum_i \sum_j |a_{ij}|^p\right)^{\frac1p},\]
  4. the \(\infty\) norm (row sum norm) \[ \| A \|_\infty = \max_i \sum_j |a_{ij}|.\]
# The norms can be found in the linalg package of numpy
from numpy import linalg as LA
norm_v = LA.norm(v)
print(f"{norm_v=}")
norm_v2 = LA.norm(v, 2)
print(f"{norm_v2=}")
norm_A = LA.norm(A, 1)
print(f"{norm_A=}")
norm_Afr = LA.norm(A, "fro")
print(f"{norm_Afr=}")
norm_v=np.float64(5.477225575051661)
norm_v2=np.float64(5.477225575051661)
norm_A=np.float64(24.0)
norm_Afr=np.float64(25.495097567963924)
Note

The function norm from the numpy.linalg package can be used to compute other norms or properties as well, see docs.
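
The column sum and row sum formulas can also be checked by hand against LA.norm (a short sketch, reusing A from above):

# one norm: maximal absolute column sum
col_sums = np.abs(A).sum(axis=0)
print(f"{col_sums.max()=}")
print(f"{LA.norm(A, 1)=}")
# infinity norm: maximal absolute row sum
row_sums = np.abs(A).sum(axis=1)
print(f"{row_sums.max()=}")
print(f"{LA.norm(A, np.inf)=}")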

1.3 Basis of vector spaces

As we will be using the notion of a basis vector and a basis of a vector space, we should introduce them properly.

Definition 1.6 (Basis) A set of vectors \(\mathcal{B} = \{b_1, \ldots, b_r\}, b_i \in \mathbb{R}^n\):

  1. is called linearly independent if \[ \sum_{j=1}^r \alpha_j b_j = 0 \Rightarrow \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0,\]
  2. spans \(\mathbb{R}^n\) if every \(v \in \mathbb{R}^n\) can be written as \[ v = \sum_{j=1}^r \alpha_j b_j, \quad \alpha_1, \ldots, \alpha_r \in \mathbb{R}.\]

The set \(\mathcal{B}\) is called a basis of a vector space if it is linearly independent and spans the entire vector space. The size of the basis, i.e. the number of vectors in the basis, is called the dimension of the vector space.

For a shorter notation we often associate the matrix \[ B = \left[b_1 | \cdots | b_n\right] \] with the basis.
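
Numerically, linear independence can be checked via the rank of the associated matrix, for example with np.linalg.matrix_rank (a sketch; the rank computation uses a tolerance internally):

B = np.array([[1, 3, 2],
              [0, 1, 1],
              [2, 0, 1]])
# the columns are linearly independent (and hence a basis of R^3)
# exactly if the rank equals the number of columns
print(f"{np.linalg.matrix_rank(B)=}")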

The standard basis of \(\mathbb{R}^n\) consists of the vectors \(e_i\) that are zero everywhere except for index \(i\), and its associated matrix \[ I_n = \left[ \begin{array}{cccc} 1 & 0 & \dots & 0\\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & 1 & 0\\ 0 & \dots & 0 & 1 \\ \end{array} \right] \in \mathbb{R}^{n \times n} \] is called the identity matrix. Note, the index \(n\) is often omitted as it should be clear from the dimensions of the matrix.

The easiest way to create one of the standard basis vectors, let's say \(e_3 \in \mathbb{R}^3\), in Python is by calling

# We need to keep the index shift in mind
n = 3
e_3 = np.zeros(n)
e_3[3-1] = 1
print(f"{e_3=}")
e_3=array([0., 0., 1.])

and the identity matrix by

n = 4
I_4 = np.eye(n)
print(f"{I_4=}")
I_4=array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

Example 1.1 (Standard basis)  

n = 3
e_1 = np.zeros(n)
e_1[0] = 1
e_2 = np.zeros(n)
e_2[1] = 1
e_3 = np.zeros(n) 
e_3[2] = 1

x = np.random.rand(n)
print(f"{x=}")
# compute the coefficients
a = np.dot(x, e_1) / np.dot(e_1, e_1)
b = np.dot(x, e_2) / np.dot(e_2, e_2)
c = np.dot(x, e_3) / np.dot(e_3, e_3)
y = a * e_1 + b * e_2 + c * e_3
print(f"{y=}")
print(f"{np.allclose(x, y)=}")
print(f"{LA.norm(x-y)=}")
The coefficient \(a\) is obtained by the orthogonal projection of \(x\) onto \(e_1\), and analogously for \(b\) and \(c\).
x=array([0.08279337, 0.80412245, 0.26339274])
y=array([0.08279337, 0.80412245, 0.26339274])
np.allclose(x, y)=True
LA.norm(x-y)=np.float64(0.0)

See numpy.testing for more ways of testing in numpy.

1.4 The inverse of a matrix

Definition 1.7 (Matrix inverse) For matrices \(A, X\in \mathbb{R}^{n\times n}\) that satisfy \[ A X = X A = I_n \] we call \(X\) the inverse of \(A\) and denote it by \(A^{-1}\).

The following holds true for the inverse of matrices:

  • the inverse of a product is the product of the inverses \[ (AB)^{-1} = B^{-1}A^{-1},\]
  • the inverse of the transpose is the transpose of the inverse \[ (A^{-1})^{\mathsf{T}} = (A^{\mathsf{T}})^{-1} \equiv A^{-\mathsf{T}}.\]
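
Both properties can be checked numerically with random matrices, which are almost surely invertible (a quick sketch; for floats we compare with np.allclose rather than exact equality):

M = np.random.rand(3, 3)
N = np.random.rand(3, 3)
# inverse of a product
print(f"{np.allclose(LA.inv(M @ N), LA.inv(N) @ LA.inv(M))=}")
# inverse of the transpose
print(f"{np.allclose(LA.inv(M.transpose()), LA.inv(M).transpose())=}")

In numpy the inverse itself is computed with LA.inv, as the following example shows; note how exact equality fails for floating point results:
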
A = np.random.rand(3, 3)
print(f"{A=}")
X = LA.inv(A)
print(f"{X=}")
print(f"{A @ X=}")
print(f"{np.allclose(A @ X, np.eye(A.shape[0]))=}")
# Note that exact equality is hard to achieve for floats, so the following assertion fails
np.testing.assert_equal(A @ X, np.eye(A.shape[0]), verbose=False)
A=array([[0.14428232, 0.3535935 , 0.59134082],
       [0.59725813, 0.84366491, 0.00543072],
       [0.3461897 , 0.54736427, 0.27627259]])
X=array([[-59.43477   , -58.37106377, 128.3630867 ],
       [ 42.13376962,  42.5803167 , -91.02118882],
       [ -9.00130829, -11.21892982,  23.10677243]])
A @ X=array([[ 1.00000000e+00,  7.18868957e-16, -6.80126911e-15],
       [ 2.20779142e-15,  1.00000000e+00, -9.87048987e-15],
       [ 5.22083444e-16, -2.58630720e-15,  1.00000000e+00]])
np.allclose(A @ X, np.eye(A.shape[0]))=True
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[13], line 8
      6 print(f"{np.allclose(A @ X, np.eye(A.shape[0]))=}")
      7 # Note that the equality is hard to achieve for floats
----> 8 np.testing.assert_equal(A @ X, np.eye(A.shape[0]), verbose=False)

AssertionError: 
Arrays are not equal

Mismatched elements: 9 / 9 (100%)
Max absolute difference among violations: 9.87048987e-15
Max relative difference among violations: 4.6629367e-15

Definition 1.8 (Change of basis) If we associate the matrices \(B\) and \(C\) with the matrix consisting of the basis vectors of two bases \(\mathcal{B}\) and \(\mathcal{C}\) of a vector space we can define the transformation matrix \(T_{\mathcal{C}}^{\mathcal{B}}\) from \(\mathcal{B}\) to \(\mathcal{C}\) as \[ T_{\mathcal{C}}^{\mathcal{B}} = C^{-1}B. \]

So if we have a vector \(b\) represented in \(\mathcal{B}\) we can compute its representation \(\hat{b}\) in \(\mathcal{C}\) as \[ \hat{b} = T_{\mathcal{C}}^{\mathcal{B}} b = C^{-1}B b. \]

A special case arises if we start from the standard basis \(I\) and move to a basis \(C\); then we get \[ \hat{b} = T_{C}^{I} b = C^{-1} b. \]

Example 1.2 (Basis Change) For \[ B = \left[ \begin{array}{ccc} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 2 & 0 & 1 \\ \end{array} \right]\quad \text{and}\quad C = \left[ \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \\ \end{array} \right] \] we get \[ T_{C}^{B} = \left[ \begin{array}{ccc} \frac32 & 1 & 1 \\ \frac12 & -1 & 0 \\ -\frac12 & 2 & 1 \\ \end{array} \right], \] and for a \(v = 2 b_1 - b_2 + 3 b_3\) we can compute its representation in \(C\) as \[ \hat{v} = T_{C}^{B} v = \left[ \begin{array}{ccc} \frac32 & 1 & 1 \\ \frac12 & -1 & 0 \\ -\frac12 & 2 & 1 \\ \end{array} \right] \left[ \begin{array}{c} 2 \\ -1 \\ 3 \end{array} \right] = \left[ \begin{array}{c} 5 \\ 2 \\ 0 \end{array} \right], \] and therefore, \(v = 5c_1 + 2c_2 + 0 c_3\).

(Compare Wikipedia)
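
The computation of Example 1.2 can be reproduced in numpy (a short sketch); as a consistency check, both coordinate vectors should describe the same element of \(\mathbb{R}^3\):

B = np.array([[1, 3, 2],
              [0, 1, 1],
              [2, 0, 1]])
C = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])
T = LA.inv(C) @ B
v_B = np.array([2, -1, 3])  # coordinates of v with respect to B
v_C = T @ v_B               # coordinates of v with respect to C
print(f"{v_C=}")
# both coordinate vectors describe the same vector in the standard basis
print(f"{np.allclose(B @ v_B, C @ v_C)=}")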

There are special basis vectors, and correspondingly matrices, that allow for an easy computation of the inverse.

Definition 1.9 (Orthonormal vector) We call a set of vectors \(\mathcal{V}=\{u_1, u_2, \ldots, u_m\}\) orthonormal if and only if \[ \forall i,j: \langle u_i, u_j \rangle = \delta_{ij} \] where \(\delta_{ij}\) is called the Kronecker delta, which is \(1\) if and only if \(i=j\) and \(0\) otherwise. Here \(\langle \cdot, \cdot \rangle\) is the inner product, see Definition 1.3.

We can extend this to a matrix (and thereby to a basis) as follows.

Definition 1.10 (Orthogonal matrix) We call a matrix \(Q\in\mathbb{R}^{n\times n}\) (it is important here that \(Q\) is real and square) orthogonal if its columns and rows are orthonormal vectors. This is the same as \[ Q^{\mathsf{T}} Q = Q Q^{\mathsf{T}} = I \] and it implies that \(Q^{-1} = Q^{\mathsf{T}}\).
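
A rotation matrix is a classical example of an orthogonal matrix; the following sketch verifies the definition numerically:

theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(f"{np.allclose(Q.transpose() @ Q, np.eye(2))=}")
print(f"{np.allclose(Q @ Q.transpose(), np.eye(2))=}")
print(f"{np.allclose(LA.inv(Q), Q.transpose())=}")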

For now, this concludes our introduction to linear algebra. We will come back to more in later sections.