Review of Matrices#
There are many situations that we'll see lead to needing to solve a linear system of the form:

\[{\bf A} {\bf x} = {\bf b}\]

where \({\bf A}\) is a matrix, \({\bf b}\) is a known vector, and \({\bf x}\) is the unknown vector we want to find.
We’ll start by looking at general methods, but there are lots of specialized ways to solve a linear system depending on the properties of the matrix \({\bf A}\).
Numerical linear algebra can be tricky, and it's easy to do things wrong. Here's a nice summary: Seven Sins of Numerical Linear Algebra

Before we look at methods for solving linear systems, we'll start by reviewing some basic matrix operations.
Matrix-vector multiplication#
Consider multiplying matrix \({\bf A}\) and vector \({\bf x}\) as \({\bf A}{\bf x} = {\bf b}\). Take:
\({\bf A}\) to be an \(m\times n\) matrix
\({\bf x}\) to be an \(n\times 1\) (column) vector
\({\bf b}\) to be an \(m\times 1\) (column) vector

The multiplication looks like:

\[b_i = \sum_{j=1}^{n} A_{ij} x_j \quad \text{for } i = 1, \ldots, m\]
Python has it’s own matrix multiplication operator that works with NumPy arrays, so an example of this operation would be
```python
import numpy as np

A = np.arange(12).reshape(4, 3)
A
```

```
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
```

```python
x = np.array([1, -1, 1])
A @ x
```

```
array([ 1,  4,  7, 10])
```
Here we see that `A` is a 4x3 matrix and `x` is a vector of 3 elements, so the result is a 4-element vector.
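If the shapes are not compatible, NumPy will refuse to do the multiplication. As a quick sanity check (the vector `y` here is just for illustration, deliberately given the wrong length):

```python
y = np.array([1, -1])   # only 2 elements, but A has 3 columns

try:
    A @ y
except ValueError as e:
    print(e)
```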
Instead of using `@`, we can explicitly write out the multiplication ourselves to see the operations:
```python
b = np.zeros(A.shape[0])

for i in range(A.shape[0]):      # loop over rows
    for j in range(A.shape[1]):  # loop over columns
        b[i] += A[i, j] * x[j]

b
```

```
array([ 1.,  4.,  7., 10.])
```
We see that there are 2 loops in our implementation. This means that for a square matrix of size \(N\times N\), the number of multiplications scales as \(\mathcal{O}(N^2)\).
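We can get a rough feel for this scaling by timing the matrix-vector product for a few sizes. This is a minimal sketch (the names `M` and `v` are just for illustration, and exact timings will depend on your machine and NumPy build):

```python
import time
import numpy as np

rng = np.random.default_rng(12345)

for N in (1000, 2000, 4000):
    M = rng.random((N, N))
    v = rng.random(N)
    start = time.perf_counter()
    M @ v
    elapsed = time.perf_counter() - start
    print(f"N = {N}: {elapsed:.2e} s")
```

Doubling \(N\) should roughly quadruple the time, consistent with \(\mathcal{O}(N^2)\) scaling (small sizes are dominated by overhead, so the trend is cleaner at larger \(N\)).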
Matrix-matrix multiplication#
Now we can consider the case of multiplying 2 matrices, \({\bf C} = {\bf A}{\bf B}\). Here we essentially take a dot product of each row of \({\bf A}\) with each column of \({\bf B}\). This looks like:

\[C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}\]

where \({\bf A}\) is \(m\times n\), \({\bf B}\) is \(n\times p\), and \({\bf C}\) is \(m\times p\).
Again, we can use the Python `@` operator:
```python
B = np.array([[ 1, -1,  2,  3,  0],
              [ 2, -2,  1,  5, -7],
              [-1,  3,  4, -8,  3]])

A @ B
```

```
array([[  0,   4,   9, -11,  -1],
       [  6,   4,  30, -11, -13],
       [ 12,   4,  51, -11, -25],
       [ 18,   4,  72, -11, -37]])
```
Now we see that `A` is a 4x3 matrix and `B` is a 3x5 matrix, so the result is a 4x5 matrix.
Again, we can explicitly write this ourselves to see the details:
```python
assert A.shape[1] == B.shape[0]

C = np.zeros((A.shape[0], B.shape[1]))

for i in range(C.shape[0]):          # loop over rows of C
    for j in range(C.shape[1]):      # loop over columns of C
        for k in range(A.shape[1]):  # filling element C[i,j] via an inner product
            C[i, j] += A[i, k] * B[k, j]

C
```

```
array([[  0.,   4.,   9., -11.,  -1.],
       [  6.,   4.,  30., -11., -13.],
       [ 12.,   4.,  51., -11., -25.],
       [ 18.,   4.,  72., -11., -37.]])
```
Now we see that there are 3 loops, which means that for square matrices, matrix multiplication scales like \(\mathcal{O}(N^3)\).
We note, however, that if the matrix has some structure or symmetry, it is possible to reduce the cost of the multiplication, as illustrated below. Making use of repeated work (e.g., Strassen's algorithm) can also reduce the cost in the general case (see https://en.wikipedia.org/wiki/Matrix_multiplication#Computational_complexity).
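As a simple illustration of exploiting structure (a minimal sketch, with names of our own choosing): if \({\bf D}\) is a diagonal matrix, then \({\bf D}{\bf M}\) just scales each row of \({\bf M}\), so it can be formed in \(\mathcal{O}(N^2)\) work instead of \(\mathcal{O}(N^3)\):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
d = rng.random(N)        # the diagonal entries of D
M = rng.random((N, N))

# general O(N^3) product, with D stored as a dense matrix
full = np.diag(d) @ M

# O(N^2) version: broadcasting scales row i of M by d[i]
cheap = d[:, np.newaxis] * M

print(np.allclose(full, cheap))   # True
```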
Determinant#
A determinant operates on a square matrix and returns a scalar that characterizes the matrix. For our purposes, the most important property of a determinant is that the linear system \({\bf A}{\bf x} = {\bf b}\) has a unique solution if and only if the determinant of \({\bf A}\) is nonzero.
Some common ways to note the determinant operation are:
\(|{\bf A}|\)
\(\mathrm{det}({\bf A})\)
Computing the determinant for small matrices is straightforward:

\[\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc\]

\[\det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix}\]

where the 3x3 case shown above is an example of Laplace expansion.
While this can be extended to larger matrices, it quickly becomes computationally expensive (the cost of Laplace expansion grows as \(\mathcal{O}(N!)\)), and we will see a more natural way of getting the determinant as part of solving the linear system \({\bf A}{\bf x} = {\bf b}\).
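In practice we can let NumPy do this for us: `np.linalg.det` computes the determinant via a factorization rather than Laplace expansion. A quick check against the 2x2 formula above (the matrix `M` here is just an example):

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# ad - bc from the 2x2 formula
print(M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0])

# NumPy's determinant -- should agree up to roundoff
print(np.linalg.det(M))
```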
Inverse#
For a matrix \({\bf A}\), the inverse, \({\bf A}^{-1}\), is defined such that

\[{\bf A}^{-1} {\bf A} = {\bf A} {\bf A}^{-1} = {\bf I}\]

where \({\bf I}\) is the identity matrix.
From this definition, we might think that the way to solve the linear system \({\bf A}{\bf x} = {\bf b}\) is to first compute the inverse of \({\bf A}\) and then do:

\[{\bf x} = {\bf A}^{-1} {\bf b}\]
However, computing the inverse of a matrix is very computationally expensive, and we’ll see that there are easier ways to directly solve the linear system.
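For instance, NumPy's `np.linalg.solve` solves the system directly, without ever forming \({\bf A}^{-1}\). A minimal sketch on a small random system (the names here are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
M = rng.random((N, N))
rhs = rng.random(N)

x_solve = np.linalg.solve(M, rhs)   # solve M x = rhs directly (preferred)
x_inv = np.linalg.inv(M) @ rhs      # via the explicit inverse (more work, less accurate)

print(np.allclose(x_solve, x_inv))  # True -- the two agree to roundoff here
```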