General Linear Least Squares#
The linear in linear least squares refers to how the parameters appear in the fitting function, \(Y\). So something of the form:
is still linear in the \(\{a_j\}\), even if the basis functions \(\{\varphi_j\}\) are nonlinear. For example:
is linear in the fit parameters \(\{ a_j\}\), The basis functions in this case are:
We can apply the same technique we just did for fitting to a line for this general case.
The discussion in Garcia, Numerical Methods for Physics, gives a nice overview, which we loosely follow here.
Our \(\chi^2\) is:
We can differentiate it with respect to one of the parameters, \(a_k\):
We can now rewrite this as:
We define the \(N\times M\) design matrix as
and the source as:
our system is:
which, by looking at which indices contract, gives us the linear system:
where \({\bf A}^\intercal {\bf A}\) is an \(M\times M\) matrix.
The procedure we described above is sometimes called ordinary least squares.
Parabolic Fit#
Let’s consider fitting to a function:
for \(M = 3\), this is:
If we have 10 data points, \(x_1, \ldots, x_{10}\), then our matrix \({\bf A}\) will take the form:
import numpy as np
import matplotlib.pyplot as plt
Let’s first write a function that takes an \(x_i\) and returns the entries in a row of \({\bf A}\)
def basis(x, M=3):
""" the basis function for the fit, x**n"""
j = np.arange(M)
return x**j
Now we’ll write a function that takes our data and errors and sets up the linear system \({\bf A}^\intercal {\bf A} {\bf a} = {\bf A}^\intercal {\bf b}\) and solves it.
def general_regression(x, y, yerr, M):
""" here, M is the number of fitting parameters. We will fit to
a function that is linear in the a's, using the basis functions
x**j """
N = len(x)
# construct the design matrix -- A_{ij} = Y_j(x_i)/sigma_i -- this is
# N x M. Each row corresponds to a single data point, x_i, y_i
A = np.zeros((N, M), dtype=np.float64)
for i in range(N):
A[i,:] = basis(x[i], M) / yerr[i]
# construct the MxM matrix for the linear system, A^T A:
ATA = np.transpose(A) @ A
print("condition number of A^T A:", np.linalg.cond(ATA))
# construct the RHS
b = np.transpose(A) @ (y / yerr)
# solve the system
a = np.linalg.solve(ATA, b)
# return the chisq
chisq = 0
for i in range(N):
chisq += (np.sum(a*basis(x[i], M)) - y[i])**2 / yerr[i]**2
chisq /= N-M
return a, chisq
Finally, we’ll make up some experiment data that follows a parabola, but with a perturbation.
def y_experiment2(a1, a2, a3, sigma, x):
""" return the experimental data and error in a quadratic +
random fashion, with a1, a2, a3 the coefficients of the
quadratic and sigma is scale of the error. This will be
poorly matched to a linear fit for a3 != 0 """
N = len(x)
r = sigma * np.random.randn(N)
y = a1 + a2*x + a3*x*x + r
return y, np.abs(r)
N = 40
x = np.linspace(0, 100.0, N)
# one-sigma error
sigma = 8.0
y, yerr = y_experiment2(2.0, 1.50, -0.02, sigma, x)
fig, ax = plt.subplots()
ax.errorbar(x, y, yerr=yerr, fmt="o")
<ErrorbarContainer object of 3 artists>
Now let’s do the fit of a parabola. Along the way, we’ll print out the condition number of the matrix \({\bf A}^\intercal{\bf A}\).
# do the regression with M = 3 (1, x, x^2)
M = 3
a, chisq = general_regression(x, y, yerr, M)
condition number of A^T A: 298150342.9348546
Notice that the condition number is quite large!
ax.plot(x, a[0] + a[1]*x + a[2]*x*x)
fig
Higher order fit#
fig, ax = plt.subplots()
M = 10
a, chisq = general_regression(x, y, yerr, M)
yfit = np.zeros((len(x)), dtype=x.dtype)
for i in range(N):
base = basis(x[i], M)
yfit[i] = np.sum(a*base)
ax.errorbar(x, y, yerr=yerr, fmt="o")
ax.plot(x, yfit)
condition number of A^T A: 1.0482841505702595e+35
[<matplotlib.lines.Line2D at 0x7ff7ac3d7750>]
The \(M = 10\) polynomial fits, but the condition number is large. It is not clear that going higher order here is wise.
Fitting and condition number#
It can be shown that if your basis function is orthonormal in the interval you are fitting over, then the condition number of the matrix \({\bf A}^\intercal {\bf A}\) will be much lower. For example, it we fit to the interval \([-1, 1]\), then the Legendre polynomials are a good basis.
The text by Yakowitz & Szidarovszky has a good discussion on this.
Errors in the fitting parameters#
It can be shown that the errors in the fitting parameters are:
where
is the covariance matrix.