General Linear Least Squares#

The *linear* in *linear least squares* refers to how the parameters appear in the fitting function, \(Y\). So something of the form:

\[Y(x; \{a_j\}) = \sum_{j=1}^M a_j \varphi_j(x)\]

is still linear in the \(\{a_j\}\), even if the basis functions \(\{\varphi_j\}\) are nonlinear. For example:

\[Y(x; \{a_j\}) = a_1 + a_2 x + a_3 x^2\]

is linear in the fit parameters \(\{ a_j\}\). The basis functions in this case are:

\[\begin{align*} \varphi_1 &= 1 \\ \varphi_2 &= x \\ \varphi_3 &= x^2 \end{align*}\]

We can apply the same technique we just used for fitting to a line to this general case.

The discussion in Garcia, Numerical Methods for Physics, gives a nice overview, which we loosely follow here.

Our \(\chi^2\) is:

\[\chi^2(\{a_j\}) = \sum_{i=1}^N \frac{(Y(x_i; \{a_j\}) - y_i)^2}{\sigma_i^2} = \sum_{i=1}^N \frac{1}{\sigma_i^2} \left [\left (\sum_{j=1}^M a_j \varphi_j(x_i)\right ) - y_i \right ]^2\]

We can differentiate it with respect to one of the parameters, \(a_k\):

\[\begin{align*} \frac{\partial \chi^2}{\partial a_k} &= \frac{\partial}{\partial a_k} \sum_{i=1}^N \frac{1}{\sigma_i^2} \left [\left (\sum_{j=1}^M a_j \varphi_j(x_i)\right ) - y_i \right ]^2 \\ &= \sum_{i=1}^N \frac{1}{\sigma_i^2} \frac{\partial}{\partial a_k} \left [\left (\sum_{j=1}^M a_j \varphi_j(x_i)\right ) - y_i \right ]^2 \\ &= 2 \sum_{i=1}^N \frac{1}{\sigma_i^2} \left [\left (\sum_{j=1}^M a_j \varphi_j(x_i)\right ) - y_i \right ] \varphi_k(x_i) = 0 \end{align*}\]

We can now rewrite this as:

\[\sum_{i=1}^N \sum_{j=1}^M a_j \frac{\varphi_j(x_i) \varphi_k(x_i)}{\sigma_i^2} = \sum_{i=1}^N \frac{y_i \varphi_k(x_i)}{\sigma_i^2}\]

We define the \(N\times M\) design matrix as

\[A_{ij} = \frac{\varphi_j(x_i)}{\sigma_i}\]

and the source as:

\[b_i = \frac{y_i}{\sigma_i}\]

Then our system is:

\[\sum_{i=1}^N \sum_{j=1}^M A_{ik} A_{ij} a_j = \sum_{i=1}^N A_{ik} b_i\]

which, by looking at which indices contract, gives us the linear system:

\[{\bf A}^\intercal {\bf A} {\bf a} = {\bf A}^\intercal {\bf b}\]

where \({\bf A}^\intercal {\bf A}\) is an \(M\times M\) matrix.

The procedure we described above is sometimes called ordinary least squares.

Parabolic Fit#

Let’s consider fitting to a function:

\[Y(x; \{a_j\}) = \sum_{j=1}^M a_j x^{j-1}\]

for \(M = 3\), this is:

\[Y(x) = a_1 + a_2 x + a_3 x^2\]

If we have 10 data points, \(x_1, \ldots, x_{10}\), then our matrix \({\bf A}\) will take the form:

\[\begin{split}{\bf A} = \left (\begin{array}{ccc} 1/\sigma_1 & x_1/\sigma_1 & x_1^2/\sigma_1 \\ 1/\sigma_2 & x_2/\sigma_2 & x_2^2/\sigma_2 \\ \vdots & \vdots & \vdots \\ 1/\sigma_{10} & x_{10}/\sigma_{10} & x_{10}^2/\sigma_{10} \end{array} \right )\end{split}\]
import numpy as np
import matplotlib.pyplot as plt
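
As an aside (this is our own sketch, with made-up arrays x_pts and sigma standing in for the data points and their errors), this particular \({\bf A}\) can be built compactly with np.vander:

# hypothetical data: 10 points and their 1-sigma errors
x_pts = np.linspace(0.0, 1.0, 10)
sigma = 0.1 * np.ones_like(x_pts)

# columns are 1, x, x**2; each row is divided by its sigma_i
A = np.vander(x_pts, 3, increasing=True) / sigma[:, np.newaxis]
print(A.shape)   # (10, 3)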

Let’s first write a function that takes an \(x_i\) and returns the basis function values, \(\varphi_j(x_i)\), that make up a row of \({\bf A}\) (the division by \(\sigma_i\) will happen when we assemble the matrix).

def basis(x, M=3):
    """ the basis functions for the fit: x**j for j = 0, ..., M-1 """

    j = np.arange(M)
    return x**j
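
For example (a quick sanity check we are adding here), evaluating the basis at \(x = 2\) with the default \(M = 3\) gives the values of \(1\), \(x\), and \(x^2\):

print(basis(2.0))   # [1. 2. 4.]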

Now we’ll write a function that takes our data and errors and sets up the linear system \({\bf A}^\intercal {\bf A} {\bf a} = {\bf A}^\intercal {\bf b}\) and solves it.

def general_regression(x, y, yerr, M):
    """ here, M is the number of fitting parameters.  We will fit to
        a function that is linear in the a's, using the basis functions
        x**j """

    N = len(x)

    # construct the design matrix -- A_{ij} = Y_j(x_i)/sigma_i -- this is
    # N x M.  Each row corresponds to a single data point, x_i, y_i
    A = np.zeros((N, M), dtype=np.float64)

    for i in range(N):
        A[i,:] = basis(x[i], M) / yerr[i]

    # construct the MxM matrix for the linear system, A^T A:
    ATA = np.transpose(A) @ A

    print("condition number of A^T A:", np.linalg.cond(ATA))

    # construct the RHS
    b = np.transpose(A) @ (y / yerr)

    # solve the system
    a = np.linalg.solve(ATA, b)

    # compute the reduced chi-square
    chisq = 0
    for i in range(N):
        chisq += (np.sum(a*basis(x[i], M)) - y[i])**2 / yerr[i]**2

    chisq /= N-M

    return a, chisq
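
As a side note (our own sketch, not the routine above): forming \({\bf A}^\intercal {\bf A}\) squares the condition number of \({\bf A}\), so for badly conditioned problems it can be preferable to solve the least-squares problem directly from \({\bf A}\) and \({\bf b}\) with np.linalg.lstsq. Something like:

def general_regression_lstsq(x, y, yerr, M):
    """ solve the same weighted least-squares problem, but without
        forming A^T A explicitly (a sketch using np.linalg.lstsq) """

    N = len(x)

    # same design matrix as in general_regression
    A = np.zeros((N, M), dtype=np.float64)
    for i in range(N):
        A[i,:] = basis(x[i], M) / yerr[i]

    b = y / yerr

    # least-squares solution of A a = b
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a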

Finally, we’ll make up some experimental data that follows a parabola, but with random perturbations.

def y_experiment2(a1, a2, a3, sigma, x):
    """ return experimental data that follows the quadratic
    a1 + a2*x + a3*x**2, with random Gaussian perturbations of
    scale sigma, and the magnitude of each perturbation as the
    error estimate """

    N = len(x)
    r = sigma * np.random.randn(N)
    
    y = a1 + a2*x + a3*x*x + r

    return y, np.abs(r)
N = 40
x = np.linspace(0, 100.0, N)

# one-sigma error
sigma = 8.0

y, yerr = y_experiment2(2.0, 1.50, -0.02, sigma, x)
fig, ax = plt.subplots()

ax.errorbar(x, y, yerr=yerr, fmt="o")
(figure: the synthetic data plotted with error bars)

Now let’s do the fit of a parabola. Along the way, we’ll print out the condition number of the matrix \({\bf A}^\intercal{\bf A}\).

# do the regression with M = 3 (1, x, x^2)
M = 3
a, chisq = general_regression(x, y, yerr, M)
condition number of A^T A: 68785489.166329

Notice that the condition number is quite large!

ax.plot(x, a[0] + a[1]*x + a[2]*x*x)
fig
(figure: the data with the best-fit parabola overlaid)

Higher order fit#

fig, ax = plt.subplots()
M = 10
a, chisq = general_regression(x, y, yerr, M)
# evaluate the best-fit polynomial at each data point
yfit = np.zeros((len(x)), dtype=x.dtype)
for i in range(N):
    base = basis(x[i], M)
    yfit[i] = np.sum(a*base)

ax.errorbar(x, y, yerr=yerr, fmt="o")
ax.plot(x, yfit)
condition number of A^T A: 4.941874983114367e+34
(figure: the data with the \(M = 10\) polynomial fit overlaid)

The \(M = 10\) polynomial fits the data, but the condition number is enormous. It is not clear that going higher order here is wise.
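
One way to judge this (a check we are adding here, not part of the original text) is to compare the reduced \(\chi^2\) returned by general_regression for several values of \(M\); adding more terms is hard to justify if it barely changes:

for Mtry in [2, 3, 5, 10]:
    afit, chisq = general_regression(x, y, yerr, Mtry)
    print(f"M = {Mtry:2d}: reduced chi-square = {chisq:8.3f}")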

Fitting and condition number#

It can be shown that if your basis functions are orthogonal over the interval you are fitting on, then the condition number of the matrix \({\bf A}^\intercal {\bf A}\) will be much lower. For example, if we fit on the interval \([-1, 1]\), then the Legendre polynomials are a good basis.
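
As a rough illustration (our own sketch, not from the text: it assumes we rescale \(x\) onto \([-1, 1]\) and uses NumPy's legvander to evaluate the Legendre polynomials), we could redo the fit with a Legendre basis and compare the condition number to the monomial case:

from numpy.polynomial.legendre import legvander

def general_regression_legendre(x, y, yerr, M):
    """ the same normal-equations solve as general_regression, but
        using Legendre polynomials P_0, ..., P_{M-1} evaluated on x
        rescaled to [-1, 1] as the basis functions """

    # rescale x to t in [-1, 1]
    t = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

    # design matrix: A_{ij} = P_j(t_i) / sigma_i
    A = legvander(t, M-1) / yerr[:, np.newaxis]

    print("condition number of A^T A:", np.linalg.cond(A.T @ A))

    b = A.T @ (y / yerr)
    a = np.linalg.solve(A.T @ A, b)
    return a

Note that the returned coefficients multiply the Legendre polynomials in the rescaled variable \(t\), so they are not directly comparable to the monomial coefficients from before.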

The text by Yakowitz & Szidarovszky has a good discussion on this.

Errors in the fitting parameters#

It can be shown that the errors in the fitting parameters are:

\[\sigma_{a_j} = \sqrt{C_{jj}}\]

where

\[{\bf C} = ({\bf A}^\intercal {\bf A})^{-1}\]

is the covariance matrix.
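
A minimal sketch (our own, reusing the basis function and design matrix construction from above; the helper name is not from the text) of how to compute these parameter errors:

def fit_errors(x, yerr, M):
    """ estimate the 1-sigma errors on the fit parameters from the
        diagonal of the covariance matrix C = (A^T A)^{-1} """

    N = len(x)

    # same design matrix as in general_regression
    A = np.zeros((N, M), dtype=np.float64)
    for i in range(N):
        A[i,:] = basis(x[i], M) / yerr[i]

    C = np.linalg.inv(A.T @ A)
    return np.sqrt(np.diag(C))

For the parabolic fit above, fit_errors(x, yerr, 3) would give the uncertainties on \(a_1\), \(a_2\), and \(a_3\).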