General Linear Least Squares#

The *linear* in *linear least squares* refers to how the parameters appear in the fitting function, \(Y\). A fitting function of the form:

\[Y(x; \{a_j\}) = \sum_{j=1}^M a_j \varphi_j(x)\]

is still linear in the \(\{a_j\}\), even if the basis functions \(\{\varphi_j\}\) are nonlinear.

Example

The fitting function:

\[Y(x; \{a_j\}) = a_1 + a_2 x + a_3 x^2\]

is linear in the fit parameters \(\{a_j\}\). The basis functions in this case are:

\[\begin{align*} \varphi_1 &= 1 \\ \varphi_2 &= x \\ \varphi_3 &= x^2 \end{align*}\]
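As a quick illustration, here is a minimal Python sketch (the names `basis` and `Y` are our own) that evaluates this quadratic model for a given set of parameters:

```python
import numpy as np

# basis functions for Y(x) = a_1 + a_2 x + a_3 x^2
basis = [lambda x: np.ones_like(x),   # varphi_1 = 1
         lambda x: x,                 # varphi_2 = x
         lambda x: x**2]              # varphi_3 = x^2

def Y(x, a):
    """evaluate the model sum_j a_j varphi_j(x)"""
    return sum(a_j * phi(x) for a_j, phi in zip(a, basis))

print(Y(np.array([0.0, 1.0, 2.0]), [1.0, 2.0, 3.0]))   # -> [ 1.  6. 17.]
```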

We can apply the same technique we used for fitting a line to this general case.

Reference

The discussion in Garcia, Numerical Methods for Physics, gives a nice overview, which we loosely follow here.

Our \(\chi^2\) is:

\[\begin{align*} \chi^2(\{a_j\}) &= \sum_{i=1}^N \frac{(Y(x_i; \{a_j\}) - y_i)^2}{\sigma_i^2} \\ &= \sum_{i=1}^N \frac{1}{\sigma_i^2} \left [ \left (\sum_{j=1}^M a_j \varphi_j(x_i)\right ) - y_i \right ]^2 \end{align*}\]
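Evaluating \(\chi^2\) translates directly into code; a minimal sketch, assuming `basis` is a list of callables as above:

```python
import numpy as np

def chisq(a, basis, x, y, sigma):
    """chi^2 of the general linear model sum_j a_j varphi_j(x)
    for data points (x_i, y_i) with errors sigma_i"""
    Ymodel = sum(a_j * phi(x) for a_j, phi in zip(a, basis))
    return np.sum((Ymodel - y)**2 / sigma**2)
```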

We minimize \(\chi^2\) by differentiating with respect to each of the parameters, \(a_k\), and setting the result to zero:

\[\begin{align*} \frac{\partial \chi^2}{\partial a_k} &= \frac{\partial}{\partial a_k} \sum_{i=1}^N \frac{1}{\sigma_i^2} \left [\left (\sum_{j=1}^M a_j \varphi_j(x_i)\right ) - y_i \right ]^2 \\ &= \sum_{i=1}^N \frac{1}{\sigma_i^2} \frac{\partial}{\partial a_k} \left [\left (\sum_{j=1}^M a_j \varphi_j(x_i)\right ) - y_i \right ]^2 \\ &= 2 \sum_{i=1}^N \frac{1}{\sigma_i^2} \left [\left (\sum_{j=1}^M a_j \varphi_j(x_i)\right ) - y_i \right ] \varphi_k(x_i) = 0 \end{align*}\]

or rewriting:

\[\sum_{i=1}^N \sum_{j=1}^M a_j \frac{\varphi_j(x_i) \varphi_k(x_i)}{\sigma_i^2} = \sum_{i=1}^N \frac{y_i \varphi_k(x_i)}{\sigma_i^2}\]

We define the \(N\times M\) design matrix as

\[A_{ij} = \frac{\varphi_j(x_i)}{\sigma_i}\]

and the source as:

\[b_i = \frac{y_i}{\sigma_i}\]
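These definitions map directly to code; a short sketch (function names are our own):

```python
import numpy as np

def design_matrix(basis, x, sigma):
    """N x M design matrix with A_{ij} = varphi_j(x_i) / sigma_i"""
    return np.array([phi(x) / sigma for phi in basis]).T

def source(y, sigma):
    """source vector with b_i = y_i / sigma_i"""
    return y / sigma
```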

Then our system becomes:

\[\sum_{i=1}^N \sum_{j=1}^M A_{ik} A_{ij} a_j = \sum_{i=1}^N A_{ik} b_i\]

Looking at which indices contract, we can recognize this as the linear system:

\[{\bf A}^\intercal {\bf A} {\bf a} = {\bf A}^\intercal {\bf b}\]

where \({\bf A}^\intercal {\bf A}\) is an \(M\times M\) matrix.

The procedure we described above is sometimes called ordinary least squares.
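A minimal sketch of this procedure in Python (the function name and the test data are our own invention): we build the design matrix, form the normal equations, and solve with NumPy.

```python
import numpy as np

def general_linear_fit(basis, x, y, sigma):
    """solve the normal equations (A^T A) a = A^T b for the parameters a"""
    A = np.array([phi(x) / sigma for phi in basis]).T   # N x M design matrix
    b = y / sigma                                       # source
    return np.linalg.solve(A.T @ A, A.T @ b)

# try it on noisy quadratic data (a made-up test case)
rng = np.random.default_rng(12345)
x = np.linspace(0, 1, 20)
sigma = np.full_like(x, 0.1)
y = 1.0 + 2.0 * x + 3.0 * x**2 + sigma * rng.standard_normal(len(x))

basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]
print(general_linear_fit(basis, x, y, sigma))   # should be close to [1, 2, 3]
```

Note that forming \({\bf A}^\intercal {\bf A}\) explicitly squares the condition number of \({\bf A}\), so for ill-conditioned problems library routines like `np.linalg.lstsq`, which factor \({\bf A}\) directly via an SVD, are generally preferred.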

Linear fit revisited#

For a linear fit,

\[Y(x) = a_1 + a_2 x\]

and our basis functions are: \(\varphi_1 = 1\) and \(\varphi_2 = x\).

Design matrix and source#

Our design matrix in this case is:

\[\begin{split}{\bf A} = \left ( \begin{array}{cc} 1/\sigma_1 & x_1 / \sigma_1 \\ 1/\sigma_2 & x_2 / \sigma_2 \\ \vdots & \vdots \\ 1/\sigma_N & x_N / \sigma_N \\ \end{array}\right )\end{split}\]

and the source is:

\[\begin{split}{\bf b} = \left (\begin{array}{c} y_1 / \sigma_1 \\ y_2 / \sigma_2 \\ \vdots \\ y_N / \sigma_N \end{array} \right )\end{split}\]
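For this two-parameter case, a short sketch constructing \({\bf A}\) and \({\bf b}\) (with made-up data for illustration):

```python
import numpy as np

# hypothetical data for illustration
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
sigma = np.array([0.1, 0.1, 0.2, 0.2])

# columns of A are varphi_1(x_i)/sigma_i = 1/sigma_i
# and varphi_2(x_i)/sigma_i = x_i/sigma_i
A = np.column_stack([1.0 / sigma, x / sigma])
b = y / sigma
```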

Linear system#

We can now use this design matrix and source to find the underlying linear system.

\({\bf A}^\intercal {\bf A}\) is:

\[\begin{align*} {\bf A}^\intercal{\bf A} &= \left ( \begin{array}{cccc} 1/\sigma_1 & 1/\sigma_2 & \cdots & 1/\sigma_N \\ x_1/\sigma_1 & x_2/\sigma_2 & \cdots & x_N/\sigma_N \end{array} \right ) \left ( \begin{array}{cc} 1/\sigma_1 & x_1 / \sigma_1 \\ 1/\sigma_2 & x_2 / \sigma_2 \\ \vdots & \vdots \\ 1/\sigma_N & x_N / \sigma_N \\ \end{array}\right ) \\ &= \left ( \begin{array}{cc} \sum_i 1/\sigma_i^2 & \sum_i x_i / \sigma_i^2 \\ \sum_i x_i/\sigma_i^2 & \sum_i x_i^2 / \sigma_i^2 \end{array} \right ) \end{align*}\]

and \({\bf A}^\intercal {\bf A} {\bf a}\) is:

\[\begin{align*} {\bf A}^\intercal {\bf A} {\bf a} &= \left ( \begin{array}{cc} \sum_i 1/\sigma_i^2 & \sum_i x_i / \sigma_i^2 \\ \sum_i x_i/\sigma_i^2 & \sum_i x_i^2 / \sigma_i^2 \end{array} \right ) \left ( \begin{array}{c} a_1 \\ a_2 \end{array} \right ) \\ &= \left ( \begin{array}{c} a_1 \sum_i 1/\sigma_i^2 + a_2 \sum_i x_i/\sigma_i^2 \\ a_1 \sum_i x_i/\sigma_i^2 + a_2 \sum_i x_i^2 /\sigma_i^2 \end{array} \right ) \end{align*}\]

\({\bf A}^\intercal {\bf b}\) is:

\[\begin{align*} {\bf A}^\intercal {\bf b} &= \left ( \begin{array}{cccc} 1/\sigma_1 & 1/\sigma_2 & \cdots & 1/\sigma_N \\ x_1/\sigma_1 & x_2/\sigma_2 & \cdots & x_N/\sigma_N \end{array} \right ) \left ( \begin{array}{c} y_1 / \sigma_1 \\ y_2 / \sigma_2 \\ \vdots \\ y_N / \sigma_N \\ \end{array}\right ) \\ &= \left ( \begin{array}{c} \sum_i y_i / \sigma_i^2 \\ \sum_i x_i y_i / \sigma_i^2 \end{array} \right ) \end{align*}\]

Putting these together gives us two equations for the two unknowns (\(a_1\), \(a_2\)):

\[\begin{align*} a_1 \sum_i \frac{1}{\sigma_i^2} + a_2 \sum_i \frac{x_i}{\sigma_i^2} &= \sum_i \frac{y_i}{\sigma_i^2} \\ a_1 \sum_i \frac{x_i}{\sigma_i^2} + a_2 \sum_i \frac{x_i^2}{\sigma_i^2} &= \sum_i \frac{x_i y_i}{\sigma_i^2} \end{align*}\]

This is precisely the system we saw before.
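As a numerical consistency check, we can solve this \(2\times 2\) system built from the explicit sums and compare it against the matrix form (reusing the made-up data from above; both prints should agree):

```python
import numpy as np

# the same hypothetical data as above
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
sigma = np.array([0.1, 0.1, 0.2, 0.2])
w = 1.0 / sigma**2   # weights

# the 2x2 system built from the explicit sums
M = np.array([[np.sum(w),     np.sum(w * x)],
              [np.sum(w * x), np.sum(w * x**2)]])
rhs = np.array([np.sum(w * y), np.sum(w * x * y)])
print(np.linalg.solve(M, rhs))

# the same result from the matrix form A^T A a = A^T b
A = np.column_stack([1.0 / sigma, x / sigma])
b = y / sigma
print(np.linalg.solve(A.T @ A, A.T @ b))
```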