Testing Our Least Squares#

We’ll write a general linear least squares implementation and test it first on a parabolic fit and then on a higher-order polynomial.

Parabolic fit#

Let’s consider fitting to a function:

\[Y(x; \{a_j\}) = \sum_{j=1}^M a_j x^{j-1}\]

For \(M = 3\), this is:

\[Y(x) = a_1 + a_2 x + a_3 x^2\]

If we have 10 data points, \(x_1, \ldots, x_{10}\), then our matrix \({\bf A}\) will take the form:

\[\begin{split}{\bf A} = \left (\begin{array}{ccc} 1/\sigma_1 & x_1/\sigma_1 & x_1^2/\sigma_1 \\ 1/\sigma_2 & x_2/\sigma_2 & x_2^2/\sigma_2 \\ \vdots & \vdots & \vdots \\ 1/\sigma_{10} & x_{10}/\sigma_{10} & x_{10}^2/\sigma_{10} \end{array} \right )\end{split}\]
import numpy as np
import matplotlib.pyplot as plt

Let’s first write a function that takes an \(x_i\) and returns the basis functions evaluated there; these are the entries in a row of \({\bf A}\), before dividing by \(\sigma_i\).

def basis(x, M=3):
    """ the basis function for the fit, x**n"""
    
    j = np.arange(M)
    return x**j
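For example, evaluating the basis at \(x = 2\) with the default \(M = 3\) gives the values of \(1\), \(x\), and \(x^2\) there:

basis(2.0)
array([1., 2., 4.])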

Now we’ll write a function that takes our data and errors and sets up the linear system \({\bf A}^\intercal {\bf A} {\bf a} = {\bf A}^\intercal {\bf b}\) and solves it.

def general_regression(x, y, yerr, M):
    """ here, M is the number of fitting parameters.  We will fit to
        a function that is linear in the a's, using the basis functions
        x**j """

    N = len(x)

    # construct the design matrix -- A_{ij} = Y_j(x_i)/sigma_i -- this is
    # N x M.  Each row corresponds to a single data point, x_i, y_i
    A = np.zeros((N, M), dtype=np.float64)

    for i in range(N):
        A[i,:] = basis(x[i], M) / yerr[i]

    # construct the MxM matrix for the linear system, A^T A:
    ATA = np.transpose(A) @ A

    print("condition number of A^T A:", np.linalg.cond(ATA))

    # construct the RHS
    b = np.transpose(A) @ (y / yerr)

    # solve the system
    a = np.linalg.solve(ATA, b)

    # compute the reduced chi-square, chi^2 / (N - M)
    chisq = 0
    for i in range(N):
        chisq += (np.sum(a*basis(x[i], M)) - y[i])**2 / yerr[i]**2

    chisq /= N-M

    return a, chisq

Finally, we’ll make up some experimental data that follows a parabola, with random Gaussian noise added.

def y_experiment2(a1, a2, a3, sigma, x):
    """ return experimental data that follows a quadratic with
    coefficients a1, a2, a3, plus Gaussian random noise with
    scale sigma.  This will be poorly matched by a linear fit
    for a3 != 0 """

    N = len(x)

    # this creates a Gaussian normal random number from a distribution
    # centered on 0 with a width of sigma
    rng = np.random.default_rng()
    r = sigma * rng.standard_normal(N)
    
    y = a1 + a2*x + a3*x*x + r

    return y, np.ones_like(y) * sigma

N = 40
x = np.linspace(0, 100.0, N)

# one-sigma error
sigma = 8.0

y, yerr = y_experiment2(2.0, 1.50, -0.02, sigma, x)
fig, ax = plt.subplots()

ax.errorbar(x, y, yerr=yerr, fmt="o")
(figure: the synthetic data points with their error bars)

Now let’s do the fit of a parabola. Along the way, we’ll print out the condition number of the matrix \({\bf A}^\intercal{\bf A}\).

# do the regression with M = 3 (1, x, x^2)
M = 3
a, chisq = general_regression(x, y, yerr, M)
condition number of A^T A: 169717921.7386773

Notice that the condition number is quite large! With \(x \in [0, 100]\), the columns of \({\bf A}\) scale like \(1\), \(x\), and \(x^2\), so they span roughly 4 orders of magnitude, and the condition number of \({\bf A}^\intercal {\bf A}\) is the square of that of \({\bf A}\).
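A quick side check (not part of the fit itself) shows that the scaling of \(x\) is the culprit: if we rescale \(x\) onto \([-1, 1]\), the same construction is far better conditioned.

# compare the condition number of A^T A built from the raw x with
# one built from x rescaled to [-1, 1]
xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

for xx in (x, xs):
    AA = np.vander(xx, 3, increasing=True) / yerr[:, np.newaxis]
    print(np.linalg.cond(AA.T @ AA))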

ax.plot(x, a[0] + a[1]*x + a[2]*x*x)
fig
(figure: the data with the best-fit parabola overplotted)

Higher-order fit#

fig, ax = plt.subplots()
M = 10
a, chisq = general_regression(x, y, yerr, M)
yfit = np.zeros((len(x)), dtype=x.dtype)
for i in range(N):
    base = basis(x[i], M)
    yfit[i] = np.sum(a*base)

ax.errorbar(x, y, yerr=yerr, fmt="o")
ax.plot(x, yfit)
condition number of A^T A: 1.8806097126766699e+34

(figure: the data with the M = 10 polynomial fit overplotted)

The \(M = 10\) polynomial fits the data, but the condition number is now enormous (\(\sim 10^{34}\)), far beyond what double precision can resolve, so the individual coefficients cannot be trusted. It is not clear that going higher order here is wise.

Fitting and condition number#

It can be shown that if your basis functions are orthogonal over the interval you are fitting on, then the condition number of the matrix \({\bf A}^\intercal {\bf A}\) will be much lower. For example, if we map our data onto the interval \([-1, 1]\), then the Legendre polynomials are a good basis.

The text by Yakowitz & Szidarovszky has a good discussion on this.
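Here is a minimal sketch of that point, using NumPy's np.polynomial.legendre.legvander to evaluate the Legendre basis (this cell is an addition; it maps our data onto \([-1, 1]\) first):

# condition number of A^T A for a monomial basis vs. a Legendre
# basis, both evaluated on x mapped onto [-1, 1] (M = 3 columns each)
xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

A_mono = np.vander(xs, 3, increasing=True) / yerr[:, np.newaxis]
A_leg = np.polynomial.legendre.legvander(xs, 2) / yerr[:, np.newaxis]

print("monomials:", np.linalg.cond(A_mono.T @ A_mono))
print("Legendre: ", np.linalg.cond(A_leg.T @ A_leg))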

Errors in the fitting parameters#

It can also be shown that the errors in the fitting parameters are:

\[\sigma_{a_j} = \sqrt{C_{jj}}\]

where

\[{\bf C} = ({\bf A}^\intercal {\bf A})^{-1}\]

is the covariance matrix.
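As a minimal sketch using the \(M = 3\) parabola fit from above (reusing basis, x, and yerr), we can form \({\bf C}\) and read off the one-sigma errors on the \(a_j\):

# rebuild the design matrix, form the covariance matrix
# C = (A^T A)^{-1}, and take the square root of its diagonal
A = np.array([basis(xi, 3) / err for xi, err in zip(x, yerr)])
C = np.linalg.inv(A.T @ A)

sigma_a = np.sqrt(np.diag(C))
print(sigma_a)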

C++ implementation#

For the C++ implementation, we create a Matrix class, contained in matrix.H, that overloads * to do matrix-matrix and matrix-vector multiplication.

We also reuse our Gaussian elimination code from before. A short Python sketch for reading and plotting the program's output appears after the listings below.

  • matrix.H : the Matrix class

    #ifndef MATRIX_H
    #define MATRIX_H
    
    #include <vector>
    #include <iostream>
    #include <cassert>
    
    // a 2-d matrix with contiguous storage
    // here the data is stored in row-major order in a 1-d memory space
    // managed as a vector.  We overload () to allow us to index this as
    // a(irow, icol)
    
    struct Matrix {
    
        std::size_t _rows;
        std::size_t _cols;
        std::vector<double> _data;
    
        Matrix (std::size_t rows, std::size_t cols, double val=0.0)
            : _rows{rows},
              _cols{cols},
              _data(rows * cols, val)
        {
            assert (rows > 0 && cols > 0);
        }
    
        // Constructor for list initialization
        // e.g., Matrix A{{1, 2}, {3, 4}};
    
        Matrix(std::initializer_list<std::initializer_list<double>> values)
            : _rows(values.size()),
              _cols(values.begin()->size()),
              _data()
        {
            assert(_rows > 0);
            for (const auto& row : values) {
                // Ensure all rows have the same number of columns
                assert(row.size() == _cols);
                _data.insert(_data.end(), row.begin(), row.end());
            }
        }
    
        // note the "const" after the argument list here -- this means
        // that this can be called on a const Matrix
    
        inline std::size_t ncols() const { return _cols;}
        inline std::size_t nrows() const { return _rows;}
    
        inline double& operator()(std::size_t row, std::size_t col) {
            // size_t is unsigned, so will always be >= 0
            assert (row < _rows);
            assert (col < _cols);
            return _data[row*_cols + col];
        }
    
        inline const double& operator()(std::size_t row, std::size_t col) const {
            // size_t is unsigned, so will always be >= 0
            assert (row < _rows);
            assert (col < _cols);
            return _data[row*_cols + col];
        }
    
        Matrix transpose() const {
            Matrix A_T(_cols, _rows);
            for (std::size_t irow = 0; irow < _rows; ++irow) {
                for (std::size_t jcol = 0; jcol < _cols; ++jcol) {
                    A_T(jcol, irow) = (*this)(irow, jcol);
                }
            }
            return A_T;
        }
    
        std::vector<double>
        operator* (const std::vector<double>& x) const {
            // do b = A x
            assert (_cols == x.size());
    
            std::vector<double> b(_rows, 0.0);
    
            for (std::size_t irow = 0; irow < _rows; ++irow) {
                for (std::size_t k = 0; k < _cols; ++k) {
                    b[irow] += (*this)(irow, k) * x[k];
                }
            }
    
            return b;
        }
    
        Matrix
        operator* (const Matrix& B) const {
            // do C = A B
            assert (_cols == B.nrows());
            Matrix C(_rows, B.ncols());
    
            for (std::size_t irow = 0; irow < C.nrows(); ++irow) {
                for (std::size_t jcol = 0; jcol < C.ncols(); ++jcol) {
                    for (std::size_t k = 0; k < _cols; ++k) {
                        C(irow, jcol) += (*this)(irow, k) * B(k, jcol);
                    }
                }
            }
    
            return C;
        }
    
    };
    
    // the << operator is not part of the class, so it is not a member
    
    inline
    std::ostream& operator<< (std::ostream& os, const Matrix& a) {
    
        for (std::size_t row = 0; row < a.nrows(); ++row) {
            for (std::size_t col = 0; col < a.ncols(); ++col) {
                os << a(row, col) << " ";
            }
            os << std::endl;
        }
        return os;
    }
    
    #endif
    
  • gauss.H : the Gaussian elimination function

    #ifndef GAUSS_H
    #define GAUSS_H
    
    #include <algorithm>
    #include <cassert>
    #include <cmath>
    #include <iomanip>
    #include <iostream>
    #include <limits>
    #include <vector>
    
    #include "matrix.H"
    
    inline
    std::vector<double> gauss_elim(Matrix& A, std::vector<double>& b) {
    
        // perform Gaussian elimination with partial pivoting, solving A x = b.
        //
        // A is an NxN matrix and b is an N-element vector.  Note: A and b
        // are modified on exit -- A is reduced to row echelon form
    
        auto N = b.size();
    
        // A is square, with each dimension of length N
    
        assert(A.nrows() == N && A.ncols() == N);
    
        // allocate the solution array
    
        std::vector<double> x(N, 0.0);
    
        // main loop over rows
    
        for (std::size_t krow = 0; krow < N; ++krow) {
    
            // find the pivot row based on the size of column k -- only
            // consider the rows beyond the current row
    
            auto row_max{krow};
            double col_max = std::numeric_limits<double>::lowest();
            for (std::size_t kk = krow; kk < N; ++kk) {
                if (std::abs(A(kk, krow)) > col_max) {
                    col_max = std::abs(A(kk, krow));
                    row_max = kk;
                }
            }
    
            // swap the row with the largest element in the current column
            // with the current row (pivot) -- do this with b too!
    
            if (row_max != krow) {
    
                for (std::size_t jcol = 0; jcol < N; ++jcol) {
                    std::swap(A(krow, jcol), A(row_max, jcol));
                }
                std::swap(b[krow], b[row_max]);
            }
    
            // do the forward-elimination for all rows below the current
    
            for (std::size_t irow = krow+1; irow < N; ++irow) {
                double coeff = A(irow, krow) / A(krow, krow);
    
                for (std::size_t jcol = krow+1; jcol < N; ++jcol) {
                    A(irow, jcol) += -A(krow, jcol) * coeff;
                }
    
                A(irow, krow) = 0.0;
                b[irow] += -coeff * b[krow];
            }
        }
    
        // back-substitution
    
        x[N-1] = b[N-1] / A(N-1, N-1);
    
        // size_t is unsigned, so it is never < 0 -- when decremented past
        // zero it wraps around to a very large number, which is what the
        // loop condition tests against
        for (std::size_t irow = N-2; irow != std::numeric_limits<std::size_t>::max(); --irow) {
            double bsum = b[irow];
            for (std::size_t jcol = irow+1; jcol < N; ++jcol) {
                bsum += -A(irow, jcol) * x[jcol];
            }
            x[irow] = bsum / A(irow, irow);
        }
    
        return x;
    }
    
    #endif
    
  • fitting.H : the generalized least squares fitting function

    #ifndef FITTING_H
    #define FITTING_H
    
    #include <cassert>
    #include <cmath>
    #include <iostream>
    #include <utility>
    #include <vector>
    
    #include "matrix.H"
    #include "gauss.H"
    
    std::pair<std::vector<double>, double>
    general_regression(const std::vector<double>& x,
                       const std::vector<double>& y,
                       const std::vector<double>& yerr,
                       int M) {
    
        auto N = x.size();
        assert (y.size() == N && yerr.size() == N);
    
        // make the design matrix
        // each row corresponds to a single data point
        // each column corresponds to a particular basis function
    
        Matrix A(N, M);
        for (std::size_t n = 0; n < N; ++n) {
            A(n, 0) = 1.0 / yerr[n];
            for (int m = 1; m < M; ++m) {
                // x**m / yerr
                A(n, m) = A(n, m-1) * x[n];
            }
        }
    
        // build the linear system
        auto ATA = A.transpose() * A;
    
        std::vector<double> source(N, 0);
        for (std::size_t n = 0; n < N; ++n) {
            source[n] = y[n] / yerr[n];
        }
    
        auto b = A.transpose() * source;
    
        auto a = gauss_elim(ATA, b);
    
        double chisq{};
        for (std::size_t n = 0; n < N; ++n) {
            double y_model{};
            for (int m = 0; m < M; ++m) {
                y_model += a[m] * std::pow(x[n], m);
            }
            chisq += std::pow(y_model - y[n], 2) / std::pow(yerr[n], 2);
        }
        chisq /= static_cast<double>(N - M);
    
        return {a, chisq};
    }
    
    #endif
    
  • test_fitting.cpp

    #include <iostream>
    #include <fstream>
    #include <random>
    #include <vector>
    
    #include "fitting.H"
    
    
    std::pair<std::vector<double>, std::vector<double>>
    y_experiment(const std::vector<double>& av, double sigma,
                 const std::vector<double>& x) {
    
        auto N = x.size();
    
        std::vector<double> y(N, 0.0);
    
        // compute Gaussian random numbers that will represent
        // the error for each point
        std::mt19937 generator(12345);
        std::normal_distribution<double> randn(0.0, sigma);
    
        std::vector<double> yerr(N, 0.0);
    
        for (std::size_t n = 0; n < N; ++n) {
            auto r = randn(generator);
            y[n] = av[0] + av[1] * x[n] + av[2] * x[n] * x[n] + r;
            yerr[n] = sigma;
        }
    
        return {y, yerr};
    }
    
    int main() {
    
        constexpr std::size_t N{40};
        constexpr double xmin{0.0};
        constexpr double xmax{100.0};
    
        std::vector<double> x(N, 0.0);
        double dx = (xmax - xmin) / static_cast<double>(N - 1);
    
        for (std::size_t n = 0; n < N; ++n) {
            x[n] = xmin + static_cast<double>(n) * dx;
        }
    
        double sigma{8.0};
        auto [y, yerr] = y_experiment({2.0, 1.50, -0.02}, sigma, x);
    
        std::ofstream of("experiment.dat");
        for (std::size_t n = 0; n < N; ++n) {
            of << x[n] << " " << y[n] << " " << yerr[n] << std::endl;
        }
    
        constexpr int M{3};
        auto [a, chisq] = general_regression(x, y, yerr, M);
    
        std::cout << "chisq = " << chisq << std::endl;
    
        std::cout << "a = ";
        for (auto e : a) {
            std::cout << e << " ";
        }
        std::cout << std::endl;
    
    }
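The driver writes the synthetic data to experiment.dat as three whitespace-separated columns: x, y, and yerr. As promised above, here is a minimal Python sketch (an addition here, not part of the C++ code) for inspecting that output:

import numpy as np
import matplotlib.pyplot as plt

# read the columns written by test_fitting.cpp
x, y, yerr = np.loadtxt("experiment.dat", unpack=True)

fig, ax = plt.subplots()
ax.errorbar(x, y, yerr=yerr, fmt="o")
fig.savefig("experiment.png")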