Nonlinear Fitting

Nonlinear Fitting#

What about the case of fitting to a function where the fit parameters enter in a nonlinear fashion? For example:

\[f(x; a_0, a_1) = a_0 e^{a_1 x}\]

Note

One trick that is often used for something like this is to transform the data. So instead of fitting the data \((x_i, y_i)\), you instead fit \((x_i, \log y_i)\), and then our fitting function is:

\[\log f(x; a_0, a_1) = a_1 x + \log a_0\]

which is linear.

However, when there are errors associated with the \(y_i\), the errors do not necessarily transform the correct way when you take the logarithm.

Let’s look at how we would fit directly to a nonlinear function.

We’ll minimize the same fitting function:

\[\chi^2 = \sum_{i=1}^N \frac{(y_i - f(x; {\bf a}))^2}{\sigma_i^2}\]

with fitting parameters \({\bf a} = (a_1, \ldots, a_M)^\intercal\).

Now we take the derivatives with respect to each parameter, \(a_k\):

\[\frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=1}^N \frac{(y_i - f(x, {\bf a}))}{\sigma_i^2} \frac{\partial f}{\partial a_k} = 0\]

Let’s define \(g_k \equiv {\partial \chi^2}/{\partial a_k}\), then we have

\[\begin{split}{\bf g}({\bf a}) = \left ( \begin{array}{c} g_1({\bf a}) \\ g_2({\bf a}) \\ \vdots \\ g_M({\bf a}) \end{array} \right ) = 0\end{split}\]

This is a nonlinear system of \(M\) equations and \(M\) unknowns. We can solve this using the same multivariate Newton’s method we looked at before:

  • Take an initial guess at the fit parameters, \({\bf a}^{(k)}\)

  • Solve the system \({\bf J}\delta {\bf a} = -{\bf g}\), where \(J_{ij} = \partial g_i/\partial a_j\) is the Jacobian

  • Correct the initial guess, \({\bf a}^{(k+1)} = {\bf a}^{(k)} + \delta {\bf a}\)

As we’ve seen with Newton’s method, convergence will be very sensitive to the initial guess.

Fitting an exponential#

Let’s try this out on data that is constructed to follow an exponential trend.

First let’s construct the data, and perturb it with some errors. We’ll take the form:

\[y = a_0 e^{a_1 x}\]
import numpy as np
import matplotlib.pyplot as plt
# make up some experimental data
a0 = 2.5
a1 = 2./3.
sigma = 4.0

N = 25

x = np.linspace(0.0, 4.0, N)
r = sigma * np.random.randn(N)
y = a0 * np.exp(a1 * x) + r
yerr = np.abs(r)

fig, ax = plt.subplots()
ax.errorbar(x, y, yerr=yerr, fmt="o")
<ErrorbarContainer object of 3 artists>
../_images/7c72526670f3cc76455cd87585a168bce256205b4fc1773544e1c2076ed113b6.png

Now, let’s compute our vector \({\bf g}\) that we will zero:

\[\begin{align*} g_0 &= \frac{\partial \chi^2}{\partial a_0} = -2 \sum_{i=1}^N \frac{(y_i - a_0 e^{a_1 x_i})}{\sigma_i^2} (e^{a_1 x_i}) \\ g_1 &= \frac{\partial \chi^2}{\partial a_1} = -2 \sum_{i=1}^N \frac{(y_i - a_0 e^{a_1 x_i})}{\sigma_i^2} (x_i a_0 e^{a_1 x_i}) \end{align*}\]

We can divide out the \(-2\) in each expression. We’ll keep the overall \(a_0\) in the expression, to deal with the case where it might be \(0\).

Let’s write a function to compute this:

def g(x, y, yerr, a):
    """compute the nonlinear functions we minimize.  Here a is the vector
    of fit parameters"""
    
    a0, a1 = a
    
    g0 = np.sum(np.exp(a1 * x) * (y - a0 * np.exp(a1 * x)) / yerr**2)
    g1 = a0 * np.sum(x * np.exp(a1 * x) * (y - a0 * np.exp(a1 * x)) / yerr**2)
    
    return np.array([g0, g1])

We also need the Jacobian. We could either compute this numerically, via differencing, or analytically. We’ll do the latter.

\[\begin{align*} \frac{\partial g_0}{\partial a_0} &= -\sum_{i=1}^N \frac{e^{2a_1 x_i}}{\sigma_i^2} \\ \frac{\partial g_0}{\partial a_1} &= \sum_{i=1}^N \frac{x_i e^{a_1 x_i} (y_i - 2 a_0 e^{a_1 x_i})}{\sigma_i^2} \\ \frac{\partial g_1}{\partial a_0} &= \sum_{i=1}^N \frac{x_i e^{a_1 x_i} (y_i - 2 a_0 e^{a_1 x_i})}{\sigma_i^2} \\ \frac{\partial g_1}{\partial a_1} &= \sum_{i=1}^N \frac{a_0 x_i^2 e^{a_1 x_i} (y_i - 2 a_0 e^{a_1 x_i})}{\sigma_i^2} \end{align*}\]

Notice that the Jacobian is symmetric:

\[\begin{split}{\bf J} = \left ( \begin{array}{cc} \frac{\partial g_0}{\partial a_0} & \frac{\partial g_0}{\partial a_1} \\ \frac{\partial g_1}{\partial a_0} & \frac{\partial g_1}{\partial a_1} \end{array} \right ) = \left ( \begin{array}{cc} \frac{\partial^2 \chi^2}{\partial a_0^2} & \frac{\partial^2 \chi^2}{\partial a_0 \partial a_1} \\ \frac{\partial^2 \chi^2}{\partial a_1 \partial a_0} & \frac{\partial^2 \chi^2}{\partial a_1^2} \end{array} \right ) \end{split}\]

This is called the Hessian matrix.

Let’s write this function:

def jac(x, y, yerr, a):
    """ compute the Jacobian of the function g"""

    a0, a1 = a
    
    dg0da0 = -np.sum(np.exp(2.0 * a1 * x) / yerr**2)
    dg0da1 = np.sum(x * np.exp(a1 * x) * (y - 2.0 * a0 * np.exp(a1 * x)) / yerr**2)
    dg1da0 = dg0da1
    dg1da1 = np.sum(a0 * x**2 * np.exp(a1 * x) * (y - 2.0 * a0 * np.exp(a1 * x)) / yerr**2)
    
    return np.array([[dg0da0, dg0da1],
                     [dg1da0, dg1da1]])

Tip

Convergence here is very sensitive to the initial guess. We add a cap on the number of iterations allowed prevent this from being an infinite loop if we can’t converge.

def fit(aguess, x, y, yerr, tol=1.e-5):
    """ aguess is the initial guess to our fit parameters.  x and y
        are the vector of points that we are fitting to, and yerr are
        the errors in y"""
    
    avec = aguess.copy()

    max_steps = 100
    err = 1.e100
    n = 0
    while err > tol and n < max_steps:

        # get the jacobian
        J = jac(x, y, yerr, avec)

        print("condition number of J: ", np.linalg.cond(J))

        # get the current function values
        gv = g(x, y, yerr, avec)

        # solve for the correction: J da = -g
        da = np.linalg.solve(J, -gv)

        avec += da
        err = np.max(np.abs(da/avec))
        n += 1
    return avec
# initial guesses
aguess = np.array([2.0, 1.0])

# fit
afit = fit(aguess, x, y, yerr)
condition number of J:  143.4769580593344
condition number of J:  179.93192950974228
condition number of J:  288.3625230820207
condition number of J:  880.9698567403798
condition number of J:  4215.931194826469
condition number of J:  845.0049214488857
condition number of J:  7270.3893580389185
condition number of J:  2499.422271606722
condition number of J:  3422.799711927889
condition number of J:  3620.993682590586
condition number of J:  3642.582695280144
afit
array([2.50159814, 0.66403563])
ax.plot(x, afit[0] * np.exp(afit[1] *x))
fig
../_images/fd5379a4c890aae13cfb7df1665e9ad3827614d6e54f9609468e1c6a78f595f0.png

Is it a minimum?#

We just found an extrema. Let’s plot the surface around our fit parameters to see if it looks like a minimum

npts = 100
a0v = np.linspace(0.5 * afit[0], 2.0 * afit[0], npts)
a1v = np.linspace(0.5 * afit[1], 2.0 * afit[1], npts)
def chisq(a0, a1, x, y, yerr):
    return np.sum((y - a0 * np.exp(a1 * x))**2 / yerr**2)
chisq(afit[0], afit[1], x, y, yerr)
np.float64(22.607966055067706)
c2 = np.zeros((npts, npts), dtype=np.float64)
for i, a0 in enumerate(a0v):
    for j, a1 in enumerate(a1v):
        c2[i, j] = chisq(a0, a1, x, y, yerr)

Now we’ll plot the (log of) the \(\chi^2\)

fig, ax = plt.subplots()

# we need to transpose to put a0 on the horizontal
# we use origin = lower to have the origin at the lower left
im = ax.imshow(np.log10(c2).T,
               origin="lower",
               extent=[a0v[0], a0v[-1], a1v[0], a1v[-1]])
fig.colorbar(im, ax=ax, orientation="horizontal")
ax.scatter([afit[0]], [afit[1]], color="r", marker="x")
ax.set_xlabel("$a_0$")
ax.set_ylabel("$a_1$")
Text(0, 0.5, '$a_1$')
../_images/7bde4e62397f0ae8107ec02e06c56a08c2356ab3e1d56a65fae8d0c4c4f8acfe.png

It looks like there is a very broad minimum there.

Troubles#

Consider if we tried to add another parameter, fitting to:

\[y = a_0 e^{a_1 x + a_2}\]

here \(a_2\) enters the same way as \(a_0\), which would give a singular matrix, and make our solution unstable.