Nonlinear Fitting#

What about the case of fitting to a function where the fit parameters enter in a nonlinear fashion? For example:

\[f(x; a_0, a_1) = a_0 e^{a_1 x}\]

One trick that is often used for something like this is to transform the data: instead of fitting the data \((x_i, y_i)\), you fit \((x_i, \log y_i)\), and the fitting function becomes:

\[\log f(x; a_0, a_1) = a_1 x + \log a_0\]

which is linear.
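For instance, with data that exactly follows the exponential form, an ordinary linear fit to \((x_i, \log y_i)\) recovers both parameters. Here’s a minimal sketch (the values are purely illustrative; the data we actually fit in this section are generated below):

# minimal sketch of the log-transform trick (illustrative values only)
import numpy as np

xs = np.linspace(0.1, 4.0, 10)
ys = 2.5 * np.exp(2.0 / 3.0 * xs)   # data of the form y = a0 exp(a1 x)

# fit a line to (x, log y): the slope is a1 and the intercept is log(a0)
a1_fit, loga0_fit = np.polyfit(xs, np.log(ys), 1)
print(a1_fit, np.exp(loga0_fit))    # recovers a1 = 2/3 and a0 = 2.5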

However, when there are errors associated with the \(y_i\), those errors do not carry over simply when you take the logarithm, so the transformation changes how the points are weighted in the fit.
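To see why, propagate a small error through the logarithm to first order:

\[\sigma_{\log y_i} \approx \left | \frac{d \log y}{d y} \right |_{y_i} \sigma_i = \frac{\sigma_i}{y_i}\]

so, for example, if the \(\sigma_i\) are all comparable, the large-\(y\) points end up with much smaller transformed errors and dominate the weighted fit.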

So let’s look at how we would fit directly to a nonlinear function.

We’ll minimize the same \(\chi^2\) function as before:

\[\chi^2 = \sum_{i=1}^N \frac{(y_i - f(x_i; {\bf a}))^2}{\sigma_i^2}\]

with fitting parameters \({\bf a} = (a_1, \ldots, a_M)^\intercal\).

Now we take the derivatives with respect to each parameter, \(a_k\):

\[\frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=1}^N \frac{(y_i - f(x_i; {\bf a}))}{\sigma_i^2} \frac{\partial f(x_i; {\bf a})}{\partial a_k} = 0\]

Let’s define \(g_k \equiv {\partial \chi^2}/{\partial a_k}\); then we have

\[\begin{split}{\bf g}({\bf a}) = \left ( \begin{array}{c} g_1({\bf a}) \\ g_2({\bf a}) \\ \vdots \\ g_M({\bf a}) \end{array} \right ) = 0\end{split}\]

This is a nonlinear system of \(M\) equations and \(M\) unknowns. We can solve this using the same multivariate Newton’s method we looked at before:

  • Take an initial guess at the fit parameters, \({\bf a}^{(k)}\)

  • Solve the system \({\bf J}\delta {\bf a} = -{\bf g}\), where \(J_{ij} = \partial g_i/\partial a_j\) is the Jacobian

  • Correct the initial guess, \({\bf a}^{(k+1)} = {\bf a}^{(k)} + \delta {\bf a}\)

As we’ve seen with Newton’s method, convergence will be very sensitive to the initial guess.
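Schematically, the iteration looks like the following generic sketch (gfunc and jfunc stand in for the problem-specific functions we’ll write below; the tolerance and iteration cap are illustrative choices):

# a generic sketch of the multivariate Newton iteration described above;
# gfunc and jfunc are placeholders for the problem-specific functions below
import numpy as np

def newton_solve(gfunc, jfunc, aguess, tol=1.e-8, max_iter=100):
    """solve gfunc(a) = 0 starting from the guess aguess"""
    a = np.asarray(aguess, dtype=np.float64).copy()
    for _ in range(max_iter):
        da = np.linalg.solve(jfunc(a), -gfunc(a))
        a += da
        if np.max(np.abs(da)) < tol:
            break
    return a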

Fitting an exponential#

Let’s try this out on data that is constructed to follow an exponential trend.

First let’s construct the data, and perturb it with some errors. We’ll take the form:

\[y = a_0 e^{a_1 x}\]
import numpy as np
import matplotlib.pyplot as plt
# make up some experimental data
a0 = 2.5
a1 = 2./3.
sigma = 4.0

N = 25

x = np.linspace(0.0, 4.0, N)
r = sigma * np.random.randn(N)
y = a0 * np.exp(a1 * x) + r
yerr = np.abs(r)

fig, ax = plt.subplots()
ax.errorbar(x, y, yerr=yerr, fmt="o")
[Figure: the synthetic data points with their error bars]

Now, let’s compute our vector \({\bf g}\) that we will zero:

\[\begin{align*} g_0 &= \frac{\partial \chi^2}{\partial a_0} = -2 \sum_{i=1}^N \frac{(y_i - a_0 e^{a_1 x_i})}{\sigma_i^2} (e^{a_1 x_i}) \\ g_1 &= \frac{\partial \chi^2}{\partial a_1} = -2 \sum_{i=1}^N \frac{(y_i - a_0 e^{a_1 x_i})}{\sigma_i^2} (x_i a_0 e^{a_1 x_i}) \end{align*}\]

We can divide out the overall factor of \(-2\) in each expression, since a constant factor doesn’t change where the roots are (and it cancels in the Newton update \({\bf J}\delta {\bf a} = -{\bf g}\)). In \(g_1\) we keep the factor of \(a_0\) out front rather than dividing it out as well, so the expression remains well defined if \(a_0\) happens to be \(0\).

Let’s write a function to compute this:

def g(x, y, yerr, a):
    """compute the vector of nonlinear functions that we zero
    (the gradient of chi^2 with the factor of -2 divided out).
    Here a is the vector of fit parameters"""

    a0, a1 = a

    g0 = np.sum(np.exp(a1 * x) * (y - a0 * np.exp(a1 * x)) / yerr**2)
    g1 = a0 * np.sum(x * np.exp(a1 * x) * (y - a0 * np.exp(a1 * x)) / yerr**2)

    return np.array([g0, g1])

We also need the Jacobian. We could either compute this numerically, via differencing, or analytically. We’ll do the latter.

\[\begin{align*} \frac{\partial g_0}{\partial a_0} &= -\sum_{i=1}^N \frac{e^{2a_1 x_i}}{\sigma_i^2} \\ \frac{\partial g_0}{\partial a_1} &= \sum_{i=1}^N \frac{x_i e^{a_1 x_i} (y_i - 2 a_0 e^{a_1 x_i})}{\sigma_i^2} \\ \frac{\partial g_1}{\partial a_0} &= \sum_{i=1}^N \frac{x_i e^{a_1 x_i} (y_i - 2 a_0 e^{a_1 x_i})}{\sigma_i^2} \\ \frac{\partial g_1}{\partial a_1} &= \sum_{i=1}^N \frac{a_0 x_i^2 e^{a_1 x_i} (y_i - 2 a_0 e^{a_1 x_i})}{\sigma_i^2} \end{align*}\]

Notice that the Jacobian is symmetric:

\[\begin{split}{\bf J} = \left ( \begin{array}{cc} \frac{\partial g_0}{\partial a_0} & \frac{\partial g_0}{\partial a_1} \\ \frac{\partial g_1}{\partial a_0} & \frac{\partial g_1}{\partial a_1} \end{array} \right ) = \left ( \begin{array}{cc} \frac{\partial^2 \chi^2}{\partial a_0^2} & \frac{\partial^2 \chi^2}{\partial a_0 \partial a_1} \\ \frac{\partial^2 \chi^2}{\partial a_1 \partial a_0} & \frac{\partial^2 \chi^2}{\partial a_1^2} \end{array} \right ) \end{split}\]

This matrix of second derivatives is called the Hessian of \(\chi^2\).

Let’s write this function:

def jac(x, y, yerr, a):
    """ compute the Jacobian of the function g"""

    a0, a1 = a
    
    dg0da0 = -np.sum(np.exp(2.0 * a1 * x) / yerr**2)
    dg0da1 = np.sum(x * np.exp(a1 * x) * (y - 2.0 * a0 * np.exp(a1 * x)) / yerr**2)
    dg1da0 = dg0da1
    dg1da1 = np.sum(a0 * x**2 * np.exp(a1 * x) * (y - 2.0 * a0 * np.exp(a1 * x)) / yerr**2)
    
    return np.array([[dg0da0, dg0da1],
                     [dg1da0, dg1da1]])
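Since it’s easy to make a sign or factor error in these derivatives, here’s an optional sanity check of the analytic Jacobian against centered differences (jac_fd, atest, and the step size eps are illustrative choices, not part of the original development):

# optional: compare the analytic Jacobian with centered differences
def jac_fd(x, y, yerr, a, eps=1.e-6):
    """approximate the Jacobian of g by centered differencing"""
    n = len(a)
    J = np.zeros((n, n))
    for j in range(n):
        ap = a.copy()
        am = a.copy()
        ap[j] += eps
        am[j] -= eps
        J[:, j] = (g(x, y, yerr, ap) - g(x, y, yerr, am)) / (2.0 * eps)
    return J

atest = np.array([2.0, 1.0])
print(np.abs(jac(x, y, yerr, atest) - jac_fd(x, y, yerr, atest)).max())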
def fit(aguess, x, y, yerr, tol=1.e-5):
    """ aguess is the initial guess to our fit parameters.  x and y
        are the vector of points that we are fitting to, and yerr are
        the errors in y"""
    
    avec = aguess.copy()

    err = 1.e100
    while err > tol:

        # get the jacobian
        J = jac(x, y, yerr, avec)

        print("condition number of J: ", np.linalg.cond(J))

        # get the current function values
        gv = g(x, y, yerr, avec)

        # solve for the correction: J da = -g
        da = np.linalg.solve(J, -gv)

        avec += da
        err = np.max(np.abs(da))

    return avec
# initial guesses
aguess = np.array([2.0, 1.0])

# fit
afit = fit(aguess, x, y, yerr)
condition number of J:  164.34894326340628
condition number of J:  195.8904428147066
condition number of J:  278.34758511331097
condition number of J:  588.6058904532853
condition number of J:  2990.7065863291064
condition number of J:  94689.64776018137
condition number of J:  264630.5426388268
condition number of J:  41338.57459605621
condition number of J:  872320.2449859331
condition number of J:  11316.992974414909
condition number of J:  4145808.6694455543
condition number of J:  5722.500125595029
condition number of J:  489219.3007288989
condition number of J:  12842.71126172183
condition number of J:  1441413.620123245
condition number of J:  1435.0814562560633
condition number of J:  13973.949952495685
condition number of J:  5631650.33188915
condition number of J:  5865.35035524176
condition number of J:  147551.5176839204
condition number of J:  446780.9448127108
condition number of J:  43907.45204431583
condition number of J:  1563362.661919562
condition number of J:  3216.8297883747614
condition number of J:  80575.64551782915
condition number of J:  1318799.8163501557
condition number of J:  337279.88667540636
condition number of J:  1316931.2501210344
condition number of J:  1315171.2855140376
afit
array([2.47017046, 0.66956341])
ax.plot(x, afit[0] * np.exp(afit[1] * x))
fig
[Figure: the data with error bars and the best-fit exponential overplotted]
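As an independent cross-check (not part of the derivation above, and assuming SciPy is available), we can compare against SciPy’s nonlinear least-squares fitter:

# optional cross-check with SciPy's curve_fit
from scipy.optimize import curve_fit

def model(x, a0, a1):
    return a0 * np.exp(a1 * x)

popt, pcov = curve_fit(model, x, y, p0=[2.0, 1.0],
                       sigma=yerr, absolute_sigma=True)
print(popt)                    # should be close to afit
print(np.sqrt(np.diag(pcov)))  # 1-sigma parameter uncertainties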

Is it a minimum?#

We’ve only found an extremum. Let’s plot the \(\chi^2\) surface around our fit parameters to see whether it looks like a minimum.

npts = 100
a0v = np.linspace(0.5 * afit[0], 2.0 * afit[0], npts)
a1v = np.linspace(0.5 * afit[1], 2.0 * afit[1], npts)
def chisq(a0, a1, x, y, yerr):
    return np.sum((y - a0 * np.exp(a1 * x))**2 / yerr**2)
c2 = np.zeros((npts, npts), dtype=np.float64)
for i, a0 in enumerate(a0v):
    for j, a1 in enumerate(a1v):
        c2[i, j] = chisq(a0, a1, x, y, yerr)
c2.max()
3675724975.1126876

Now we’ll plot the log of the \(\chi^2\):

fig, ax = plt.subplots()

# we need to transpose to put a0 on the horizontal
# we use origin = lower to have the origin at the lower left
im = ax.imshow(np.log10(c2).T,
               origin="lower",
               extent=[a0v[0], a0v[-1], a1v[0], a1v[-1]])
fig.colorbar(im, ax=ax, orientation="horizontal")
ax.scatter([afit[0]], [afit[1]], color="r", marker="x")
ax.set_xlabel("$a_0$")
ax.set_ylabel("$a_1$")
[Figure: \(\log_{10} \chi^2\) over the \((a_0, a_1)\) plane, with the best-fit parameters marked by a red “x”]

It looks like there is a very broad minimum there.

Troubles#

Consider what happens if we try to add another parameter and fit to:

\[y = a_0 e^{a_1 x + a_2}\]

Here \(a_2\) enters in the same way as \(a_0\): since \(a_0 e^{a_1 x + a_2} = (a_0 e^{a_2}) e^{a_1 x}\), only the combination \(a_0 e^{a_2}\) is constrained by the data. The columns of the Jacobian corresponding to \(a_0\) and \(a_2\) are proportional, so \({\bf J}\) is singular and the solution is unstable.
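We can check this numerically with a quick sketch (using the x and yerr arrays from above; the trial parameter values are arbitrary):

# quick numerical check of the degeneracy
a0_t, a1_t, a2_t = 2.5, 2.0/3.0, 0.1

dfda0 = np.exp(a1_t * x + a2_t)             # df/da0
dfda1 = a0_t * x * np.exp(a1_t * x + a2_t)  # df/da1
dfda2 = a0_t * np.exp(a1_t * x + a2_t)      # df/da2 -- proportional to df/da0

# weighted normal-equations (Gauss-Newton-style) matrix:
# A_jk = sum_i (df/da_j)(df/da_k) / sigma_i^2
D = np.array([dfda0, dfda1, dfda2]) / yerr
A = D @ D.T
print(np.linalg.cond(A))   # enormous: the matrix is numerically singular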