A First Example#

Let’s do a simple example: given a sequence of numbers as input, we want the last number in the sequence to be the output. Let’s see if we can train a network to recognize this.

import numpy as np
import random

We’ll create a ModelData class that generates our data and also has a method interpret_result() that takes the output of the network and tells us which number it predicts. This is needed since the network does not know that we are dealing with integers, so we’ll need to round its output.

The important members are:

  • ModelData.x : the input

  • ModelData.y : the output

Anticipating the need to scale our data later, we’ll also create ModelData.x_scaled, a scaled version of the input, and ModelData.y_scaled, the scaled output.

class ModelData:
    """this is the model data for our "last number" training set.  We
    produce input of length N consisting of the numbers 0-9, and we set
    the output to be simply the last element of the input vector

    """
    def __init__(self, N=10):
        self.N = N

        # our model input data
        self.x = np.random.randint(0, high=10, size=N)
        self.x_scaled = self.x
        
        # our scaled model output data
        self.y = np.array([self.x[-1]])
        self.y_scaled = np.array([self.x_scaled[-1]])
        
    def interpret_result(self, out):
        """take the network output and return a the number from the allowed
        sequence we are closest to

        """
        return max(0, min(9, int(np.round(out, decimals=0))))

Here’s what our data looks like:

model = ModelData()
model.x
array([3, 0, 7, 7, 3, 3, 8, 7, 6, 4])
model.y
array([4])

Now we write our network. We’ll make it take the class that creates the data so we can reuse this network with different variations of the model data.

Some implementation details:

  • The network will only deal with model.x_scaled and model.y_scaled.

  • We need to initialize the matrix \({\bf A}\); we’ll use Gaussian random numbers, centered on 0 with a width of \(1/\sqrt{N_\mathrm{in}}\). This seems to be a common choice (see the quick check after this list).

  • We’ll loop over the data in the training set in a random order (randomizing each epoch).
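
Here’s a quick illustration of that initialization choice (this snippet is just for illustration and not part of the network below): with weights of width \(1/\sqrt{N_\mathrm{in}}\), each element of \({\bf A x}\) is a sum of \(N_\mathrm{in}\) terms of size \(\sim 1/\sqrt{N_\mathrm{in}}\), so the pre-activation stays of order unity and the sigmoid is not driven into its flat, saturated regions.

N_in = 10
A = np.random.normal(0.0, 1.0/np.sqrt(N_in), (1, N_in))
x = np.random.random(N_in)

# a sum of N_in terms, each of size ~ 1/sqrt(N_in), stays of order unity
A @ x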

class NeuralNetwork:

    def __init__(self, num_training_unique=100, data_class=None):
        
        self.num_training_unique = num_training_unique

        self.train_set = []
        for _ in range(self.num_training_unique):
            self.train_set.append(data_class())

        # initialize our matrix with Gaussian normal random numbers
        # we get the size from the length of the input and output
        model = self.train_set[0]
        self.N_out = len(model.y_scaled)
        self.N_in = len(model.x_scaled)

        self.A = np.random.normal(0.0, 1.0/np.sqrt(self.N_in),
                                  (self.N_out, self.N_in))

    def g(self, xi):
        """our sigmoid function"""
        return 1.0/(1.0 + np.exp(-xi))

    def train(self, n_epochs=10, eta=0.2):
        """Do the minimization for the training"""

        # train
        for _ in range(n_epochs):
            random.shuffle(self.train_set)
            for model in self.train_set:

                # gradient descent -- just a single improvement.  eta
                # here is our learning rate

                # make these column vectors
                x = model.x_scaled.reshape(self.N_in, 1)
                y = model.y_scaled.reshape(self.N_out, 1)

                b = self.A @ x
                z = self.g(b)

                # derivative of the cost (z - y)^2 with respect to A,
                # using g'(xi) = g(xi) (1 - g(xi))
                self.A[:,:] += -eta * 2 * (z - y) * z * (1 - z) @ x.T

    def predict(self, model):
        """predict the outcome using our trained matrix A """
        z = self.g(self.A @ model.x_scaled)
        return model.interpret_result(z)
    
    def check_accuracy(self):
        """use the trained network on the training data and return
        the fraction we get correct"""
        
        n_right = 0
        for model in self.train_set:
            y_nn = self.predict(model)
            if y_nn == model.y:
                n_right += 1
        return n_right / len(self.train_set)
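
For reference, the update in train() is a single gradient descent step on the squared error of one sample. With \(z = g({\bf A x})\) and cost \(f = \sum_i (z_i - y_i)^2\), the chain rule together with \(g'(\xi) = g(\xi)\,(1 - g(\xi))\) gives

\[\frac{\partial f}{\partial A_{ij}} = 2\, (z_i - y_i)\, z_i\, (1 - z_i)\, x_j\]

which is exactly the outer product 2 * (z - y) * z * (1 - z) @ x.T in the code above.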

Let’s create the network and train it.

nn = NeuralNetwork(num_training_unique=1000, data_class=ModelData)
nn.train(n_epochs=100)

We can ask the network how well it does on the data it’s already seen:

frac = nn.check_accuracy()
print(f"fraction correct: {frac}")
fraction correct: 0.117

All that training, and it is only about 10% accurate! Now let’s check it on data it’s never seen:

err = []
npts = 1000
n_right = 0
for k in range(npts):
    model = ModelData()
    y_nn = nn.predict(model)
    if y_nn == model.y:
        n_right += 1
    err.append(abs(y_nn - model.y))
    
print(f"fraction correct: {n_right / npts}")
fraction correct: 0.101

Clearly we are not doing that great: we are getting only about 10% right, which, with 10 possible answers, is basically random guessing. Let’s look at a single attempt:

model = ModelData()
model.x
array([6, 8, 9, 1, 2, 5, 1, 0, 4, 9])
model.y
array([9])

Here’s what the network predicts:

nn.predict(model)
1

And here’s the prediction before applying the activation function:

nn.A @ model.x_scaled
array([85.52157869])

Part of the problem here is that the network is producing a really large number, and the sigmoid function works best when its argument is in the region where it varies the fastest, \(\xi \sim [-1, 1]\):

nn.g(nn.A @ model.x_scaled)
array([1.])

Basically, whatever we feed it, the sigmoid will return 1 when the numbers are this large. We need to scale the data.
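
To see the saturation directly, we can evaluate the sigmoid over a range of arguments (the particular values are arbitrary, chosen for illustration); only arguments near zero give outputs distinguishable from 0 or 1:

nn.g(np.array([-50., -5., -1., 0., 1., 5., 50.]))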

Scaled Data#

Let’s try again, but this time, let’s scale the data: we divide by 10 and add a small offset, so the digits 0-9 map onto \([0.05, 0.95]\) (for example, 9 becomes \(9/10 + 0.05 = 0.95\)). The offset, \(0.05\), prevents any of the inputs from being \(0\), which would simply cancel out any weight it multiplies.

class ModelDataScaled:
    """this is the model data for our "last number" training set.  We
    produce input of length N consisting of the numbers 0-9, and we set
    the output to be simply the last element of the input vector.  This
    version scales the data to lie in [0.05, 0.95]

    """    
    def __init__(self, N=10):
        self.N = N
        self.offset = 0.05
        self.scale_factor = 10
        
        # our model input data
        self.x = np.random.randint(0, high=10, size=N)
        self.x_scaled = self.x / self.scale_factor + self.offset

        # our scaled model output data
        self.y = np.array([self.x[-1]])
        self.y_scaled = np.array([self.x_scaled[-1]])
                                
    def interpret_result(self, out):
        """take the network output, undo the scaling, and round it

        """
        # out is a 1-element array -- extract the scalar first to avoid
        # the deprecated array-to-int conversion
        return max(0, min(9, int(self.scale_factor *
                                 np.round(out[0] / self.offset) * self.offset)))
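
As a quick sanity check, interpret_result() applied to the scaled output should recover the original digit:

model = ModelDataScaled()
model.y, model.interpret_result(model.y_scaled)
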
nn = NeuralNetwork(num_training_unique=1000, data_class=ModelDataScaled)
nn.train(n_epochs=100)
nn.check_accuracy()
0.435

err = []
npts = 1000
n_right = 0
for k in range(npts):
    model = ModelDataScaled()
    y_nn = nn.predict(model)
    if y_nn == model.y:
        n_right += 1
    err.append(abs(y_nn - model.y))
    
print(f"fraction correct: {n_right / npts}")
fraction correct: 0.418

We do a lot better now: more than 40% correct, compared to about 10% before.

Take a look at the trained matrix:

nn.A
array([[-0.58740763, -0.43920017, -0.54008187, -0.47700221, -0.53580461,
        -0.58084499, -0.55423477, -0.51724756, -0.46995334,  4.65125021]])

Notice that by far the largest element (in magnitude) is the last one. This makes sense, since we want \({\bf A x}\) to choose the last value in \({\bf x}\).
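
We can see this in action on a fresh sample: the sigmoid of \({\bf A x}\) should land roughly near the scaled answer (only roughly, given the modest accuracy):

model = ModelDataScaled()
model.y_scaled, nn.g(nn.A @ model.x_scaled)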

Categorical Data#

We’ll finish this out by looking at a different way to represent the data. In the previous attempts, the network did not know that it was supposed to predict an integer from 0-9; it instead gave a real number that we rounded to the nearest integer in that range.

We can instead treat the data as categorical, restricting it to take on only those values.

Let’s create the class first and then we’ll see what the output looks like.

class ModelDataCategorical:
    """this is the model data for our "last number" training set.  We
    produce input of length N, consisting of numbers 0-9 and store
    the result in a 10-element array as categorical data.

    """
    def __init__(self, N=10):
        self.N = N
        
        # our model input data
        self.x = np.random.randint(0, high=10, size=N)
        self.x_scaled = self.x / 10 + 0.05
        
        # our scaled model output data
        self.y = np.array([self.x[-1]])
        self.y_scaled = np.zeros(10) + 0.01
        self.y_scaled[self.x[-1]] = 0.99
        
    def interpret_result(self, out):
        """take the network output and return the number we predict"""
        return np.argmax(out)

This is categorical data: the answer we are training on is a 10-element array with a “1” in the slot corresponding to the correct integer. We actually use 0.99 (and 0.01 elsewhere), since the sigmoid can never exactly reach 0 or 1.

model = ModelDataCategorical()
model.x
array([0, 9, 0, 7, 3, 8, 0, 9, 7, 8])
model.y_scaled
array([0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.99, 0.01])
model.y
array([8])

We see that model.y_scaled[model.y] = 0.99. We will pick as the correct answer the array index that has the highest value.
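
For example, interpret_result() simply takes the argmax, so any 10-element output maps to a digit (the numbers below are made up for illustration):

out = np.zeros(10)
out[3] = 0.7
model.interpret_result(out)   # index of the largest element: 3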

One additional benefit of this is that our matrix \({\bf A}\) will now be \(10\times 10\), so there are more weights to train.

nn = NeuralNetwork(num_training_unique=1000, data_class=ModelDataCategorical)
nn.train(n_epochs=100)
nn.check_accuracy()
0.38
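
We can confirm the shape of the weight matrix, since both the input and the categorical output have 10 elements:

nn.A.shape   # -> (10, 10)
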
err = []
npts = 1000
n_right = 0
for k in range(npts):
    model = ModelDataCategorical()
    y_nn = nn.predict(model)
    if y_nn == model.y:
        n_right += 1
    err.append(abs(y_nn - model.y))
    
print(f"fraction correct: {n_right / npts}")
fraction correct: 0.363

This network does reasonably well: we get about one-third of the new data correct. It helps that there are a lot more connections to train. We note, however, that the sigmoid function is not the best choice for categorical data; a softmax output, which turns the 10 outputs into a probability distribution, is the usual alternative.
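
Here’s a minimal sketch of that alternative (it is not wired into the network above): a softmax exponentiates the outputs and normalizes them so they form a probability distribution over the 10 digits.

def softmax(xi):
    """convert a vector of scores into probabilities summing to 1"""
    # subtract the max before exponentiating for numerical stability;
    # this shift does not change the result
    e = np.exp(xi - np.max(xi))
    return e / e.sum()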

Explorations#

There are several parameters that we can play with:

  • Size of the training data set

  • Number of epochs

  • Value of the learning rate, \(\eta\)

Try playing with these and see how the accuracy changes. A sketch of one such experiment follows.
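
For example, a simple sweep over the learning rate might look like this (the particular values are arbitrary choices for illustration):

for eta in [0.05, 0.1, 0.2, 0.5]:
    nn = NeuralNetwork(num_training_unique=1000,
                       data_class=ModelDataCategorical)
    nn.train(n_epochs=100, eta=eta)
    print(f"eta = {eta}: fraction correct = {nn.check_accuracy()}")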