A First Example#
Let’s do a simple example: given a sequence of numbers as input, we want the last number in the sequence to be the output. Let’s see if we can train a network to recognize this.
import numpy as np
import random
We’ll create a ModelData class that generates our data and also has a method interpret_result() that takes the result of the network and tells us which number it predicts. This is needed since the network does not know that we are dealing with integers, so we need to round its output.
The important members are:
ModelData.x: the input
ModelData.y: the output
Anticipating the need to scale our data later, we’ll also create ModelData.x_scaled, which holds the scaled input, and ModelData.y_scaled, which holds the scaled output.
class ModelData:
    """this is the model data for our "last number" training set.  We
    produce input of length N consisting of the numbers 0-9, and we set
    the output to be simply the last element of the input vector
    """

    def __init__(self, N=10):
        self.N = N

        # our model input data (no scaling is applied in this version)
        self.x = np.random.randint(0, high=10, size=N)
        self.x_scaled = self.x

        # our model output data, unscaled and scaled
        self.y = np.array([self.x[-1]])
        self.y_scaled = np.array([self.x_scaled[-1]])

    def interpret_result(self, out):
        """take the network output and return the number from the allowed
        sequence we are closest to
        """
        # extract the single element before rounding to avoid NumPy's
        # deprecated array-to-scalar conversion
        return max(0, min(9, int(np.round(np.asarray(out).item()))))
Here’s what our data looks like:
model = ModelData()
model.x
array([1, 5, 0, 9, 1, 6, 4, 8, 7, 3])
model.y
array([3])
Now we write our network. We’ll make it take the name of the class that will create the data so we can reuse this network with different variations.
Some implementation details:
The network will only deal with model.x_scaled and model.y_scaled.
We need to initialize the matrix \({\bf A}\). We’ll use Gaussian random numbers, centered on 0 with a width set to \(1/\sqrt{N_\mathrm{in}}\). This seems to be a common choice.
We’ll loop over the data in the training set in a random order (randomizing each epoch).
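One way to see why this width is a reasonable choice is to check how large \({\bf A x}\) tends to be for a fixed input. The sketch below is only an illustration: it reuses the sample model.x shown earlier, and the names rng, A_samples, and x_rms are introduced here, not part of the network class. For weights drawn with standard deviation \(1/\sqrt{N_\mathrm{in}}\), the spread of \({\bf A x}\) should match the root-mean-square size of the input entries, regardless of how long the input is.

```python
import numpy as np

rng = np.random.default_rng(12345)

# input taken from the sample model.x shown earlier in the text
x = np.array([1, 5, 0, 9, 1, 6, 4, 8, 7, 3], dtype=float)
N_in = len(x)

# draw many independent single-row weight matrices and apply each to x
n_trials = 20000
A_samples = rng.normal(0.0, 1.0 / np.sqrt(N_in), size=(n_trials, N_in))
b = A_samples @ x

# with this width, the standard deviation of b is ||x|| / sqrt(N_in),
# which is the root-mean-square of the input entries
x_rms = np.sqrt(np.mean(x**2))
print(f"std of A @ x: {b.std():.3f}, rms of x: {x_rms:.3f}")
```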
class NeuralNetwork:

    def __init__(self, num_training_unique=100, data_class=None):
        self.num_training_unique = num_training_unique

        self.train_set = []
        for _ in range(self.num_training_unique):
            self.train_set.append(data_class())

        # initialize our matrix with Gaussian normal random numbers
        # we get the size from the length of the input and output
        model = self.train_set[0]
        self.N_out = len(model.y_scaled)
        self.N_in = len(model.x_scaled)

        self.A = np.random.normal(0.0, 1.0/np.sqrt(self.N_in),
                                  (self.N_out, self.N_in))

    def g(self, xi):
        """our sigmoid function"""
        return 1.0/(1.0 + np.exp(-xi))

    def train(self, n_epochs=10, eta=0.2):
        """Do the minimization for the training"""

        # train
        for _ in range(n_epochs):
            random.shuffle(self.train_set)
            for model in self.train_set:

                # make these column vectors
                x = model.x_scaled.reshape(self.N_in, 1)
                y = model.y_scaled.reshape(self.N_out, 1)

                b = self.A @ x
                z = self.g(b)

                # gradient descent -- just a single improvement.  eta
                # here is our learning rate.  Note that * and @ have the
                # same precedence and evaluate left to right, so the
                # elementwise products are formed before the matrix product
                self.A[:, :] += -eta * 2 * (z - y) * z * (1 - z) @ x.T

    def predict(self, model):
        """predict the outcome using our trained matrix A """
        z = self.g(self.A @ model.x_scaled)
        return model.interpret_result(z)

    def check_accuracy(self):
        """use the trained network on the training data and return
        the fraction we get correct"""

        n_right = 0
        for model in self.train_set:
            y_nn = self.predict(model)
            if y_nn == model.y:
                n_right += 1

        return n_right / len(self.train_set)
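The update in train() has the form of a gradient-descent step for a squared-error cost, \(f = \| {\bf z} - {\bf y} \|^2\) with \({\bf z} = g({\bf A x})\). That cost is an assumption here, inferred from the factor \(2 ({\bf z} - {\bf y})\) in the update. Under that assumption, a quick way to gain confidence in the expression is to compare it against a finite-difference gradient. The helper names loss and analytic_grad below are hypothetical and exist only for this check.

```python
import numpy as np

def g(xi):
    """sigmoid function, as in the network class"""
    return 1.0 / (1.0 + np.exp(-xi))

def loss(A, x, y):
    """assumed squared-error cost for a single training pair"""
    z = g(A @ x)
    return float(np.sum((z - y)**2))

def analytic_grad(A, x, y):
    """the gradient expression used in the training update"""
    z = g(A @ x)
    return (2 * (z - y) * z * (1 - z)) @ x.T

rng = np.random.default_rng(0)
N_in, N_out = 10, 3
A = rng.normal(0.0, 1.0 / np.sqrt(N_in), (N_out, N_in))
x = rng.uniform(0.05, 0.95, (N_in, 1))
y = rng.uniform(0.05, 0.95, (N_out, 1))

# centered finite differences, one matrix element at a time
eps = 1.0e-6
numeric = np.zeros_like(A)
for i in range(N_out):
    for j in range(N_in):
        dA = np.zeros_like(A)
        dA[i, j] = eps
        numeric[i, j] = (loss(A + dA, x, y) - loss(A - dA, x, y)) / (2 * eps)

max_diff = np.max(np.abs(numeric - analytic_grad(A, x, y)))
print(f"largest difference between the two gradients: {max_diff:.3e}")
```

If the two gradients agree to within the finite-difference error, the update is moving \({\bf A}\) in the direction that reduces this cost.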
Let’s create the network and train it.
nn = NeuralNetwork(num_training_unique=1000, data_class=ModelData)
nn.train(n_epochs=100)
We can ask the network how well it does on the data it has already seen:
frac = nn.check_accuracy()
print(f"fraction correct: {frac}")
fraction correct: 0.097
All that training, and it is only about 10% accurate! Now let’s check it on data it has never seen:
err = []
npts = 1000
n_right = 0
for k in range(npts):
    model = ModelData()
    y_nn = nn.predict(model)
    if y_nn == model.y:
        n_right += 1
    err.append(abs(y_nn - model.y))

print(f"fraction correct: {n_right / npts}")
fraction correct: 0.097
Clearly we are not doing that great. We are getting only 10% right, which is basically random guessing. Let’s look at a single attempt
model = ModelData()
model.x
array([7, 3, 4, 3, 3, 2, 4, 3, 7, 2])
model.y
array([2])
Here’s what the network predicts
nn.predict(model)
1
And here’s the prediction before calling the activation function
nn.A @ model.x_scaled
array([49.62650686])
Part of the problem here is that the network is returning a really large number, and the sigmoid function works best when its argument is in the region where it varies the fastest, \(\xi \sim [-1, 1]\).
nn.g(nn.A @ model.x_scaled)
array([1.])
Basically, whatever we feed it, the sigmoid will return 1 when the numbers are this large. We need to scale the data.
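The saturation can be checked directly. The sketch below evaluates the sigmoid and its derivative, \(g' = g (1 - g)\), at a few arguments, including the pre-activation value printed above. The loop and the variable names are illustrative only. Since the training update is proportional to \(z (1 - z)\), a saturated output means the weights essentially stop changing.

```python
import numpy as np

def g(xi):
    """sigmoid function, as in the network class"""
    return 1.0 / (1.0 + np.exp(-xi))

# the last value is the pre-activation result printed above
for xi in [0.5, 5.0, 49.62650686]:
    z = g(xi)
    dz = z * (1 - z)   # derivative of the sigmoid
    print(f"xi = {xi:9.4f}   g = {z:.6f}   g' = {dz:.3e}")

z_large = g(49.62650686)
```

This is also a likely explanation for the roughly 10% accuracy: an output stuck at 1 always rounds to the digit 1, which is correct only when the last number in the sequence happens to be 1.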
Scaled Data#
Let’s try again, but this time let’s scale the data: we divide each number by 10 and add a small offset, \(0.05\), so the scaled values fall within \([0.05, 0.95]\). The offset prevents any of the inputs from being \(0\), which would simply cancel out any weight they multiply.
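As a small illustration of this scaling, the sketch below maps the digits 0-9 into \([0.05, 0.95]\) and back again. The inverse shown here is the direct algebraic one and is an assumption for illustration; the class in this section does its own decoding in interpret_result().

```python
import numpy as np

scale_factor = 10
offset = 0.05

digits = np.arange(10)

# forward scaling: 0 -> 0.05, 9 -> 0.95
scaled = digits / scale_factor + offset

# direct inverse: subtract the offset, undo the division, round to an integer
recovered = np.round((scaled - offset) * scale_factor).astype(int)

print(scaled)
print(recovered)
```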
class ModelDataScaled:
    """this is the model data for our "last number" training set.  We
    produce input of length N consisting of the numbers 0-9, and we set
    the output to be simply the last element of the input vector.  This
    version scales the data to lie between [0.05, 0.95]
    """

    def __init__(self, N=10):
        self.N = N
        self.offset = 0.05
        self.scale_factor = 10

        # our model input data
        self.x = np.random.randint(0, high=10, size=N)
        self.x_scaled = self.x / self.scale_factor + self.offset

        # our model output data, unscaled and scaled
        self.y = np.array([self.x[-1]])
        self.y_scaled = np.array([self.x_scaled[-1]])

    def interpret_result(self, out):
        """take the network output and undo the scaling and round it.
        """
        # extract the single element first to avoid NumPy's deprecated
        # array-to-scalar conversion
        out = np.asarray(out).item()
        return max(0, min(9, int(self.scale_factor *
                                 np.round(out / self.offset) * self.offset)))
nn = NeuralNetwork(num_training_unique=1000, data_class=ModelDataScaled)
nn.train(n_epochs=100)
nn.check_accuracy()
0.387
err = []
npts = 1000
n_right = 0
for k in range(npts):
    model = ModelDataScaled()
    y_nn = nn.predict(model)
    if y_nn == model.y:
        n_right += 1
    err.append(abs(y_nn - model.y))

print(f"fraction correct: {n_right / npts}")
fraction correct: 0.427
We do a lot better now: roughly 40% of the predictions are correct, compared to about 10% before.
Take a look at the trained matrix:
nn.A
array([[-0.47367986, -0.568422 , -0.4472484 , -0.51659962, -0.49816483,
-0.47546351, -0.46584516, -0.44743518, -0.51832001, 4.52393044]])
Notice that by far the largest element (in magnitude) is the last one. This makes sense, since we want \({\bf A x}\) to choose the last value in \({\bf x}\).
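To see what this matrix does in practice, the sketch below copies the weights printed above and reuses the input from the single attempt shown earlier. It then bumps either the last or the first input by one and compares the outputs. The weights are specific to that training run, and the helper output() is introduced here only for illustration.

```python
import numpy as np

def g(xi):
    """sigmoid function, as in the network class"""
    return 1.0 / (1.0 + np.exp(-xi))

# trained weights copied from the printed matrix above
A = np.array([[-0.47367986, -0.568422, -0.4472484, -0.51659962, -0.49816483,
               -0.47546351, -0.46584516, -0.44743518, -0.51832001, 4.52393044]])

def output(x):
    """apply the same scaling as ModelDataScaled, then the network"""
    return g(A @ (x / 10 + 0.05))[0]

x = np.array([7, 3, 4, 3, 3, 2, 4, 3, 7, 2])

x_last = x.copy()
x_last[-1] += 1    # change only the last input

x_first = x.copy()
x_first[0] += 1    # change only the first input

base = output(x)
print(f"baseline output:      {base:.4f}")
print(f"last input  + 1:      {output(x_last):.4f}")
print(f"first input + 1:      {output(x_first):.4f}")
```

Changing the last input moves the output far more than changing the first one, which is the behavior we want from a network that is meant to pick out the last value.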
Categorical Data#
We’ll finish this out by looking at a different way to represent the data. In the previous attempts, the network did not know that it was supposed to predict an integer from 0-9 and instead gives a real number that we round to the nearest integer in that range.
We can instead treat the data as categorical, restricting it to take on only those values.
Let’s create the class first and then we’ll see what the output looks like.
class ModelDataCategorical:
    """this is the model data for our "last number" training set.  We
    produce input of length N, consisting of numbers 0-9 and store
    the result in a 10-element array as categorical data.
    """

    def __init__(self, N=10):
        self.N = N

        # our model input data
        self.x = np.random.randint(0, high=10, size=N)
        self.x_scaled = self.x / 10 + 0.05

        # our model output data: the integer answer and its
        # categorical (10-element) representation
        self.y = np.array([self.x[-1]])
        self.y_scaled = np.zeros(10) + 0.01
        self.y_scaled[self.x[-1]] = 0.99

    def interpret_result(self, out):
        """take the network output and return the number we predict"""
        return np.argmax(out)
This is categorical data—the answer we are training on is a 10 element array with a “1” in the slot corresponding to the correct integer (we actually use 0.99).
model = ModelDataCategorical()
model.x
array([9, 9, 8, 6, 2, 4, 1, 3, 9, 6])
model.y_scaled
array([0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.99, 0.01, 0.01, 0.01])
model.y
array([6])
We see that model.y_scaled[model.y] = 0.99. We will pick as the correct answer the array index that has the highest value.
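The encoding and decoding can be sketched on their own. The functions encode() and decode() below are hypothetical helpers that mirror what ModelDataCategorical does, and the sample network output is made up for illustration.

```python
import numpy as np

def encode(digit, n_categories=10, low=0.01, high=0.99):
    """build the categorical target: `high` in the slot for `digit`,
    `low` everywhere else"""
    y = np.full(n_categories, low)
    y[digit] = high
    return y

def decode(out):
    """pick the index with the largest value as the predicted digit"""
    return int(np.argmax(out))

target = encode(6)
print(target)

# a made-up network output: no entry is close to 0.99, but slot 6 is largest
sample_out = np.array([0.02, 0.10, 0.03, 0.00, 0.20, 0.05, 0.60, 0.10, 0.01, 0.30])
print(decode(sample_out))
```

Because only the position of the largest entry matters, the network does not need to reproduce 0.99 exactly to give the right answer.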
One additional benefit of this is that our matrix \({\bf A}\) will now be \(10\times 10\), so there are more weights to train.
nn = NeuralNetwork(num_training_unique=1000, data_class=ModelDataCategorical)
nn.train(n_epochs=100)
nn.check_accuracy()
0.377
err = []
npts = 1000
n_right = 0
for k in range(npts):
    model = ModelDataCategorical()
    y_nn = nn.predict(model)
    if y_nn == model.y:
        n_right += 1
    err.append(abs(y_nn - model.y))

print(f"fraction correct: {n_right / npts}")
fraction correct: 0.329
This network seems to do well: we get about 1/3 of the new data correct. It helps that there are a lot more connections that we can train. We note, however, that the sigmoid function is not the best choice for categorical data.
Explorations#
There are several parameters that we can play with:
Size of the training data set
Number of epochs
Value of the learning rate, \(\eta\)
Try playing with these and see how the accuracy changes.
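One way to organize such an exploration is a small parameter sweep. The function sweep() below is a hypothetical helper, not part of the code above. It assumes a network class with the same interface as NeuralNetwork (a constructor taking num_training_unique and data_class, plus train() and check_accuracy()), and the default parameter values are arbitrary starting points.

```python
import itertools

def sweep(network_class, data_class,
          sizes=(100, 1000), epochs=(10, 100), etas=(0.05, 0.2)):
    """train one network per parameter combination and record the
    accuracy on its own training set"""
    results = []
    for n_train, n_epochs, eta in itertools.product(sizes, epochs, etas):
        nn = network_class(num_training_unique=n_train, data_class=data_class)
        nn.train(n_epochs=n_epochs, eta=eta)
        results.append((n_train, n_epochs, eta, nn.check_accuracy()))
    return results
```

Calling sweep(NeuralNetwork, ModelDataScaled) would return one tuple per combination. Note that check_accuracy() measures performance on the training data, so a separate loop over fresh data is still needed to judge how well each network generalizes.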