Keras and MNIST#
Let’s start by trying to reproduce our MNIST example using Keras. Because there are a lot more options in Keras, we can add more layers and different activation functions.
Important
Keras requires a backend, which can be tensorflow, pytorch, or jax. By default it will assume tensorflow.
To use pytorch instead, set the environment variable:
export KERAS_BACKEND="torch"
before launching Jupyter.
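If changing the shell environment is inconvenient, the same variable can be set from inside Python, as long as this happens before keras is imported for the first time. A minimal sketch (choosing "torch" here is only an example):

```python
import os

# Keras reads KERAS_BACKEND once, when it is first imported, so this
# assignment has to come before any `import keras` in the session.
os.environ["KERAS_BACKEND"] = "torch"

# import keras   # in a real session, this import would now use PyTorch

print(os.environ["KERAS_BACKEND"])
```

In a notebook that has already imported keras, the kernel needs to be restarted for the change to take effect.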
import keras
import matplotlib.pyplot as plt
import numpy as np
We follow the example for setting up the network: Vict0rSch/deep_learning
The MNIST data#
The keras library can download the MNIST data directly and provides a function to give us both the training and test images and the corresponding digits. This is already in a format that Keras wants, so we don’t use the classes that we defined earlier.
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
As before, the training set consists of 60000 digits, each represented as a 28x28 array (there are no color channels, so this is grayscale data). The pixel values are stored as integers.
X_train.shape
(60000, 28, 28)
X_train.dtype
dtype('uint8')
Let’s look at the first digit and the “y” value (target) associated with it—that’s the correct answer.
plt.imshow(X_train[0], cmap="gray_r")
print(y_train[0])
5
Preparing the Data#
The neural network takes a 1-d vector of input and will return a 1-d vector of output. We need to convert our data to this form.
We’ll scale the image data to fall in [0, 1] and convert the numerical targets into categorical arrays. Finally, we need the input data to be one-dimensional, so we will flatten each 28x28 image into a single vector of length 784.
X_train = X_train.astype('float32')/255
X_test = X_test.astype('float32')/255
X_train = np.reshape(X_train, (60000, 784))
X_test = np.reshape(X_test, (10000, 784))
X_train[0]
array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.01176471, 0.07058824, 0.07058824,
0.07058824, 0.49411765, 0.53333336, 0.6862745 , 0.10196079,
0.6509804 , 1. , 0.96862745, 0.49803922, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.11764706, 0.14117648, 0.36862746, 0.6039216 ,
0.6666667 , 0.99215686, 0.99215686, 0.99215686, 0.99215686,
0.99215686, 0.88235295, 0.6745098 , 0.99215686, 0.9490196 ,
0.7647059 , 0.2509804 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.19215687, 0.93333334,
0.99215686, 0.99215686, 0.99215686, 0.99215686, 0.99215686,
0.99215686, 0.99215686, 0.99215686, 0.9843137 , 0.3647059 ,
0.32156864, 0.32156864, 0.21960784, 0.15294118, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.07058824, 0.85882354, 0.99215686, 0.99215686,
0.99215686, 0.99215686, 0.99215686, 0.7764706 , 0.7137255 ,
0.96862745, 0.94509804, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.3137255 , 0.6117647 , 0.41960785, 0.99215686, 0.99215686,
0.8039216 , 0.04313726, 0. , 0.16862746, 0.6039216 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.05490196,
0.00392157, 0.6039216 , 0.99215686, 0.3529412 , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.54509807,
0.99215686, 0.74509805, 0.00784314, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.04313726, 0.74509805, 0.99215686,
0.27450982, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.13725491, 0.94509804, 0.88235295, 0.627451 ,
0.42352942, 0.00392157, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.31764707, 0.9411765 , 0.99215686, 0.99215686, 0.46666667,
0.09803922, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.1764706 ,
0.7294118 , 0.99215686, 0.99215686, 0.5882353 , 0.10588235,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.0627451 , 0.3647059 ,
0.9882353 , 0.99215686, 0.73333335, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.9764706 , 0.99215686,
0.9764706 , 0.2509804 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.18039216, 0.50980395,
0.7176471 , 0.99215686, 0.99215686, 0.8117647 , 0.00784314,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.15294118,
0.5803922 , 0.8980392 , 0.99215686, 0.99215686, 0.99215686,
0.98039216, 0.7137255 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.09411765, 0.44705883, 0.8666667 , 0.99215686, 0.99215686,
0.99215686, 0.99215686, 0.7882353 , 0.30588236, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.09019608, 0.25882354, 0.8352941 , 0.99215686,
0.99215686, 0.99215686, 0.99215686, 0.7764706 , 0.31764707,
0.00784314, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.07058824, 0.67058825, 0.85882354,
0.99215686, 0.99215686, 0.99215686, 0.99215686, 0.7647059 ,
0.3137255 , 0.03529412, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.21568628, 0.6745098 ,
0.8862745 , 0.99215686, 0.99215686, 0.99215686, 0.99215686,
0.95686275, 0.52156866, 0.04313726, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.53333336, 0.99215686, 0.99215686, 0.99215686,
0.83137256, 0.5294118 , 0.5176471 , 0.0627451 , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ], dtype=float32)
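The scaling and flattening steps can be checked on a small stand-in array. This sketch uses two random 28x28 images (an assumption for illustration, not the MNIST data) and lets NumPy infer the flattened length with -1 instead of hard-coding 784:

```python
import numpy as np

# Two fake 28x28 "images" with the same dtype and layout as MNIST
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)
images[0, 0, 0] = 255   # make sure the largest pixel value appears
images[0, 0, 1] = 0     # and the smallest

scaled = images.astype("float32") / 255    # values now lie in [0, 1]
flat = scaled.reshape(len(scaled), -1)     # -1 lets NumPy infer 28*28 = 784

print(flat.shape, flat.dtype, flat.min(), flat.max())
```

Using `len(scaled)` and `-1` means the same two lines work for the training set and the test set without repeating 60000 and 10000.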
As we did in our example, we will use categorical data. Keras includes routines to categorize data. In our case, since there are 10 possible digits, we want to put the output into 10 categories (represented by 10 neurons).
from keras.utils import to_categorical
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
Now let’s look at the target for the first training digit. We know from above that it was ‘5’. Here we see that there is a 1 at the index corresponding to 5 (remember we start counting at 0 in Python).
y_train[0]
array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.])
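To see what this encoding does, here is a minimal NumPy sketch of the same idea. The helper name `one_hot` is ours, for illustration only; it is not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # For row i, set the column given by labels[i] to 1
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

targets = one_hot([5, 0, 4], 10)
print(targets[0])                    # the 1 sits at index 5
print(np.argmax(targets, axis=1))    # argmax recovers the original labels
```

Going back from the categorical array to the digit is just `np.argmax`, which is what we will use when interpreting predictions.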
Build the Neural Network#
Now we’ll build the neural network. We will have 2 hidden layers, and the number of neurons will look like:
784 → 500 → 300 → 10
Layers#
Let’s start by setting up the layers. For each layer, we tell keras the number of output neurons. It infers the number of inputs from the previous layer (with the exception of the input layer, where we need to tell it what to expect as input).
Properties on the layers:
Input layer: this just tells the network how many input nodes to expect.
Dense layers: We will use a dense network. This means that all neurons in one layer are connected to all neurons in the next layer (sometimes the term “fully-connected” is used here).
Activation function: We previously used the sigmoid function. Now we’ll use the rectified linear unit, ReLU (see also http://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html#relu), for all but the last layer.
For the very last layer (the output layer), we use a softmax activation. This is commonly used with categorical data (like we have) and has the nice property that all of the entries sum to 1 (so we can interpret them as probabilities).
See https://keras.io/api/layers/activations/ for a list of activation functions supported.
Dropout: for some of the layers, we will specify a dropout. This means that we will ignore some of the neurons in a layer during training (randomly selected at the specified probability). This can help prevent overfitting of the network.
Here’s a nice discussion: https://medium.com/@amarbudhiraja/https-medium-com-amarbudhiraja-learning-less-to-learn-better-dropout-in-deep-machine-learning-74334da4bfc5
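The two activation functions are simple enough to write out by hand. This is a NumPy sketch for a single vector of layer outputs, meant only to illustrate the definitions; Keras uses its own backend implementations.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negative entries become 0, the rest pass through."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Exponentiate and normalize so the entries are positive and sum to 1."""
    shifted = z - np.max(z)   # does not change the result, but avoids overflow
    e = np.exp(shifted)
    return e / e.sum()

z = np.array([-2.0, 0.0, 1.0, 3.0])
p = softmax(z)

print(relu(z))
print(p, p.sum())
```

The largest input always gets the largest softmax output, so taking the argmax of the probabilities picks the same category as taking the argmax of the raw outputs.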
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, Activation
model = Sequential()
model.add(Input(shape=(784,)))
model.add(Dense(500, activation="relu"))
model.add(Dropout(0.4))
model.add(Dense(300, activation="relu"))
model.add(Dropout(0.4))
model.add(Dense(10, activation="softmax"))
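To make the Dropout(0.4) layers above more concrete, here is a NumPy sketch of what dropout does to a layer's outputs during training. The function name and the use of an all-ones input are our assumptions for illustration. As in the Keras layer, the surviving values are scaled by 1/(1 - rate) so that the expected sum is unchanged; at prediction time dropout does nothing.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each entry with probability `rate`; rescale the survivors."""
    keep = rng.random(activations.shape) >= rate   # True with probability 1 - rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones(10_000)
dropped = dropout(a, 0.4, rng)

print("fraction zeroed:", np.mean(dropped == 0))
print("mean after dropout:", dropped.mean())
```

Roughly 40% of the entries are zeroed, and the mean stays close to 1 because of the rescaling.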
Loss function#
We need to specify what we want to optimize and how we are going to do it.
Recall: the loss (or cost) function measures how well our predictions match the expected target. Previously we were using the sum of the squares of the error.
For categorical data, like we have, the “cross-entropy” metric is often used. See here for an explanation: https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/
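For a one-hot target, the cross-entropy reduces to minus the log of the probability the network assigned to the correct category. A minimal NumPy sketch for a single sample (the function here is our own illustration; Keras additionally averages over the batch):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)   # guard against log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([0.0, 0.0, 1.0])           # the correct category is index 2
confident = np.array([0.05, 0.05, 0.90])     # mostly right
unsure = np.array([0.40, 0.30, 0.30])        # spread out, and wrong on argmax

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss goes to 0 as the probability of the correct category goes to 1, and it grows without bound as that probability goes to 0.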
Optimizer#
We also need to specify an optimizer. This could be gradient descent, as we used before. Here’s a list of the optimizers supported by keras: https://keras.io/api/optimizers/ We’ll use RMSprop, which builds on gradient descent by scaling the step for each weight by a running average of recent gradient magnitudes.
Finally, we need to specify a metric that is evaluated during training and testing. We’ll use "accuracy" here. This means that we’ll see the accuracy of our model reported as we are training and testing.
More details on these options are here: https://keras.io/api/models/model/
from keras.optimizers import RMSprop
rms = RMSprop()
model.compile(loss='categorical_crossentropy',
optimizer=rms, metrics=['accuracy'])
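The core of the RMSprop update can be sketched in a few lines: each step is the gradient divided by the square root of a running average of the squared gradient. The sketch below applies it to a one-parameter toy problem, minimizing (w - 3)^2. The function name, the toy problem, and the learning rate of 0.01 are our choices for illustration; this is not the Keras implementation, which also offers options such as momentum and centering.

```python
import numpy as np

def rmsprop_minimize(grad, w0, learning_rate=0.01, rho=0.9,
                     epsilon=1e-7, steps=2000):
    """Minimize a 1-d function given its gradient, using the RMSprop rule."""
    w = float(w0)
    avg_sq = 0.0   # running average of the squared gradient
    for _ in range(steps):
        g = grad(w)
        avg_sq = rho * avg_sq + (1.0 - rho) * g * g
        w -= learning_rate * g / (np.sqrt(avg_sq) + epsilon)
    return w

# f(w) = (w - 3)^2 has gradient 2 (w - 3) and its minimum at w = 3
w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

Because the gradient is divided by its own typical size, the step length is roughly set by the learning rate regardless of how steep the loss is, and the result hovers within about a learning rate of the minimum.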
Summary#
Finally, we can get a summary of the model:
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 500)            │       392,500 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 500)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 300)            │       150,300 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 300)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 10)             │         3,010 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 545,810 (2.08 MB)
Trainable params: 545,810 (2.08 MB)
Non-trainable params: 0 (0.00 B)
We see that we have > 500k parameters to train!
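The parameter counts in the summary can be reproduced by hand: a dense layer with n_in inputs and n_out outputs has n_in * n_out weights plus n_out biases, and dropout layers have no parameters. A quick check:

```python
layer_sizes = [784, 500, 300, 10]

# weights (n_in * n_out) plus biases (n_out) for each dense layer
params_per_layer = [n_in * n_out + n_out
                    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(params_per_layer)

print(params_per_layer)
print(total)
print(f"{total * 4 / 1024**2:.2f} MB as float32")
```

The last line assumes 4 bytes per parameter (float32), which matches the size reported in the summary.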
Train#
For training, we pass in the inputs and targets and the number of epochs to run, and Keras optimizes the network by adjusting the weights between the nodes in the layers.
The number of epochs is the number of times the entire data set is passed forward and backward through the network. The batch size is the number of training pairs you pass through the network at a given time. You update the parameters in your model (the weights) once for each batch. Averaging over a batch is more efficient and less noisy than updating after every single sample.
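The relation between data set size, batch size, and number of weight updates is simple arithmetic. With the 60000 training images and the batch size of 256 and 20 epochs we use here:

```python
import math

n_train = 60000
batch_size = 256
epochs = 20

# The final batch of an epoch may be smaller than the rest
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)
```

The steps-per-epoch value is the step count that Keras reports for each epoch while training with these settings.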
Tip
We also pass in the test data as “validation” which will allow us to see how well we are doing as we train.
epochs = 20
batch_size = 256
model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
validation_data=(X_test, y_test), verbose=2)
Epoch 1/20
235/235 - 4s - 16ms/step - accuracy: 0.8854 - loss: 0.3725 - val_accuracy: 0.9512 - val_loss: 0.1584
Epoch 2/20
235/235 - 4s - 16ms/step - accuracy: 0.9508 - loss: 0.1613 - val_accuracy: 0.9675 - val_loss: 0.1043
Epoch 3/20
235/235 - 4s - 15ms/step - accuracy: 0.9646 - loss: 0.1188 - val_accuracy: 0.9736 - val_loss: 0.0844
Epoch 4/20
235/235 - 4s - 16ms/step - accuracy: 0.9705 - loss: 0.0969 - val_accuracy: 0.9788 - val_loss: 0.0723
Epoch 5/20
235/235 - 4s - 16ms/step - accuracy: 0.9750 - loss: 0.0820 - val_accuracy: 0.9760 - val_loss: 0.0785
Epoch 6/20
235/235 - 4s - 15ms/step - accuracy: 0.9777 - loss: 0.0711 - val_accuracy: 0.9797 - val_loss: 0.0633
Epoch 7/20
235/235 - 4s - 15ms/step - accuracy: 0.9799 - loss: 0.0642 - val_accuracy: 0.9815 - val_loss: 0.0646
Epoch 8/20
235/235 - 4s - 15ms/step - accuracy: 0.9814 - loss: 0.0575 - val_accuracy: 0.9805 - val_loss: 0.0647
Epoch 9/20
235/235 - 4s - 16ms/step - accuracy: 0.9835 - loss: 0.0543 - val_accuracy: 0.9834 - val_loss: 0.0614
Epoch 10/20
235/235 - 4s - 16ms/step - accuracy: 0.9842 - loss: 0.0498 - val_accuracy: 0.9824 - val_loss: 0.0599
Epoch 11/20
235/235 - 4s - 16ms/step - accuracy: 0.9853 - loss: 0.0444 - val_accuracy: 0.9851 - val_loss: 0.0584
Epoch 12/20
235/235 - 4s - 16ms/step - accuracy: 0.9863 - loss: 0.0437 - val_accuracy: 0.9852 - val_loss: 0.0528
Epoch 13/20
235/235 - 4s - 15ms/step - accuracy: 0.9868 - loss: 0.0407 - val_accuracy: 0.9839 - val_loss: 0.0608
Epoch 14/20
235/235 - 4s - 15ms/step - accuracy: 0.9883 - loss: 0.0375 - val_accuracy: 0.9837 - val_loss: 0.0582
Epoch 15/20
235/235 - 4s - 15ms/step - accuracy: 0.9887 - loss: 0.0339 - val_accuracy: 0.9841 - val_loss: 0.0661
Epoch 16/20
235/235 - 4s - 15ms/step - accuracy: 0.9896 - loss: 0.0330 - val_accuracy: 0.9841 - val_loss: 0.0613
Epoch 17/20
235/235 - 4s - 15ms/step - accuracy: 0.9895 - loss: 0.0322 - val_accuracy: 0.9846 - val_loss: 0.0582
Epoch 18/20
235/235 - 4s - 15ms/step - accuracy: 0.9904 - loss: 0.0301 - val_accuracy: 0.9846 - val_loss: 0.0632
Epoch 19/20
235/235 - 4s - 15ms/step - accuracy: 0.9905 - loss: 0.0286 - val_accuracy: 0.9849 - val_loss: 0.0614
Epoch 20/20
235/235 - 4s - 15ms/step - accuracy: 0.9914 - loss: 0.0274 - val_accuracy: 0.9855 - val_loss: 0.0644
<keras.src.callbacks.history.History at 0x7f7671d6e3c0>
Test#
Keras has a routine, evaluate(), that takes the inputs and targets of a test data set and returns the loss value and accuracy (or other defined metrics) on this data.
Here we see we are > 98% accurate on the test data. The model was not trained on this data, although we did monitor it as the validation set during training.
loss_value, accuracy = model.evaluate(X_test, y_test, batch_size=16)
print(accuracy)
625/625 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.9855 - loss: 0.0644
0.9854999780654907
Predicting#
Suppose we simply want to ask our neural network to predict the target for an input. We can use the predict() method to return the category array with the predictions. We can then use np.argmax() to select the most probable.
np.argmax(model.predict(np.array([X_test[0]])))
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
np.int64(7)
y_test[0]
array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])
Now let’s loop over the test set and print out what we predict vs. the true answer for those we get wrong. We can also plot the image of the digit.
wrong = 0
max_wrong = 10
for n, (x, y) in enumerate(zip(X_test, y_test)):
    try:
        res = model.predict(np.array([x]), verbose=0)
        if np.argmax(res) != np.argmax(y):
            print("test {}: prediction = {}, truth is {}".format(n, np.argmax(res), np.argmax(y)))
            plt.imshow(x.reshape(28, 28), cmap="gray_r")
            plt.show()
            wrong += 1
            if (wrong > max_wrong-1):
                break
    except KeyboardInterrupt:
        print("stopping")
        break
test 115: prediction = 9, truth is 4
test 151: prediction = 8, truth is 9
test 247: prediction = 2, truth is 4
test 321: prediction = 7, truth is 2
test 340: prediction = 3, truth is 5
test 381: prediction = 7, truth is 3
test 445: prediction = 0, truth is 6
test 447: prediction = 9, truth is 4
test 495: prediction = 2, truth is 8
test 582: prediction = 2, truth is 8
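Calling predict() once per image is slow. Since predict() accepts the whole test set and returns one row of probabilities per image, the misclassified indices can also be found in a vectorized way. The sketch below uses a small hand-made probability array as a stand-in for the output of model.predict(X_test) (an assumption so that it runs without a trained model):

```python
import numpy as np

# Stand-in for model.predict(X_test): one row of class probabilities per sample
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])

# Stand-in for y_test: one-hot targets
truth = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

predicted = np.argmax(probs, axis=1)   # most probable category for each row
expected = np.argmax(truth, axis=1)    # undo the one-hot encoding
wrong = np.flatnonzero(predicted != expected)

for n in wrong:
    print(f"test {n}: prediction = {predicted[n]}, truth is {expected[n]}")
```

With a trained model, `probs` would be replaced by the array returned from predict(), and `wrong[:10]` would give the first ten misclassified test images.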
Experimenting#
There are a number of things we can play with to see how the network performance changes:
batch size
adding or removing hidden layers
changing the dropout
changing the activation function
Callbacks#
Keras supports callbacks that run each epoch and can store information about the training. For example, a callback lets you plot the accuracy vs. epoch. Take a look here for some inspiration:
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History
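The History object returned by fit() has a `history` attribute, a dictionary that maps each metric name to a list with one value per epoch. The sketch below works on such a dictionary; to keep it runnable without training, the values are copied by hand from the first five epochs of the training log earlier on this page, and the helper `best_epoch` is our own.

```python
# Same layout as the `history` attribute of the object returned by fit()
history = {
    "accuracy":     [0.8854, 0.9508, 0.9646, 0.9705, 0.9750],
    "val_accuracy": [0.9512, 0.9675, 0.9736, 0.9788, 0.9760],
}

def best_epoch(values):
    """Return the 1-based epoch at which the metric is largest."""
    return max(range(len(values)), key=lambda i: values[i]) + 1

for epoch, (acc, val_acc) in enumerate(
        zip(history["accuracy"], history["val_accuracy"]), start=1):
    print(f"epoch {epoch}: accuracy = {acc:.4f}, val_accuracy = {val_acc:.4f}")

print("best validation accuracy at epoch", best_epoch(history["val_accuracy"]))
```

The same lists can be passed directly to plt.plot() to get the accuracy vs. epoch curve.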
Going Further#
Convolutional neural networks are often used for image recognition, especially with larger images. They use filters to try to recognize patterns in portions of an image (a tile). See this for a keras example: