Clustering

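Clustering seeks to group data into clusters based on their properties, and then lets us predict which cluster a new member belongs to. In this example the training points come with labels, so strictly speaking what we do is supervised classification: the network learns from the known labels where the boundary between the groups lies.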

We’ll use make_moons, a dataset generator that is part of scikit-learn. It generates 2D points that fall into 2 different sets, each shaped like a half-moon, and labels every point 0 or 1 according to the set it belongs to.
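As a quick sketch of what make_moons returns: the random_state argument below is an addition used only to make the draw reproducible; the notebook code itself does not set it.

```python
import numpy as np
from sklearn import datasets

# 200 points split between two half-moons; random_state fixes the draw
# (an addition for reproducibility, not used in the notebook code)
xvec, val = datasets.make_moons(200, noise=0.2, random_state=0)

print(xvec.shape)         # (200, 2): one (x, y) coordinate pair per point
print(val.shape)          # (200,): one label per point
print(np.unique(val))     # [0 1]: the two sets are labeled 0 and 1
print(np.bincount(val))   # [100 100]: the points are split evenly between the sets
```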

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

def generate_data():
    # make_moons returns the coordinates as an array of shape (200, 2)
    # and the set label (0 or 1) of each point as an array of shape (200,);
    # noise=0.2 adds Gaussian jitter to the points
    x, v = datasets.make_moons(200, noise=0.2)
    return np.asarray(x), np.asarray(v)

x, v = generate_data()

Let’s look at a point and its value:

print(f"x = {x[0]}, value = {v[0]}")
x = [2.06545706 0.22292211], value = 1

Now let’s plot the data, coloring each point by its value:

def plot_data(x, v):
    xpt = [q[0] for q in x]
    ypt = [q[1] for q in x]

    fig, ax = plt.subplots()
    ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
    ax.set_aspect("equal")
    return fig
fig = plot_data(x, v)

We want to partition this domain into 2 regions, so that when a new point comes in, we know which group it belongs to.

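First we set up the network: two hidden layers with ReLU activation, followed by a single sigmoid output that gives the probability that a point belongs to set 1.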

from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop

model = Sequential()
# the input is a single (x, y) point
model.add(Input(shape=(2,)))
# two hidden layers with ReLU activation
model.add(Dense(50, activation="relu"))
model.add(Dense(20, activation="relu"))
# one sigmoid output: the probability that the point is in set 1
model.add(Dense(1, activation="sigmoid"))
rms = RMSprop()
model.compile(loss='binary_crossentropy',
              optimizer=rms, metrics=['accuracy'])
from IPython.display import SVG
from tensorflow.keras.utils import model_to_dot

# draw the layer graph (requires the pydot package and Graphviz to be installed)
SVG(model_to_dot(model, show_shapes=True, dpi=65).create(prog='dot', format='svg'))
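To connect the layer diagram to numbers: a Dense layer with n_in inputs and n_out units holds an n_in × n_out weight matrix plus n_out biases. The small pure-Python sketch below (dense_params is a hypothetical helper written just for this check) tallies the network defined above; the total should agree with what model.summary() reports.

```python
# Hypothetical helper: parameter count of one fully connected (Dense) layer
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

# layer widths of the network above: 2 inputs -> 50 -> 20 -> 1 output
widths = [2, 50, 20, 1]
per_layer = [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]

print(per_layer)       # [150, 1020, 21]
print(sum(per_layer))  # 1191
```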

We seem to need a lot of epochs (full passes over the training data) here to get a good result:

epochs = 100
results = model.fit(x, v, batch_size=50, epochs=epochs)
Epoch 1/100
4/4 [==============================] - 0s 2ms/step - loss: 0.1941 - accuracy: 0.9300
Epoch 2/100
4/4 [==============================] - 0s 1ms/step - loss: 0.1910 - accuracy: 0.9350
...
Epoch 99/100
4/4 [==============================] - 0s 1ms/step - loss: 0.1085 - accuracy: 0.9500
Epoch 100/100
4/4 [==============================] - 0s 1ms/step - loss: 0.1075 - accuracy: 0.9500
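Two notes on reading this log. The 4/4 counts batches per epoch: 200 points in batches of 50 gives 4 steps. The loss is binary cross-entropy, the mean over all points of −[y log p + (1 − y) log(1 − p)], where y is the true label and p is the network's output. A minimal NumPy sketch of that formula follows; the function name is ours, and clipping p slightly away from 0 and 1 (with an assumed eps) mirrors what Keras does to avoid log(0).

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-7):
    """Mean binary cross-entropy between labels y (0 or 1) and predicted probabilities p."""
    y = np.asarray(y, dtype=float)
    # clip so log() never sees exactly 0 or 1
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# confident, correct predictions give a small loss (about 0.105)...
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
# ...while confident, wrong predictions are penalized heavily (about 2.303)
print(binary_cross_entropy([1, 0], [0.1, 0.9]))
```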
score = model.evaluate(x, v, verbose=0)
# evaluate returns [loss, accuracy]: the loss first, then the compiled metrics
print(f"loss = {score[0]}")
print(f"accuracy = {score[1]}")
loss = 0.10588617622852325
accuracy = 0.949999988079071
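These numbers are computed on the same 200 points the network was trained on, so they are an optimistic estimate of how well new points will be classified. A fairer check holds some points out of training. The sketch below shows the idea with scikit-learn's train_test_split; to stay self-contained it swaps in scikit-learn's MLPClassifier (with the same 50- and 20-unit ReLU hidden layers) in place of the Keras model. The 25% split, the iteration limit, and the random seeds are assumptions.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# same kind of data as above; random_state is fixed only for reproducibility
x_all, v_all = datasets.make_moons(200, noise=0.2, random_state=0)

# hold out 25% of the points; they are never seen during training
x_train, x_test, v_train, v_test = train_test_split(
    x_all, v_all, test_size=0.25, random_state=0)

# stand-in for the Keras network: hidden layers of 50 and 20 ReLU units
clf = MLPClassifier(hidden_layer_sizes=(50, 20), activation="relu",
                    max_iter=2000, random_state=0)
clf.fit(x_train, v_train)

# accuracy on the points used for training vs. on the held-out points
print(f"train accuracy = {clf.score(x_train, v_train):.3f}")
print(f"test accuracy  = {clf.score(x_test, v_test):.3f}")
```

With the Keras model the same split applies: fit on x_train, v_train and then call model.evaluate(x_test, v_test).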

Let’s look at a prediction. The model expects an array of shape (N, 2), where N is the number of points, so even a single point has to be passed as an array of shape (1, 2):

res = model.predict(np.array([[-2, 2]]))
res
1/1 [==============================] - 0s 17ms/step
array([[4.807813e-18]], dtype=float32)
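A plain pair like [-2, 2] on its own has shape (2,), which does not match the (N, 2) layout the model expects. A short NumPy sketch of common ways to add the leading "number of points" axis:

```python
import numpy as np

point = np.array([-2, 2])
print(point.shape)                  # (2,): just two numbers, no batch axis

# add a leading axis of length 1
batch = point.reshape(1, -1)
print(batch.shape)                  # (1, 2)

# equivalent ways to get the same (1, 2) array
print(np.atleast_2d(point).shape)   # (1, 2)
print(point[np.newaxis, :].shape)   # (1, 2)

# several points stack into shape (N, 2)
points = np.array([[-2, 2], [0.5, -0.25], [1.0, 0.0]])
print(points.shape)                 # (3, 2)
```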

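The result is a floating-point number between 0 and 1 (the output of the final sigmoid), which we can read as the probability that the point belongs to set 1. To turn it into a label of 0 or 1 we round it, i.e. threshold it at 0.5.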
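In NumPy this conversion is a comparison against 0.5. Because the sigmoid σ(z) = 1/(1 + e^(−z)) is increasing and σ(0) = 0.5, thresholding the probability at 0.5 is the same as checking whether the last layer's raw value z is positive. A small sketch with made-up values (illustrative only, not model output):

```python
import numpy as np

# illustrative sigmoid outputs for four points (not actual model predictions)
probs = np.array([[4.8e-18], [0.3], [0.7], [0.999]], dtype=np.float32)

# probability > 0.5 -> label 1, otherwise label 0
labels = np.where(probs > 0.5, 1, 0).ravel()
print(labels)   # [0 0 1 1]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# sigmoid(z) > 0.5 exactly when z > 0
z = np.array([-3.0, -0.1, 0.1, 3.0])
print((sigmoid(z) > 0.5).astype(int))   # [0 0 1 1]
```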

Let’s plot the partitioning. First we define a grid of M × N points covering the domain:

M = 128
N = 128

xmin = -1.75
xmax = 2.5
ymin = -1.25
ymax = 1.75

xpt = np.linspace(xmin, xmax, M)
ypt = np.linspace(ymin, ymax, N)

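To make the prediction go faster, we evaluate all the grid points in a single call to predict instead of one point at a time. This means packing every (x, y) combination into one array of shape (M*N, 2), of the form:

[[xpt[0], ypt[0]],
 [xpt[0], ypt[1]],
 ...
 [xpt[0], ypt[N-1]],
 [xpt[1], ypt[0]],
 ...
]

Combining np.meshgrid with a transpose and a reshape produces exactly this ordering: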

pairs = np.array(np.meshgrid(xpt, ypt)).T.reshape(-1, 2)
pairs[0]
array([-1.75, -1.25])
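To check the ordering on a tiny grid (the sizes and values here are arbitrary), we can compare against itertools.product, which varies y fastest for each x, and confirm that reshaping a per-point result back to (m, n) puts the value for (xs_small[i], ys_small[j]) at [i, j]:

```python
import itertools
import numpy as np

m, n = 3, 2
xs_small = np.linspace(0.0, 1.0, m)   # [0.0, 0.5, 1.0]
ys_small = np.array([10.0, 20.0])

small_pairs = np.array(np.meshgrid(xs_small, ys_small)).T.reshape(-1, 2)
# rows run (x0, y0), (x0, y1), (x1, y0), (x1, y1), (x2, y0), (x2, y1)
print(small_pairs)

# same order as a nested loop with y varying fastest
expected = np.array(list(itertools.product(xs_small, ys_small)))
assert np.array_equal(small_pairs, expected)

# stand-in "prediction": any per-point function, here x + y
values = small_pairs[:, 0] + small_pairs[:, 1]
small_grid = values.reshape(m, n)
# small_grid[i, j] is the value at (xs_small[i], ys_small[j])
assert small_grid[2, 1] == xs_small[2] + ys_small[1]
```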

Now we do the prediction. We get a vector of M*N values out, which we reshape back to the (M, N) grid, so that res[i, j] is the prediction at (xpt[i], ypt[j]).

res = model.predict(pairs, verbose=0)
res = res.reshape(M, N)

Finally, round to 0 or 1 by thresholding at 0.5:

domain = np.where(res > 0.5, 1, 0)

and we can plot the partitioned domain underneath the data:

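fig, ax = plt.subplots()
# domain is indexed [x, y], but imshow puts the first index on the vertical axis,
# so we transpose it; origin="lower" places ymin at the bottom
ax.imshow(domain.T, origin="lower",
          extent=[xmin, xmax, ymin, ymax], alpha=0.25)
xpt = [q[0] for q in x]
ypt = [q[1] for q in x]

ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")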
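An alternative to the filled image is to draw only the decision boundary: the contour where the predicted probability equals 0.5. So that it runs on its own, the sketch below uses a made-up probability function (fake_prob is hypothetical, not the trained network). In the notebook the analogous input would be res.T on the grid built with np.linspace, since res is indexed [x, y] while contour expects rows to follow y.

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical stand-in for the network's probability of set 1 at (px, py)
def fake_prob(px, py):
    return 1.0 / (1.0 + np.exp(-4.0 * (py - 0.5 * np.sin(2.0 * px))))

gx = np.linspace(-1.75, 2.5, 128)
gy = np.linspace(-1.25, 1.75, 128)
GX, GY = np.meshgrid(gx, gy)   # shape (len(gy), len(gx)): rows follow y
P = fake_prob(GX, GY)

fig, ax = plt.subplots()
# draw only the 0.5 level: the boundary between the two predicted classes
cs = ax.contour(GX, GY, P, levels=[0.5], colors="k")
ax.set_aspect("equal")
```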