Tools¶
# if you're using colab, then install the required modules
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
%pip install --quiet --upgrade pytorch-lightning
Overview¶
There is a huge variety of machine learning and deep learning tools.
In this course, we’ll focus on scikit-learn, TensorFlow, and PyTorch.
The tool you choose depends on many considerations, for example:
Your research problem
Model availability (e.g., pre-trained, state-of-the-art)
Ecosystem (e.g., compatibility with other tools)
Personal preferences
Deployment (e.g., hardware)
There are many discussions on the different choices e.g., 1, 2.
scikit-learn¶
Scikit-learn has a wide range of simple and efficient classic machine learning tools.
There are ones for:
Linear Models (examples)
A set of methods where the output is a linear combination of the inputs.
For example, fitting a straight line to the data using Linear Regression (also known as ordinary least squares).
Nearest Neighbors (examples)
Find a (pre-defined) number of training samples closest in distance to the new point, and predict the label from these.
The number of samples can be defined in different ways.
There are various measures of distance.
For example, classifying labels based on their closeness to other samples in Nearest Neighbors Classification.
Support Vector Machines (examples)
Place a decision boundary between data points to classify, regress, or find outliers (the boundary is defined by the support vectors: the hardest-to-separate samples).
For example, find the two hardest-to-categorise samples and place a decision boundary between them in Support Vector Classification.
Decision Trees (examples)
Predict the value of a target variable by learning simple decision rules inferred from the data features.
Many decisions are grouped together into a tree.
For example, an ensemble of many decision trees is a Random Forest.
And many more.
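Most of these share the same fit/predict interface. For instance, here is a minimal sketch (with a tiny made-up dataset, purely for illustration) fitting a nearest-neighbour classifier and a decision tree in exactly the same way:
# a minimal sketch of the common scikit-learn fit/predict interface
# (the tiny dataset here is made up purely for illustration)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0], [1], [2], [10], [11], [12]])  # one feature, six samples
y = np.array([0, 0, 0, 1, 1, 1])  # two classes

for model in [KNeighborsClassifier(n_neighbors=3), DecisionTreeClassifier()]:
    model.fit(X, y)  # learn from the labelled data
    print(model.predict([[1.5], [11.5]]))  # predict labels for new points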
TensorFlow¶
TensorFlow is an end-to-end open source machine learning platform.
TensorFlow has a user-friendly, high-level API (Application Programming Interface) called Keras.
Keras includes a wide range of high-level objects (tutorials) including:
- Activations, e.g., tf.keras.activations.sigmoid
- Regularisers, e.g., tf.keras.regularizers.l2
- Convolutional layers, e.g., tf.keras.layers.Conv2D
- Recurrent layers, e.g., tf.keras.layers.LSTM (long short-term memory)
- Preprocessing layers, e.g., tf.keras.layers.Normalization
- Optimisers, e.g., tf.keras.optimizers.Adam
- Losses, e.g., tf.keras.losses.MeanSquaredError
- Metrics, e.g., tf.keras.metrics.Accuracy
You can always go lower level when required (e.g., custom objects).
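To give a feel for how these objects fit together, here is a minimal sketch (the layer sizes and hyperparameters are arbitrary, chosen only for illustration):
# a minimal sketch combining several of these Keras objects
# (layer sizes and hyperparameters are arbitrary)
import tensorflow as tf

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(
            units=8,
            activation=tf.keras.activations.sigmoid,  # activation
            kernel_regularizer=tf.keras.regularizers.l2(0.01),  # regulariser
        ),
        tf.keras.layers.Dense(units=1),
    ]
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(),  # optimiser
    loss=tf.keras.losses.MeanSquaredError(),  # loss
    metrics=[tf.keras.metrics.MeanAbsoluteError()],  # metric
)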
Through Keras and TensorFlow, you can create models and layers using any of the following APIs:

| | Sequential API | Functional API | Subclassing API |
| --- | --- | --- | --- |
| Data structure | Graph: a linear stack of layers. | Graph: a non-linear DAG (directed acyclic graph) of layers. | Object-oriented: write the forward pass (the backward pass is automatic). |
| Shared layers and multiple inputs/outputs | No. Each layer has one input and one output. | Yes. Each layer can have multiple inputs and outputs. | Yes. |
| Main benefits and drawbacks | Simplest, (re)usability (easily saved), model checks to catch errors early, static. | Similar to Sequential, but more flexible. | Maximum flexibility, no model checks, more complex, dynamic. |
| Show model graph? | Yes. | Yes. | Can add via the guidance here. |
There are many libraries and extensions including:
TensorFlow Extended for deployment.
TensorFlow Lite for mobile and IoT (internet of things) devices.
TensorBoard for visualising the experiment results.
And many more (including projects, papers, and experiments).
PyTorch¶
PyTorch is an end-to-end open source machine learning platform.
PyTorch has user-friendly APIs:
PyTorch Lightning
High-level.
Handles much of the boilerplate code, helps scale out to multiple devices, and other helpful things.
Lightning Flash
Even higher-level.
Abstractions above PyTorch Lightning for fast prototyping.
PyTorch (and its extensions) include a wide range of high-level objects including:
- Activations, e.g., torch.nn.Sigmoid
- Regularisers, e.g., torch.nn.Dropout
- Convolutional layers, e.g., torch.nn.Conv2d
- Recurrent layers, e.g., torch.nn.LSTM
- Preprocessing, e.g., torchvision.transforms.Normalize
- Optimisers, e.g., torch.optim.Adam
- Losses, e.g., torch.nn.MSELoss
- Metrics, e.g., torchmetrics.Accuracy
You can always go lower level when required (e.g., custom objects).
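To give a feel for how these objects fit together, here is a minimal sketch of one training step (the layer sizes, hyperparameters, and random data are arbitrary, chosen only for illustration):
# a minimal sketch combining several of these PyTorch objects
# (layer sizes, hyperparameters, and random data are arbitrary)
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(in_features=4, out_features=8),
    nn.Sigmoid(),  # activation
    nn.Dropout(p=0.1),  # regulariser
    nn.Linear(in_features=8, out_features=1),
)

loss_function = nn.MSELoss()  # loss
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimiser

x, y = torch.randn(16, 4), torch.randn(16, 1)  # a random batch of 16 samples
loss = loss_function(model(x), y)  # forward pass and loss
optimiser.zero_grad()  # clear old gradients
loss.backward()  # backward pass
optimiser.step()  # update the parameters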
Similar to TensorFlow/Keras, you can create models and layers in PyTorch using either Sequential or Subclassing APIs (or in combination). These have similar features to the table above, where the Sequential API is simpler and the Subclassing API enables flexibility.
There are many libraries and extensions including:
TorchServe for deployment.
PyTorch Live for mobile and IoT devices.
Example - Linear regression¶
Let’s start with an introductory example: fitting a straight line to data.
Don’t worry too much about some of the details, as we’ll cover them in later lessons.
For now, focus on the general workflow.
We’ll see how this is done in each of the three key tools we cover here: scikit-learn, TensorFlow, and PyTorch.
Let’s create some (noisy) data to train on:
import numpy as np
def create_noisy_linear_data(num_points):
x = np.arange(num_points)
noise = np.random.normal(0, 1, num_points)
y = 2 * x + noise
# convert to 2D arrays
x, y = x.reshape(-1, 1), y.reshape(-1, 1)
return x, y
x_train, y_train = create_noisy_linear_data(10)
Caution
Input arrays to models need to be 2-dimensional (2D), i.e., a column of rows.
For example, instead of one row:
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Convert this to a column of rows using .reshape(-1, 1)
:
>>> np.arange(10).reshape(-1, 1)
array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
scikit-learn¶
First, let’s try with scikit-learn:
from sklearn import linear_model
model_sklearn = linear_model.LinearRegression()
When fit is called for Linear Regression, the loss being minimised is the mean squared error between the predictions and the actual values.
This determines what parameters the model learns.
model_sklearn.fit(x_train, y_train)
LinearRegression()
The data was generated from the line y = 2x (plus noise), so the gradient was 2.
Let’s see what the model estimated it to be:
model_sklearn.coef_[0]
array([1.83894562])
Pretty close, considering there were only 10 training data points.
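If you want to dig a little further, the fitted intercept and the training mean squared error (the loss that fit minimised) can be inspected too. A quick sketch (the exact numbers will vary because the data are randomly generated):
# inspect the fitted intercept and the training mean squared error
# (exact values will vary with the random noise)
from sklearn.metrics import mean_squared_error

print(model_sklearn.intercept_)  # should be close to 0
print(mean_squared_error(y_train, model_sklearn.predict(x_train)))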
TensorFlow¶
Now, for TensorFlow:
import tensorflow as tf
Create the model (using the simpler sequential API).
Note, it’s helpful to name the layers in the model.
model_tf = tf.keras.Sequential(
[
tf.keras.Input(shape=(1,), name="inputs"),
tf.keras.layers.Dense(units=1, name="outputs"),
],
name="sequential",
)
For reference, here’s what this would have looked like using the functional and subclassing APIs:
inputs = tf.keras.Input(shape=(1,), name="inputs")
outputs = tf.keras.layers.Dense(units=1, name="outputs")(inputs)
model_tf_functional = tf.keras.Model(inputs, outputs, name="functional")
class MyModel(tf.keras.Model):
def __init__(self, **kwargs):
super(MyModel, self).__init__(**kwargs) # handles standard arguments e.g., name
self.outputs = tf.keras.layers.Dense(units=1, name="outputs")
def call(self, inputs): # have inputs as argument to call, rather than define
x = self.outputs(inputs)
return x
model_tf_subclassing = MyModel(name="subclassing")
You can now show the model summary.
Note, this only shows layers (not the Input
object).
model_tf.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
outputs (Dense) (None, 1) 2
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________
You can also show the model graph:
tf.keras.utils.plot_model(model_tf, show_shapes=True)
Now, compile the model.
The optimizer, loss, and metrics keyword arguments can each be either a string (e.g., mean_squared_error) or a TensorFlow object (e.g., tf.keras.losses.MeanSquaredError()).
model_tf.compile(
optimizer="sgd",
loss="mean_squared_error",
metrics=["accuracy"],
)
And, train the model.
Epochs are the number of passes over the whole training set.
model_tf.fit(
x_train,
y_train,
epochs=10,
    verbose=False,  # set to True to print the metrics per epoch
);
And, let’s see what this model thought the gradient was:
model_tf.weights[0].numpy()
array([[1.916894]], dtype=float32)
PyTorch¶
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
Create the dataset and dataloader:
x_train_tensor = torch.from_numpy(x_train).type(torch.float32)
y_train_tensor = torch.from_numpy(y_train).type(torch.float32)
ds_train = TensorDataset(x_train_tensor, y_train_tensor)
dataloader_train = DataLoader(ds_train)
Create the model (using the simpler sequential API):
model_torch = nn.Sequential(nn.Linear(in_features=1, out_features=1))
print(model_torch)
Sequential(
(0): Linear(in_features=1, out_features=1, bias=True)
)
For reference, here’s what this would have looked like using the subclassing APIs:
class NeuralNetwork(nn.Module):
def __init__(self): # model definition
super(NeuralNetwork, self).__init__() # instantiate the nn.Module
self.outputs = nn.Linear(in_features=1, out_features=1)
def forward(self, x): # the computations for the forward layer, not called directly
logits = self.outputs(x)
return logits
model_torch_subclassing = NeuralNetwork()
print(model_torch_subclassing)
NeuralNetwork(
(outputs): Linear(in_features=1, out_features=1, bias=True)
)
Note
The backward propagation is calculated automatically, though you can do it manually if you like.
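For example, here is a minimal sketch of PyTorch’s automatic differentiation (autograd) on a single tensor:
# a minimal sketch of automatic differentiation in PyTorch
x = torch.tensor(3.0, requires_grad=True)  # track gradients for this tensor
y = x ** 2  # forward pass: y = x^2
y.backward()  # backward pass: compute dy/dx
print(x.grad)  # tensor(6.) because dy/dx = 2x = 6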
Define the loss and optimiser:
loss_function = nn.MSELoss()
optimiser = torch.optim.SGD(model_torch.parameters(), lr=1e-3)
Define a single training step:
def train(dataloader, model, loss_function, optimiser):
size = len(dataloader.dataset)
model.train() # set the model in training mode, rather than in evaluation mode i.e., `model.eval()`
# for each batch of data
for batch, (X, y) in enumerate(dataloader):
# step 1: make a prediction for these inputs
prediction = model(X)
# step 2: compute the loss for that prediction
loss = loss_function(prediction, y)
# step 3: first, clean the gradients
optimiser.zero_grad()
# step 4: backpropagate the gradients for that loss
loss.backward()
# step 5: update the parameters accordingly
optimiser.step()
Note that testing doesn’t need the gradient steps (i.e., steps 3-5).
Hence, the test function would look something like:
def test(dataloader, model, loss_function):
size = len(dataloader.dataset)
model.eval() # set the model in evaluation mode
...
with torch.no_grad(): # don't track gradients
for batch, (X, y) in enumerate(dataloader):
# step 1: make a prediction for these inputs
prediction = model(X)
# step 2: compute the loss for that prediction
loss = loss_function(prediction, y)
...
We’ll see more examples of testing later.
Run the training step over multiple epochs:
NUM_EPOCHS = 5
for epoch in range(NUM_EPOCHS):
train(dataloader_train, model_torch, loss_function, optimiser)
And, let’s see what this model thought the gradient was:
# to check parameter names
for name, parameter in model_torch.named_parameters():
print(name)
0.weight
0.bias
model_torch[0].weight
Parameter containing:
tensor([[1.9530]], requires_grad=True)
Now, we can see how well these models fit a line to the data.
First, grab the predictions of each model on the training data (for plotting purposes).
y_pred_sklearn = model_sklearn.predict(x_train)
y_pred_tf = model_tf.predict(x_train)
y_pred_torch = model_torch(x_train_tensor).detach().numpy()
Then, show these lines on a plot:
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
colors = {"data": "#1b9e77", "sklearn": "#d95f02", "tf": "#7570b3", "torch": "#66a61e"}
def make_plot(ax, y_pred, label, title):
ax.scatter(x_train, y_train, color=colors["data"])
ax.plot(x_train, y_pred, color=colors[label], linewidth=3)
ax.set_title(title)
ax.set_ylim([0, 18])
ax.set_xlim([0, 9])
ax.set_facecolor("whitesmoke")
fig = plt.figure(1, figsize=(12, 4))
ax1, ax2, ax3 = fig.subplots(1, 3)
make_plot(ax1, y_pred_sklearn, "sklearn", "scikit-learn")
make_plot(ax2, y_pred_tf, "tf", "TensorFlow")
make_plot(ax3, y_pred_torch, "torch", "PyTorch")
plt.show()
They all did a good job of fitting a function to the data.
In other words, they found the association in the data.
However, this was a very simple example that probably didn’t require machine learning (let alone deep learning), though it demonstrates the general workflow.
Now, let’s look at something a little more suitable.
Example - Digit classification¶
Let’s train a model to recognise handwritten digits using the classic MNIST dataset.
This is a classification task.
scikit-learn¶
First, with scikit-learn:
from sklearn import datasets, linear_model, metrics, svm
from sklearn.model_selection import train_test_split
Load the data¶
digits = datasets.load_digits()
Take a look at the labelled data:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
ax.set_axis_off()
ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Label: {label}")
Preprocess and split the data¶
def preprocess_data(digits):
# the data comes as 2D 8x8 pixels
# flatten the images to 1D 64 pixels
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
return n_samples, data
n_samples, data = preprocess_data(digits)
X_train, X_test, y_train, y_test = train_test_split(
data, digits.target, test_size=0.5, shuffle=False
)
Create a model¶
Here, we will use a Support Vector Classifier.
Don’t worry about what gamma is for now (if you’re interested, read the documentation).
model = svm.SVC(gamma=0.001)
Fit the model to the training data¶
model.fit(X_train, y_train)
SVC(gamma=0.001)
Use the model to predict the test data¶
y_pred = model.predict(X_test)
Take a look at the predictions for these test digits:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, y_pred):
ax.set_axis_off()
image = image.reshape(8, 8) # 1D 64 pixels to 2D 8*8 pixels for plotting
ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Prediction: {prediction:.0f}")
Looking good. The predicted labels match the ground truth images.
How well did our model do overall?¶
overall_accuracy = metrics.accuracy_score(y_test, y_pred)
overall_accuracy
0.9688542825361512
97% accuracy is good.
Let’s do some quick error analysis using a confusion matrix.
This shows how well the classification model did for each category.
The predictions are on the x-axis and the true labels from the test data are on the y-axis.
A perfect score would be where the predictions always match the true labels (i.e., all values are on the diagonal line).
confusion_matrix = metrics.ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
confusion_matrix.figure_.suptitle("Confusion Matrix")
plt.show()
We can see that although the model did well, it struggled with 3’s, confusing them with 5’s, 7’s, and 8’s.
This points us in the direction of how we might improve the model.
We could also use cross-validation to find the variation in the training score:
from sklearn.model_selection import KFold, cross_val_score
cv = KFold(n_splits=5, shuffle=False)
test_scores = cross_val_score(model, X_train, y_train, cv=cv)
test_scores
array([0.93333333, 0.99444444, 0.90555556, 0.98882682, 0.95530726])
print(f"CV accuracy = {test_scores.mean():0.2f} (+/- {test_scores.std():0.2f})")
CV accuracy = 0.96 (+/- 0.03)
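Cross-validation can also be combined with a small hyperparameter search, for example over gamma. Here is a minimal sketch (the candidate values are arbitrary, chosen only for illustration):
# a minimal sketch of a grid search over gamma using cross-validation
# (the candidate values are arbitrary)
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(svm.SVC(), param_grid={"gamma": [0.0001, 0.001, 0.01]}, cv=cv)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)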
Save the model¶
You can save models using joblib
:
from joblib import dump
import os
from pathlib import Path
path_models = f"{os.getcwd()}/models"
Path(path_models).mkdir(parents=True, exist_ok=True)
You can then save the model using:
dump(model, f"{path_models}/mnist_model_sklearn.joblib")
You could then load this model back using:
from joblib import load
reloaded_model = load(f'{path_models}/mnist_model_sklearn.joblib')
TensorFlow¶
Now, with TensorFlow.
Check whether there are any GPUs (Graphical Processing Units) available.
Note, the device is the hardware that TensorFlow runs on (e.g., CPUs (Central Processing Units), GPUs).
print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))
Num GPUs Available: 0
Load and split the data¶
(train_images, train_labels), (
test_images,
test_labels,
) = tf.keras.datasets.mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11501568/11490434 [==============================] - 0s 0us/step
Take a look at some of the training data:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, train_images, train_labels):
ax.set_axis_off()
image = image.reshape(28, 28) # 1D 784 pixels to 2D 28*28 pixels for plotting
ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Label: {label}")
Create the model¶
You can use any of the Sequential, Functional, or Subclassing APIs.
Let’s use the simpler Sequential API for now.
You could also use many .add()
calls instead of the list.
Note
You could make the final layer a softmax (to output probabilities directly), though this is discouraged for numerical stability reasons.
Tip
It’s often useful to place pre-processing steps into the model pipeline too.
For example, here we flatten the 2D image to a 1D tensor and rescale the greyscale pixel values to between 0 and 1.
model = tf.keras.Sequential(
[
tf.keras.Input(shape=(28, 28), name="inputs"),
tf.keras.layers.Flatten(name="flatten"),
tf.keras.layers.Rescaling(1.0 / 255, name="normalise"),
tf.keras.layers.Dense(128, activation="relu", name="layer1"),
tf.keras.layers.Dense(128, activation="relu", name="layer2"),
tf.keras.layers.Dense(10, name="outputs"), # 1 unit per class
]
)
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
normalise (Rescaling) (None, 784) 0
layer1 (Dense) (None, 128) 100480
layer2 (Dense) (None, 128) 16512
outputs (Dense) (None, 10) 1290
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
We can now also visualise the architecture:
tf.keras.utils.plot_model(model, show_shapes=True)
Compile the model¶
It’s useful to name the metrics, especially if there’s more than one.
Here, we’ll use the Adam optimiser, sparse categorical crossentropy loss, and a metric of accuracy.
model.compile(
optimizer="adam",
loss=tf.keras.losses.SparseCategoricalCrossentropy(
from_logits=True
), # ensure classifies using logits
metrics=["accuracy"],
)
Fit the model to the training data¶
The fit()
call returns a history
object.
Note
The validation_split
keyword argument can only be used for NumPy training data.
BATCH_SIZE = 32
history = model.fit(
train_images,
train_labels,
epochs=2,
batch_size=BATCH_SIZE,
    verbose=False,  # set to True to print the output from each epoch
validation_split=0.2, # automatically set apart a validation set: 0.2 means 20% for validation
);
The history.history
dictionary then contains the loss and metrics per epoch:
history.history
{'loss': [0.2630394995212555, 0.10852228105068207],
'accuracy': [0.9223333597183228, 0.9671041369438171],
'val_loss': [0.14200642704963684, 0.09933607280254364],
'val_accuracy': [0.9571666717529297, 0.9703333377838135]}
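This history is handy for plotting learning curves, for example (a minimal sketch, reusing the matplotlib import from earlier):
# a minimal sketch of plotting the learning curves from the history
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()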
Predictions¶
Use the model for predictions with model.predict()
(i.e., inference).
Models return logits or log-odds. If you’d like these to be probabilities, add a softmax layer:
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
y_pred = probability_model.predict(test_images)
Each prediction has a probability per category:
y_pred[0]
array([4.7735466e-07, 8.6988564e-08, 1.1313747e-05, 3.5345482e-04,
3.0515390e-10, 4.6098885e-06, 1.6145316e-11, 9.9962521e-01,
4.6569102e-07, 4.3706609e-06], dtype=float32)
The most likely category can be found by taking the index of the maximum probability (using np.argmax):
np.argmax(y_pred[0])
7
So, the model thinks the first digit is a 7.
Let’s see if that’s right by plotting the first four test digits with their predictions:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, test_images, y_pred):
ax.set_axis_off()
image = tf.reshape(image, (28, 28)) # 1D 784 pixels to 2D 28*28 pixels for plotting
ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Prediction: {np.argmax(prediction):.0f}")
Let’s now evaluate the model overall¶
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")
313/313 [==============================] - 1s 1ms/step - loss: 0.0903 - accuracy: 0.9716
Test accuracy: 0.9715999960899353
Similar to scikit-learn, an overall test accuracy of 97% is good.
Note that the training accuracy and validation accuracy were both around 97% too.
As before, let’s have a look at a confusion matrix for some quick error analysis.
Note, TensorFlow does have its own confusion_matrix function, though I’ll use the scikit-learn one here again as it has a nice plotting feature.
confusion_matrix = metrics.ConfusionMatrixDisplay.from_predictions(
test_labels, np.argmax(y_pred, axis=1)
)
confusion_matrix.figure_.suptitle("Confusion Matrix")
plt.show()
This model did well for most digits, though struggled a bit with 5’s.
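For reference, the raw counts could also be computed with TensorFlow’s own function, along the lines of:
# a minimal sketch of the equivalent raw counts using TensorFlow's own function
print(tf.math.confusion_matrix(test_labels, np.argmax(y_pred, axis=1)))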
Save the model¶
A model includes:
Architecture
Weights (i.e., state)
Configuration (e.g., optimiser, loss, metrics)
You can save the whole model or just parts of it.
The different formats are:
TensorFlow SavedModel: a single archive (recommended).
Save: model.save() or tf.keras.models.save_model()
Load: tf.keras.models.load_model()
Note, Keras H5 was the older format.
Architecture only (JSON).
Save: get_config() and model.to_json()
Load: from_config() and tf.keras.models.model_from_json()
Weights only.
Save: model.save_weights()
Load: model.load_weights()
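For example, the architecture-only and weights-only routes look roughly like this (a sketch only; the file names are arbitrary). The whole-model SavedModel route is what we actually use below.
# a sketch of the architecture-only (JSON) and weights-only routes
# (the file names are arbitrary)
architecture_json = model.to_json()  # architecture only
same_architecture = tf.keras.models.model_from_json(architecture_json)

model.save_weights(f"{path_models}/mnist_weights_tf")  # weights only
same_architecture.load_weights(f"{path_models}/mnist_weights_tf")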
model.save(f"{path_models}/model_tf_mnist")
INFO:tensorflow:Assets written to: /home/runner/work/swd8_intro_ml/swd8_intro_ml/docs/models/model_tf_mnist/assets
!ls {path_models}/model_tf_mnist
assets keras_metadata.pb saved_model.pb variables
Load the model¶
Reload the saved model and evaluate it on the test data.
new_model = tf.keras.models.load_model(f"{path_models}/model_tf_mnist")
new_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
normalise (Rescaling) (None, 784) 0
layer1 (Dense) (None, 128) 100480
layer2 (Dense) (None, 128) 16512
outputs (Dense) (None, 10) 1290
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
313/313 - 0s - loss: 0.0903 - accuracy: 0.9716 - 380ms/epoch - 1ms/step
Restored model, accuracy: 97.16%
PyTorch (Lightning)¶
Here, we’ll do a simple example using PyTorch Lightning.
This avoids creating some of the boilerplate code needed for pure PyTorch.
This will just include training for now (i.e., no validation or testing).
import os
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from pytorch_lightning.callbacks.progress import TQDMProgressBar
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchmetrics import Accuracy
from torchvision import transforms
from torchvision.datasets import MNIST
Note
torch.nn.functional
contains functions for neural networks, while torch.nn
defines them as modules.
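For example, the same ReLU operation can be used in either form (a quick sketch):
# a quick sketch of the functional vs module forms of the same operation
x = torch.tensor([-1.0, 2.0])
print(F.relu(x))  # functional form: torch.nn.functional
print(nn.ReLU()(x))  # module form: torch.nn (instantiate, then call)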
BATCH_SIZE = 32
PATH_DATASETS = f"{os.getcwd()}/data"
Prepare the data¶
train_dataloader = DataLoader(
MNIST(PATH_DATASETS, train=True, download=True, transform=transforms.ToTensor()),
batch_size=BATCH_SIZE,
)
Downloading http://yann.lecun.com/exdb/mnist/ (train and test images and labels) and extracting to /home/runner/work/swd8_intro_ml/swd8_intro_ml/docs/data/MNIST/raw
Create the model¶
This includes the loss, optimiser, and training steps.
pl.LightningModule
is a nn.Module
with more features.
For more information on how to convert a PyTorch model to a PyTorch Lightning model, see the PyTorch Lightning documentation.
class MNISTModel(pl.LightningModule):
def __init__(self):
super(MNISTModel, self).__init__()
self.layer1 = torch.nn.Linear(in_features=28 * 28, out_features=10)
def forward(self, x):
x = x.view(x.size(0), -1) # flatten inputs
x = self.layer1(x) # pass inputs through hidden layer
output = torch.relu(x) # run activation function for layer
return output
def training_step(self, batch, batch_index):
x, y = batch
y_hat = self(x) # predicted y output
loss = F.cross_entropy(y_hat, y)
tensorboard_logs = {"train_loss": loss}
return {"loss": loss, "log": tensorboard_logs}
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)
mnist_model = MNISTModel()
print(mnist_model)
MNISTModel(
(layer1): Linear(in_features=784, out_features=10, bias=True)
)
Create the trainer¶
Warning
The progress bar can refresh too quickly for Colab / Kaggle. If developing on these platforms, be sure to slow the refresh rate by increasing the value in callbacks=TQDMProgressBar(refresh_rate=20).
trainer = pl.Trainer(gpus=0, callbacks=TQDMProgressBar(refresh_rate=20), max_epochs=5)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Fit the model¶
if IN_COLAB:
trainer.fit(mnist_model, train_dataloader)
We can see the loss reduce at the right of the progress bar.
You can change what is logged by editing the training_step
method.
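For example, using the built-in self.log call inside training_step would look roughly like this (a sketch of just that method):
# a sketch of logging with the built-in self.log inside training_step
def training_step(self, batch, batch_index):
    x, y = batch
    y_hat = self(x)
    loss = F.cross_entropy(y_hat, y)
    self.log("train_loss", loss)  # send this metric to the logger (TensorBoard by default)
    return loss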
(Optional) Adding in validation and testing to the model creation¶
Note, DataLoaders are now incorporated into the model creation.
class MNISTModel(pl.LightningModule):
def __init__(self):
super(MNISTModel, self).__init__()
self.layer1 = torch.nn.Linear(in_features=28 * 28, out_features=10)
def forward(self, x):
x = x.view(x.size(0), -1) # flatten x
x = self.layer1(x) # pass inputs through hidden layer
output = torch.relu(x) # run activation function for layer
return output
def training_step(self, batch, batch_index):
x, y = batch
y_hat = self(x) # predicted y output
loss = F.cross_entropy(y_hat, y)
tensorboard_logs = {"train_loss": loss}
return {"loss": loss, "log": tensorboard_logs}
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)
# -------------------------
# same as above up to here
# new stuff below
def validation_step(self, batch, batch_index):
x, y = batch
y_hat = self(x)
val_loss = F.cross_entropy(y_hat, y)
return {"val_loss": val_loss}
def test_step(self, batch, batch_index):
x, y = batch
y_hat = self(x)
test_loss = F.cross_entropy(y_hat, y)
return {"test_loss": test_loss}
def validation_epoch_end(self, outputs): # hook for validation
average_val_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
tensorboard_logs = {"val_loss": average_val_loss}
return {"val_loss": average_val_loss, "log": tensorboard_logs}
def test_epoch_end(self, outputs): # hook for test
average_test_loss = torch.stack([x["test_loss"] for x in outputs]).mean()
logs = {"test_loss": average_test_loss}
self.log_dict(logs)
return {"test_loss": average_test_loss, "log": logs, "progress_bar": logs}
# also added in the dataloaders below
def train_dataloader(self):
return DataLoader(
MNIST(
PATH_DATASETS,
train=True,
download=True,
transform=transforms.ToTensor(),
),
batch_size=BATCH_SIZE,
)
def val_dataloader(self):
return DataLoader(
MNIST(
PATH_DATASETS,
train=True,
download=True,
transform=transforms.ToTensor(),
),
batch_size=BATCH_SIZE,
)
def test_dataloader(self):
return DataLoader(
MNIST(
PATH_DATASETS,
train=False,
download=True,
transform=transforms.ToTensor(),
),
batch_size=BATCH_SIZE,
)
mnist_model = MNISTModel()
print(mnist_model)
MNISTModel(
(layer1): Linear(in_features=784, out_features=10, bias=True)
)
trainer = pl.Trainer(gpus=0, callbacks=TQDMProgressBar(refresh_rate=20), max_epochs=5)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Note, the trainer only required the model as input, as the train_dataloader
is part of the model now.
if IN_COLAB:
trainer.fit(mnist_model)
Evaluation¶
Now, testing the model is simply done by running:
if IN_COLAB:
trainer.test(mnist_model)
Save the model¶
The model is saved automatically to lightning_logs/
.
It is incrementally split over versions e.g., version_0
.
This then saves checkpoints per epoch, overwriting with the latest epoch.
To save a model in PyTorch (without Lightning):
state_dict = model.state_dict() # extract the parameters
torch.save(state_dict, "my_model_weights.pth") # save the parameters
Load the model¶
path_checkpoints = f"{os.getcwd()}/lightning_logs/version_0/checkpoints"
path_model = f"{path_checkpoints}/{os.listdir(path_checkpoints)[0]}"
reloaded_model = MNISTModel.load_from_checkpoint(path_model)
To load a model in PyTorch (without Lightning):
new_state_dict = torch.load("my_model_weights.pth")  # load the parameters
new_model = MNISTModel()  # instantiate a model
new_model.load_state_dict(new_state_dict)  # set up the new model with these parameters
Questions¶
Question 1
If you were looking to do classic machine learning, what tool is a good choice?
Question 2
If you were looking to do deep learning using a high-level API, what tools are a good choice?
Question 3
What are good reasons for choosing a high or low-level API?
Question 4
When creating a model, which API is simpler to use?
Sequential
Subclassing
Question 5
Put these general steps in order:
Compile the model
Preprocess the data
Test the model
Fit the model to the training data
Create the model
Download the data
Question 6
Which machine learning library is the best?
Key Points¶
Important
scikit-learn is great for classic machine learning problems.
TensorFlow and PyTorch are both great for deep learning problems.
Keras (high-level API for TensorFlow) and PyTorch Lightning (high-level API for PyTorch) have many high-level objects to help you create deep learning models.
You can use low-level APIs for any custom objects.
Explore your data before using it.
Check your model before fitting the training data to it.
Evaluate your model and analyse the errors it makes.
Further information¶
Good practices¶
Many decisions around model architecture are based on previous work, literature, and trial-and-error.
Debugging:
Test each part individually, before testing the whole.
Check the model summary and visualise the architecture.
Use debug modes:
Add run_eagerly=True to the compile() call in Keras.
Use Trainer(fast_dev_run=True) in PyTorch Lightning.
Tips for Keras and PyTorch Lightning.
Offloading computations to a GPU may not be beneficial for small models.
Tips for optimising GPU performance from TensorFlow, NVIDIA.
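For instance, in PyTorch you can check whether a GPU is available and place the model and data on it explicitly (a minimal sketch):
# a minimal sketch of explicit device placement in PyTorch
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(in_features=1, out_features=1).to(device)  # move the parameters
x = torch.randn(8, 1).to(device)  # move the data too
prediction = model(x)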
Other options¶
There are many other tools for machine learning, including:
- JAX: a library for GPU-accelerated NumPy with automatic differentiation.
- Flax: a neural network library and ecosystem for JAX that is designed for flexibility.
- A library built on top of JAX that provides simple, composable abstractions for machine learning research.
- Gradient boosting libraries.
- Other deep learning frameworks.
- Other high-level APIs for TensorFlow.
- Other high-level APIs for PyTorch.