Tools¶
# if you're using colab, then install the required modules
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
%pip install --quiet --upgrade pytorch-lightning
Overview¶
There is a huge variety of machine learning and deep learning tools.
In this course, we’ll focus on scikit-learn, TensorFlow, and PyTorch.
The tool you choose depends on many considerations, for example:
Your research problem
Model availability (e.g., pre-trained, state-of-the-art)
Ecosystem (e.g., compatibility with other tools)
Personal preferences
Deployment (e.g., hardware)
There are many discussions on the different choices e.g., 1, 2.
scikit-learn¶
Scikit-learn has a wide range of simple and efficient classic machine learning tools.
There are ones for:
Linear Models (examples)
A set of methods where the output is a linear combination of the inputs.
For example, fitting a straight line to the data using Linear Regression (also known as ordinary least squares).
Nearest Neighbors (examples)
Find a (pre-defined) number of training samples closest in distance to the new point, and predict the label from these.
The number of samples can be defined in different ways.
There are various measures of distance.
For example, classifying labels based on their closeness to other samples in Nearest Neighbors Classification.
Support Vector Machines (examples)
Place a decision boundary between data points to classify, regress, or find outliers (the boundary is defined by the support vectors: the hardest-to-separate samples).
For example, find the two hardest-to-categorise samples and place a decision boundary between them in Support Vector Classification.
Decision Trees (examples)
Predict the value of a target variable by learning simple decision rules inferred from the data features.
Many decisions are grouped together into a tree.
For example, an ensemble of many decision trees is a Random Forest.
And many more.
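Most of these share the same fit/predict interface. For instance, here is a minimal sketch (with a tiny made-up dataset, purely for illustration) fitting a nearest-neighbour classifier and a decision tree in exactly the same way:
# a minimal sketch of the common scikit-learn fit/predict interface
# (the tiny dataset here is made up purely for illustration)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0], [1], [2], [10], [11], [12]])  # one feature, six samples
y = np.array([0, 0, 0, 1, 1, 1])  # two classes

for model in [KNeighborsClassifier(n_neighbors=3), DecisionTreeClassifier()]:
    model.fit(X, y)  # learn from the labelled data
    print(model.predict([[1.5], [11.5]]))  # predict labels for new points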
TensorFlow¶
TensorFlow is an end-to-end open source machine learning platform.
TensorFlow has a user-friendly, high-level API (Application Programming Interface) called Keras.
Keras includes a wide range of high-level objects (tutorials) including:
- Activations, e.g., tf.keras.activations.sigmoid
- Regularisers, e.g., tf.keras.regularizers.l2
- Convolutional layers, e.g., tf.keras.layers.Conv2D
- Recurrent layers, e.g., tf.keras.layers.LSTM (long short-term memory)
- Preprocessing layers, e.g., tf.keras.layers.Normalization
- Optimisers, e.g., tf.keras.optimizers.Adam
- Losses, e.g., tf.keras.losses.MeanSquaredError
- Metrics, e.g., tf.keras.metrics.Accuracy
You can always go lower level when required (e.g., custom objects).
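To give a feel for how these objects fit together, here is a minimal sketch (the layer sizes and hyperparameters are arbitrary, chosen only for illustration):
# a minimal sketch combining several of these Keras objects
# (layer sizes and hyperparameters are arbitrary)
import tensorflow as tf

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(
            units=8,
            activation=tf.keras.activations.sigmoid,  # activation
            kernel_regularizer=tf.keras.regularizers.l2(0.01),  # regulariser
        ),
        tf.keras.layers.Dense(units=1),
    ]
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(),  # optimiser
    loss=tf.keras.losses.MeanSquaredError(),  # loss
    metrics=[tf.keras.metrics.MeanAbsoluteError()],  # metric
)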
Through Keras and TensorFlow, you can create models and layers using any of the following APIs:

| | Sequential API | Functional API | Subclassing API |
| --- | --- | --- | --- |
| Data structure | Graph: a linear stack of layers. | Graph: a non-linear DAG (directed acyclic graph) of layers. | Object-oriented: write the forward pass (the backward pass is automatic). |
| Shared layers and multiple inputs/outputs | No. Each layer has one input and one output. | Yes. Each layer can have multiple inputs and outputs. | Yes. |
| Main benefits and drawbacks | Simplest, (re)usability (easily saved), model checks to catch errors early, static. | Similar to Sequential, but more flexible. | Maximum flexibility, no model checks, more complex, dynamic. |
| Show model graph? | Yes. | Yes. | Can add via the guidance here. |
There are many libraries and extensions including:
TensorFlow Extended for deployment.
TensorFlow Lite for mobile and IoT (internet of things) devices.
TensorBoard for visualising the experiment results.
And many more (including projects, papers, and experiments).
PyTorch¶
PyTorch is an end-to-end open source machine learning platform.
PyTorch has user-friendly APIs:
PyTorch Lightning
High-level.
Handles much of the boilerplate code, helps scale out to multiple devices, and other helpful things.
Lightning Flash
Even higher-level.
Abstractions above PyTorch Lightning for fast prototyping.
PyTorch (and its extensions) include a wide range of high-level objects including:
- Activations, e.g., torch.nn.Sigmoid
- Regularisers, e.g., torch.nn.Dropout
- Convolutional layers, e.g., torch.nn.Conv2d
- Recurrent layers, e.g., torch.nn.LSTM
- Preprocessing, e.g., torchvision.transforms.Normalize
- Optimisers, e.g., torch.optim.Adam
- Losses, e.g., torch.nn.MSELoss
- Metrics, e.g., torchmetrics.Accuracy
You can always go lower level when required (e.g., custom objects).
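To give a feel for how these objects fit together, here is a minimal sketch of one training step (the layer sizes, hyperparameters, and random data are arbitrary, chosen only for illustration):
# a minimal sketch combining several of these PyTorch objects
# (layer sizes, hyperparameters, and random data are arbitrary)
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(in_features=4, out_features=8),
    nn.Sigmoid(),  # activation
    nn.Dropout(p=0.1),  # regulariser
    nn.Linear(in_features=8, out_features=1),
)

loss_function = nn.MSELoss()  # loss
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimiser

x, y = torch.randn(16, 4), torch.randn(16, 1)  # a random batch of 16 samples
loss = loss_function(model(x), y)  # forward pass and loss
optimiser.zero_grad()  # clear old gradients
loss.backward()  # backward pass
optimiser.step()  # update the parameters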
Similar to TensorFlow/Keras, you can create models and layers in PyTorch using either Sequential or Subclassing APIs (or in combination). These have similar features to the table above, where the Sequential API is simpler and the Subclassing API enables flexibility.
There are many libraries and extensions including:
TorchServe for deployment.
PyTorch Live for mobile and IoT devices.
Example - Linear regression¶
Let’s start with an introductory example: fitting a straight line to data.
Don’t worry too much about some of the details, as we’ll cover them in later lessons.
For now, focus on the general workflow.
We’ll see how this is done in each of the three key tools we cover here: scikit-learn, TensorFlow, and PyTorch.
Let’s create some (noisy) data to train on:
import numpy as np
def create_noisy_linear_data(num_points):
x = np.arange(num_points)
noise = np.random.normal(0, 1, num_points)
y = 2 * x + noise
# convert to 2D arrays
x, y = x.reshape(-1, 1), y.reshape(-1, 1)
return x, y
x_train, y_train = create_noisy_linear_data(10)
Caution
Input arrays to models need to be 2-dimensional (2D), i.e., a column of rows.
For example, instead of one row:
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Convert this to a column of rows using .reshape(-1, 1)
:
>>> np.arange(10).reshape(-1, 1)
array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
scikit-learn¶
First, let’s try with scikit-learn:
from sklearn import linear_model
model_sklearn = linear_model.LinearRegression()
When fit is called for Linear Regression, the loss being minimised is the mean squared error between the predictions and the actual values.
This determines what parameters the model learns.
model_sklearn.fit(x_train, y_train)
LinearRegression()
The data was generated from the line y = 2x (plus noise), so the gradient was 2.
Let’s see what the model estimated it to be:
model_sklearn.coef_[0]
array([1.83894562])
Pretty close, considering there were only 10 training data points.
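If you want to dig a little further, the fitted intercept and the training mean squared error (the loss that fit minimised) can be inspected too. A quick sketch (the exact numbers will vary because the data are randomly generated):
# inspect the fitted intercept and the training mean squared error
# (exact values will vary with the random noise)
from sklearn.metrics import mean_squared_error

print(model_sklearn.intercept_)  # should be close to 0
print(mean_squared_error(y_train, model_sklearn.predict(x_train)))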
TensorFlow¶
Now, for TensorFlow:
import tensorflow as tf
Create the model (using the simpler sequential API).
Note, it’s helpful to name the layers in the model.
model_tf = tf.keras.Sequential(
[
tf.keras.Input(shape=(1,), name="inputs"),
tf.keras.layers.Dense(units=1, name="outputs"),
],
name="sequential",
)
For reference, here’s what this would have looked like using the functional and subclassing APIs:
inputs = tf.keras.Input(shape=(1,), name="inputs")
outputs = tf.keras.layers.Dense(units=1, name="outputs")(inputs)
model_tf_functional = tf.keras.Model(inputs, outputs, name="functional")
class MyModel(tf.keras.Model):
def __init__(self, **kwargs):
super(MyModel, self).__init__(**kwargs) # handles standard arguments e.g., name
self.outputs = tf.keras.layers.Dense(units=1, name="outputs")
def call(self, inputs): # have inputs as argument to call, rather than define
x = self.outputs(inputs)
return x
model_tf_subclassing = MyModel(name="subclassing")
You can now show the model summary.
Note, this only shows layers (not the Input
object).
model_tf.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
outputs (Dense) (None, 1) 2
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________
You can also show the model graph:
tf.keras.utils.plot_model(model_tf, show_shapes=True)
Now, compile the model.
The optimizer, loss, and metrics keyword arguments can each be either a string (e.g., mean_squared_error) or a TensorFlow object (e.g., tf.keras.losses.MeanSquaredError()).
model_tf.compile(
optimizer="sgd",
loss="mean_squared_error",
metrics=["accuracy"],
)
And, train the model.
Epochs are the number of passes over the whole training set.
model_tf.fit(
x_train,
y_train,
epochs=10,
    verbose=False,  # set to True to print the metrics per epoch
);
And, let’s see what this model thought the gradient was:
model_tf.weights[0].numpy()
array([[1.916894]], dtype=float32)
PyTorch¶
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
Create the dataset and dataloader:
x_train_tensor = torch.from_numpy(x_train).type(torch.float32)
y_train_tensor = torch.from_numpy(y_train).type(torch.float32)
ds_train = TensorDataset(x_train_tensor, y_train_tensor)
dataloader_train = DataLoader(ds_train)
Create the model (using the simpler sequential API):
model_torch = nn.Sequential(nn.Linear(in_features=1, out_features=1))
print(model_torch)
Sequential(
(0): Linear(in_features=1, out_features=1, bias=True)
)
For reference, here’s what this would have looked like using the subclassing APIs:
class NeuralNetwork(nn.Module):
def __init__(self): # model definition
super(NeuralNetwork, self).__init__() # instantiate the nn.Module
self.outputs = nn.Linear(in_features=1, out_features=1)
def forward(self, x): # the computations for the forward layer, not called directly
logits = self.outputs(x)
return logits
model_torch_subclassing = NeuralNetwork()
print(model_torch_subclassing)
NeuralNetwork(
(outputs): Linear(in_features=1, out_features=1, bias=True)
)
Note
The backward propagation is calculated automatically, though you can do it manually if you like.
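For example, here is a minimal sketch of PyTorch’s automatic differentiation (autograd) on a single tensor:
# a minimal sketch of automatic differentiation in PyTorch
x = torch.tensor(3.0, requires_grad=True)  # track gradients for this tensor
y = x ** 2  # forward pass: y = x^2
y.backward()  # backward pass: compute dy/dx
print(x.grad)  # tensor(6.) because dy/dx = 2x = 6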
Define the loss and optimiser:
loss_function = nn.MSELoss()
optimiser = torch.optim.SGD(model_torch.parameters(), lr=1e-3)
Define a single training step:
def train(dataloader, model, loss_function, optimiser):
size = len(dataloader.dataset)
model.train() # set the model in training mode, rather than in evaluation mode i.e., `model.eval()`
# for each batch of data
for batch, (X, y) in enumerate(dataloader):
# step 1: make a prediction for these inputs
prediction = model(X)
# step 2: compute the loss for that prediction
loss = loss_function(prediction, y)
# step 3: first, clean the gradients
optimiser.zero_grad()
# step 4: backpropagate the gradients for that loss
loss.backward()
# step 5: update the parameters accordingly
optimiser.step()
Note that testing doesn’t need the gradient steps (i.e., steps 3-5).
Hence, the test function would look something like:
def test(dataloader, model, loss_function):
size = len(dataloader.dataset)
model.eval() # set the model in evaluation mode
...
with torch.no_grad(): # don't track gradients
for batch, (X, y) in enumerate(dataloader):
# step 1: make a prediction for these inputs
prediction = model(X)
# step 2: compute the loss for that prediction
loss = loss_function(prediction, y)
...
We’ll see more examples of testing later.
Run the training step over multiple epochs:
NUM_EPOCHS = 5
for epoch in range(NUM_EPOCHS):
train(dataloader_train, model_torch, loss_function, optimiser)
And, let’s see what this model thought the gradient was:
# to check parameter names
for name, parameter in model_torch.named_parameters():
print(name)
0.weight
0.bias
model_torch[0].weight
Parameter containing:
tensor([[1.9530]], requires_grad=True)
Now, we can see how well these models fit a line to the data.
First, grab the predictions of each model on the training data (for plotting purposes).
y_pred_sklearn = model_sklearn.predict(x_train)
y_pred_tf = model_tf.predict(x_train)
y_pred_torch = model_torch(x_train_tensor).detach().numpy()
Then, show these lines on a plot:
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
colors = {"data": "#1b9e77", "sklearn": "#d95f02", "tf": "#7570b3", "torch": "#66a61e"}
def make_plot(ax, y_pred, label, title):
ax.scatter(x_train, y_train, color=colors["data"])
ax.plot(x_train, y_pred, color=colors[label], linewidth=3)
ax.set_title(title)
ax.set_ylim([0, 18])
ax.set_xlim([0, 9])
ax.set_facecolor("whitesmoke")
fig = plt.figure(1, figsize=(12, 4))
ax1, ax2, ax3 = fig.subplots(1, 3)
make_plot(ax1, y_pred_sklearn, "sklearn", "scikit-learn")
make_plot(ax2, y_pred_tf, "tf", "TensorFlow")
make_plot(ax3, y_pred_torch, "torch", "PyTorch")
plt.show()
They all did a good job of fitting a function to the data.
In other words, they found the association in the data.
However, this was a very simple example that probably didn’t require machine learning (let alone deep learning), though it demonstrates the general workflow.
Now, let’s look at something a little more suitable.
Example - Digit classification¶
Let’s train a model to recognise handwritten digits using the classic MNIST dataset.
This is a classification task.
scikit-learn¶
First, with scikit-learn:
from sklearn import datasets, linear_model, metrics, svm
from sklearn.model_selection import train_test_split
Load the data¶
digits = datasets.load_digits()
Take a look at the labelled data:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
ax.set_axis_off()
ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Label: {label}")
Preprocess and split the data¶
def preprocess_data(digits):
# the data comes as 2D 8x8 pixels
# flatten the images to 1D 64 pixels
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
return n_samples, data
n_samples, data = preprocess_data(digits)
X_train, X_test, y_train, y_test = train_test_split(
data, digits.target, test_size=0.5, shuffle=False
)
Create a model¶
Here, we will use a Support Vector Classifier.
Don’t worry about what gamma is for now (if you’re interested, read the documentation).
model = svm.SVC(gamma=0.001)
Fit the model to the training data¶
model.fit(X_train, y_train)
SVC(gamma=0.001)
Use the model to predict the test data¶
y_pred = model.predict(X_test)
Take a look at the predictions for these test digits:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, y_pred):
ax.set_axis_off()
image = image.reshape(8, 8) # 1D 64 pixels to 2D 8*8 pixels for plotting
ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Prediction: {prediction:.0f}")
Looking good. The predicted labels match the ground truth images.
How well did our model do overall?¶
overall_accuracy = metrics.accuracy_score(y_test, y_pred)
overall_accuracy
0.9688542825361512
97% accuracy is good.
Let’s do some quick error analysis using a confusion matrix.
This shows how well the classification model did for each category.
The predictions are on the x-axis and the true labels from the test data are on the y-axis.
A perfect score would be where the predictions always match the true labels (i.e., all values are on the diagonal line).
confusion_matrix = metrics.ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
confusion_matrix.figure_.suptitle("Confusion Matrix")
plt.show()
We can see that although the model did well, it struggled with 3’s, confusing them with 5’s, 7’s, and 8’s.
This points us in the direction of how we might improve the model.
We could also use cross-validation to find the variation in the training score:
from sklearn.model_selection import KFold, cross_val_score
cv = KFold(n_splits=5, shuffle=False)
test_scores = cross_val_score(model, X_train, y_train, cv=cv)
test_scores
array([0.93333333, 0.99444444, 0.90555556, 0.98882682, 0.95530726])
print(f"CV accuracy = {test_scores.mean():0.2f} (+/- {test_scores.std():0.2f})")
CV accuracy = 0.96 (+/- 0.03)
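Cross-validation can also be combined with a small hyperparameter search, for example over gamma. Here is a minimal sketch (the candidate values are arbitrary, chosen only for illustration):
# a minimal sketch of a grid search over gamma using cross-validation
# (the candidate values are arbitrary)
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(svm.SVC(), param_grid={"gamma": [0.0001, 0.001, 0.01]}, cv=cv)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)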
Save the model¶
You can save models using joblib
:
from joblib import dump
import os
from pathlib import Path
path_models = f"{os.getcwd()}/models"
Path(path_models).mkdir(parents=True, exist_ok=True)
You can then save the model using:
dump(model, f"{path_models}/mnist_model_sklearn.joblib")
You could then load this model back using:
from joblib import load
reloaded_model = load(f'{path_models}/mnist_model_sklearn.joblib')
TensorFlow¶
Now, with TensorFlow.
Check whether there are any GPUs (Graphical Processing Units) available.
Note, the device is the hardware that TensorFlow runs on (e.g., CPUs (Central Processing Units), GPUs).
print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))
Num GPUs Available: 0
Load and split the data¶
(train_images, train_labels), (
test_images,
test_labels,
) = tf.keras.datasets.mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11501568/11490434 [==============================] - 0s 0us/step
Take a look at some of the training data:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, train_images, train_labels):
ax.set_axis_off()
image = image.reshape(28, 28) # 1D 784 pixels to 2D 28*28 pixels for plotting
ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Label: {label}")
Create the model¶
You can use any of the Sequential, Functional, or Subclassing APIs.
Let’s use the simpler Sequential API for now.
You could also use many .add()
calls instead of the list.
Note
You could make the final layer a softmax (to output probabilities directly), though this is discouraged for numerical stability reasons.
Tip
It’s often useful to place pre-processing steps into the model pipeline too.
For example, here we flatten the 2D image to a 1D tensor and rescale the greyscale pixel values to between 0 and 1.
model = tf.keras.Sequential(
[
tf.keras.Input(shape=(28, 28), name="inputs"),
tf.keras.layers.Flatten(name="flatten"),
tf.keras.layers.Rescaling(1.0 / 255, name="normalise"),
tf.keras.layers.Dense(128, activation="relu", name="layer1"),
tf.keras.layers.Dense(128, activation="relu", name="layer2"),
tf.keras.layers.Dense(10, name="outputs"), # 1 unit per class
]
)
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
normalise (Rescaling) (None, 784) 0
layer1 (Dense) (None, 128) 100480
layer2 (Dense) (None, 128) 16512
outputs (Dense) (None, 10) 1290
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
We can now also visualise the architecture:
tf.keras.utils.plot_model(model, show_shapes=True)
Compile the model¶
It’s useful to name the metrics, especially if there’s more than one.
Here, we’ll use the Adam optimiser, sparse categorical crossentropy loss, and a metric of accuracy.
model.compile(
optimizer="adam",
loss=tf.keras.losses.SparseCategoricalCrossentropy(
from_logits=True
), # ensure classifies using logits
metrics=["accuracy"],
)
Fit the model to the training data¶
The fit()
call returns a history
object.
Note
The validation_split
keyword argument can only be used for NumPy training data.
BATCH_SIZE = 32
history = model.fit(
train_images,
train_labels,
epochs=2,
batch_size=BATCH_SIZE,
    verbose=False,  # set to True to print the output from each epoch
validation_split=0.2, # automatically set apart a validation set: 0.2 means 20% for validation
);
The history.history
dictionary then contains the loss and metrics per epoch:
history.history
{'loss': [0.2630394995212555, 0.10852228105068207],
'accuracy': [0.9223333597183228, 0.9671041369438171],
'val_loss': [0.14200642704963684, 0.09933607280254364],
'val_accuracy': [0.9571666717529297, 0.9703333377838135]}
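This history is handy for plotting learning curves, for example (a minimal sketch, reusing the matplotlib import from earlier):
# a minimal sketch of plotting the learning curves from the history
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()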
Predictions¶
Use the model for predictions with model.predict()
(i.e., inference).
Models return logits or log-odds. If you’d like these to be probabilities, add a softmax layer:
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
y_pred = probability_model.predict(test_images)
Each prediction has a probability per category:
y_pred[0]
array([4.7735466e-07, 8.6988564e-08, 1.1313747e-05, 3.5345482e-04,
3.0515390e-10, 4.6098885e-06, 1.6145316e-11, 9.9962521e-01,
4.6569102e-07, 4.3706609e-06], dtype=float32)
The most likely category can be found by taking the index of the maximum probability (using np.argmax):
np.argmax(y_pred[0])
7
So, the model thinks the first digit is a 7.
Let’s see if that’s right by plotting the first four test digits with their predictions:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, test_images, y_pred):
ax.set_axis_off()
image = tf.reshape(image, (28, 28)) # 1D 784 pixels to 2D 28*28 pixels for plotting
ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Prediction: {np.argmax(prediction):.0f}")
Let’s now evaluate the model overall¶
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")
313/313 [==============================] - 1s 1ms/step - loss: 0.0903 - accuracy: 0.9716
Test accuracy: 0.9715999960899353
Similar to scikit-learn, an overall test accuracy of 97% is good.
Note that the training accuracy and validation accuracy were both around 97% too.
As before, let’s have a look at a confusion matrix for some quick error analysis.
Note, TensorFlow does have its own confusion_matrix function, though I’ll use the scikit-learn one here again as it has a nice plotting feature.
confusion_matrix = metrics.ConfusionMatrixDisplay.from_predictions(
test_labels, np.argmax(y_pred, axis=1)
)
confusion_matrix.figure_.suptitle("Confusion Matrix")
plt.show()
This model did well for most digits, though struggled a bit with 5’s.
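For reference, the raw counts could also be computed with TensorFlow’s own function, along the lines of:
# a minimal sketch of the equivalent raw counts using TensorFlow's own function
print(tf.math.confusion_matrix(test_labels, np.argmax(y_pred, axis=1)))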
Save the model¶
A model includes:
Architecture
Weights (i.e., state)
Configuration (e.g., optimiser, loss, metrics)
You can save the whole model or just parts of it.
The different formats are:
TensorFlow SavedModel: a single archive (recommended).
Save: model.save() or tf.keras.models.save_model()
Load: tf.keras.models.load_model()
Note, Keras H5 was the older format.
Architecture only (JSON).
Save: get_config() and model.to_json()
Load: from_config() and tf.keras.models.model_from_json()
Weights only.
Save: model.save_weights()
Load: model.load_weights()
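For example, the architecture-only and weights-only routes look roughly like this (a sketch only; the file names are arbitrary). The whole-model SavedModel route is what we actually use below.
# a sketch of the architecture-only (JSON) and weights-only routes
# (the file names are arbitrary)
architecture_json = model.to_json()  # architecture only
same_architecture = tf.keras.models.model_from_json(architecture_json)

model.save_weights(f"{path_models}/mnist_weights_tf")  # weights only
same_architecture.load_weights(f"{path_models}/mnist_weights_tf")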
model.save(f"{path_models}/model_tf_mnist")
INFO:tensorflow:Assets written to: /home/runner/work/swd8_intro_ml/swd8_intro_ml/docs/models/model_tf_mnist/assets
!ls {path_models}/model_tf_mnist
assets keras_metadata.pb saved_model.pb variables
Load the model¶
Reload the saved model and evaluate it on the test data.
new_model = tf.keras.models.load_model(f"{path_models}/model_tf_mnist")
new_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
normalise (Rescaling) (None, 784) 0
layer1 (Dense) (None, 128) 100480
layer2 (Dense) (None, 128) 16512
outputs (Dense) (None, 10) 1290
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
313/313 - 0s - loss: 0.0903 - accuracy: 0.9716 - 380ms/epoch - 1ms/step
Restored model, accuracy: 97.16%
PyTorch (Lightning)¶
Here, we’ll do a simple example using PyTorch Lightning.
This avoids creating some of the boilerplate code needed for pure PyTorch.
This will just include training for now (i.e., no validation or testing).
import os
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from pytorch_lightning.callbacks.progress import TQDMProgressBar
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchmetrics import Accuracy
from torchvision import transforms
from torchvision.datasets import MNIST
Note
torch.nn.functional
contains functions for neural networks, while torch.nn
defines them as modules.
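For example, the same ReLU operation can be used in either form (a quick sketch):
# a quick sketch of the functional vs module forms of the same operation
x = torch.tensor([-1.0, 2.0])
print(F.relu(x))  # functional form: torch.nn.functional
print(nn.ReLU()(x))  # module form: torch.nn (instantiate, then call)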
BATCH_SIZE = 32
PATH_DATASETS = f"{os.getcwd()}/data"
Prepare the data¶
train_dataloader = DataLoader(
MNIST(PATH_DATASETS, train=True, download=True, transform=transforms.ToTensor()),
batch_size=BATCH_SIZE,
)
Downloading http://yann.lecun.com/exdb/mnist/ (train and test images and labels) and extracting to /home/runner/work/swd8_intro_ml/swd8_intro_ml/docs/data/MNIST/raw
Create the model¶
This includes the loss, optimiser, and training steps.
pl.LightningModule
is a nn.Module
with more features.
For more information on how to convert a PyTorch model to a PyTorch Lightning model, see the PyTorch Lightning documentation.
class MNISTModel(pl.LightningModule):
def __init__(self):
super(MNISTModel, self).__init__()
self.layer1 = torch.nn.Linear(in_features=28 * 28, out_features=10)
def forward(self, x):
x = x.view(x.size(0), -1) # flatten inputs
x = self.layer1(x) # pass inputs through hidden layer
output = torch.relu(x) # run activation function for layer
return output
def training_step(self, batch, batch_index):
x, y = batch
y_hat = self(x) # predicted y output
loss = F.cross_entropy(y_hat, y)
tensorboard_logs = {"train_loss": loss}
return {"loss": loss, "log": tensorboard_logs}
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)
mnist_model = MNISTModel()
print(mnist_model)
MNISTModel(
(layer1): Linear(in_features=784, out_features=10, bias=True)
)
Create the trainer¶
Warning
The progress bar can refresh too quickly for Colab / Kaggle. If developing on these platforms, be sure to slow the refresh rate by increasing the value in callbacks=TQDMProgressBar(refresh_rate=20).
trainer = pl.Trainer(gpus=0, callbacks=TQDMProgressBar(refresh_rate=20), max_epochs=5)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Fit the model¶
if IN_COLAB:
trainer.fit(mnist_model, train_dataloader)
We can see the loss reduce at the right of the progress bar.
You can change what is logged by editing the training_step
method.
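For example, using the built-in self.log call inside training_step would look roughly like this (a sketch of just that method):
# a sketch of logging with the built-in self.log inside training_step
def training_step(self, batch, batch_index):
    x, y = batch
    y_hat = self(x)
    loss = F.cross_entropy(y_hat, y)
    self.log("train_loss", loss)  # send this metric to the logger (TensorBoard by default)
    return loss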
(Optional) Adding in validation and testing to the model creation¶
Note, DataLoaders are now incorporated into the model creation.
class MNISTModel(pl.LightningModule):
def __init__(self):
super(MNISTModel, self).__init__()
self.layer1 = torch.nn.Linear(in_features=28 * 28, out_features=10)
def forward(self, x):
x = x.view(x.size(0), -1) # flatten x
x = self.layer1(x) # pass inputs through hidden layer
output = torch.relu(x) # run activation function for layer
return output
def training_step(self, batch, batch_index):
x, y = batch
y_hat = self(x) # predicted y output
loss = F.cross_entropy(y_hat, y)
tensorboard_logs = {"train_loss": loss}
return {"loss": loss, "log": tensorboard_logs}
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)
# -------------------------
# same as above up to here
# new stuff below
def validation_step(self, batch, batch_index):
x, y = batch
y_hat = self(x)
val_loss = F.cross_entropy(y_hat, y)
return {"val_loss": val_loss}
def test_step(self, batch, batch_index):
x, y = batch
y_hat = self(x)
test_loss = F.cross_entropy(y_hat, y)
return {"test_loss": test_loss}
def validation_epoch_end(self, outputs): # hook for validation
average_val_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
tensorboard_logs = {"val_loss": average_val_loss}
return {"val_loss": average_val_loss, "log": tensorboard_logs}
def test_epoch_end(self, outputs): # hook for test
average_test_loss = torch.stack([x["test_loss"] for x in outputs]).mean()
logs = {"test_loss": average_test_loss}
self.log_dict(logs)
return {"test_loss": average_test_loss, "log": logs, "progress_bar": logs}
# also added in the dataloaders below
def train_dataloader(self):
return DataLoader(
MNIST(
PATH_DATASETS,
train=True,
download=True,
transform=transforms.ToTensor(),
),
batch_size=BATCH_SIZE,
)
def val_dataloader(self):
return DataLoader(
MNIST(
PATH_DATASETS,
train=True,
download=True,
transform=transforms.ToTensor(),
),
batch_size=BATCH_SIZE,
)
def test_dataloader(self):
return DataLoader(
MNIST(
PATH_DATASETS,
train=False,
download=True,
transform=transforms.ToTensor(),
),
batch_size=BATCH_SIZE,
)
mnist_model = MNISTModel()
print(mnist_model)
MNISTModel(
(layer1): Linear(in_features=784, out_features=10, bias=True)
)
trainer = pl.Trainer(gpus=0, callbacks=TQDMProgressBar(refresh_rate=20), max_epochs=5)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Note, the trainer only required the model as input, as the train_dataloader
is part of the model now.
if IN_COLAB:
trainer.fit(mnist_model)
Evaluation¶
Now, testing the model is simply done by running:
if IN_COLAB:
trainer.test(mnist_model)
Save the model¶
The model is saved automatically to lightning_logs/
.
It is incrementally split over versions e.g., version_0
.
This then saves checkpoints per epoch, overwriting with the latest epoch.
To save a model in PyTorch (without Lightning):
state_dict = model.state_dict() # extract the parameters
torch.save(state_dict, "my_model_weights.pth") # save the parameters
Load the model¶
path_checkpoints = f"{os.getcwd()}/lightning_logs/version_0/checkpoints"
path_model = f"{path_checkpoints}/{os.listdir(path_checkpoints)[0]}"
reloaded_model = MNISTModel.load_from_checkpoint(path_model)
To load a model in PyTorch (without Lightning):
new_state_dict = torch.load("my_model_weights.pth")  # load the parameters
new_model = MNISTModel()  # instantiate a model
new_model.load_state_dict(new_state_dict)  # set up the new model with these parameters
Questions¶
Question 1
If you were looking to do classic machine learning, what tool is a good choice?
Question 2
If you were looking to do deep learning using a high-level API, what tools are a good choice?
Question 3
What are good reasons for choosing a high or low-level API?
Question 4
When creating a model, which API is simpler to use?
Sequential
Subclassing
Question 5
Put these general steps in order:
Compile the model
Preprocess the data
Test the model
Fit the model to the training data
Create the model
Download the data
Question 6
Which machine learning library is the best?
Key Points¶
Important
scikit-learn is great for classic machine learning problems.
TensorFlow and PyTorch are both great for deep learning problems.
Keras (high-level API for TensorFlow) and PyTorch Lightning (high-level API for PyTorch) have many high-level objects to help you create deep learning models.
You can use low-level APIs for any custom objects.
Explore your data before using it.
Check your model before fitting the training data to it.
Evaluate your model and analyse the errors it makes.
Further information¶
Good practices¶
Many decisions around model architecture are based on previous work, literature, and trial-and-error.
Debugging:
Test each part individually, before testing the whole.
Check the model summary and visualise the architecture.
Use debug modes:
Add run_eagerly=True to the compile() call in Keras.
Use Trainer(fast_dev_run=True) in PyTorch Lightning.
Tips for Keras and PyTorch Lightning.
Offloading computations to a GPU may not be beneficial for small models.
Tips for optimising GPU performance from TensorFlow, NVIDIA.
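For instance, in PyTorch you can check whether a GPU is available and place the model and data on it explicitly (a minimal sketch):
# a minimal sketch of explicit device placement in PyTorch
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(in_features=1, out_features=1).to(device)  # move the parameters
x = torch.randn(8, 1).to(device)  # move the data too
prediction = model(x)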
Other options¶
There are many other tools for machine learning, including:
- JAX: a library for GPU-accelerated NumPy with automatic differentiation.
- Flax: a neural network library and ecosystem for JAX that is designed for flexibility.
- A library built on top of JAX that provides simple, composable abstractions for machine learning research.
- Gradient boosting libraries.
- Other deep learning frameworks.
- Other high-level APIs for TensorFlow.
- Other high-level APIs for PyTorch.