Basic quickstart for Deep Learning using PyTorch (from official documentation: https://pytorch.org/)
PyTorch has two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt
The torchvision.datasets module contains Dataset objects for many real-world vision data like CIFAR and COCO. We use the FashionMNIST dataset for this example. Every TorchVision Dataset includes two arguments: transform and target_transform to modify the samples and labels respectively.
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
root="data", #root is the path where you want to store the Dataset
train=True,
download=True,
transform=ToTensor(),
)
# Download test data from open datasets.
test_data = datasets.FashionMNIST(
root="data",
train=False,
download=True,
transform=ToTensor(),
)
We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.
batch_size = 64
# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
for X, y in test_dataloader:
print("Shape of X [N for batch size, C for channel, H for height, W for width]: ", X.shape)
print("Shape of y: ", y.shape, y.dtype)
break
NB: next(iter(train_dataloader)) returns a tuple: first the batch of X (batch_size samples), then y, the corresponding labels.
To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU if available.
The size of the input is the size of the picture (here 28*28)
The size of the output is the number of different classes (here 10 classes)
- nn.Flatten: We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, at dim=0, is maintained).
- nn.Linear: The linear layer is a module that applies a linear transformation on the input using its stored weights and biases.
- nn.ReLU: Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.
# Get cpu or gpu device for training
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))
# Define model
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
            nn.Linear(512, 10)  # last layer returns raw logits (no final ReLU; CrossEntropyLoss expects logits)
        )
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)  # move the model to the selected device (GPU if available)
print(model)
To train a model, we need a loss function and an optimizer.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device) # Convert to GPU
        # Zero the gradients before the backward pass: by default, gradients accumulate rather than being overwritten when .backward() is called.
optimizer.zero_grad()
# Compute prediction error
pred = model(X) # In the forward pass you’ll compute the predicted y by passing x to the model
loss = loss_fn(pred, y) # After that, compute the loss
# Backpropagation
loss.backward()
optimizer.step() # step() updates the parameters
if batch % 100 == 0:
loss, current = loss.item(), batch * len(X)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
We also check the model’s performance against the test dataset to ensure it is learning.
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad(): # no_grad() because it's a test function, no need to store the gradient to backpropagate
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
test(test_dataloader, model, loss_fn)
print("Done!")
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))
This model can now be used to make predictions.
classes = [
"T-shirt/top",
"Trouser",
"Pullover",
"Dress",
"Coat",
"Sandal",
"Shirt",
"Sneaker",
"Bag",
"Ankle boot",
]
model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
pred = model(x)
predicted, actual = classes[pred[0].argmax(0)], classes[y]
print(f'Predicted: "{predicted}", Actual: "{actual}"')
Tensors can be initialized in various ways:
- Directly from data:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
- From a NumPy array
import numpy as np  # NumPy is needed for this example
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
- From another tensor
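For example (as in the official tensors tutorial), the new tensor retains the properties (shape, dtype) of the argument tensor unless explicitly overridden:
x_ones = torch.ones_like(x_data)  # retains the shape and dtype of x_data
x_rand = torch.rand_like(x_data, dtype=torch.float)  # overrides the dtype of x_data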
Get a random integer, returned as a one-element tensor, with torch.randint:
random_idx = torch.randint(low=0, high=10, size=(1,)) # size of randint() should be a tuple
By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the .to method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu" # "cuda" for GPU, otherwise CPU
tensor = tensor.to(device)
Join tensors using torch.cat to concatenate a sequence of tensors along an existing dimension. Alternative: torch.stack, which joins them along a new dimension.
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)
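For comparison, a small sketch of torch.stack, which adds a new dimension instead of extending an existing one:
t2 = torch.stack([tensor, tensor, tensor], dim=0)
print(t2.shape)  # one extra leading dimension of size 3 compared to tensor.shape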
Single-element tensors
If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Python numerical value using item():
agg = tensor.sum()
agg_item = agg.item()
PyTorch provides two data primitives: torch.utils.data.DataLoader and torch.utils.data.Dataset, that allow you to use pre-loaded datasets as well as your own data. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
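To use your own data, you subclass Dataset and implement __len__ and __getitem__. A minimal sketch, assuming the samples and labels are already in memory as tensors (the class name and arguments here are hypothetical, for illustration only):
from torch.utils.data import Dataset

class InMemoryDataset(Dataset):  # hypothetical custom Dataset
    def __init__(self, samples, labels, transform=None, target_transform=None):
        self.samples = samples                  # e.g. a tensor of images
        self.labels = labels                    # e.g. a tensor of integer labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)                # number of samples in the dataset

    def __getitem__(self, idx):
        x, y = self.samples[idx], self.labels[idx]
        if self.transform:
            x = self.transform(x)
        if self.target_transform:
            y = self.target_transform(y)
        return x, y                             # DataLoader batches these (x, y) pairs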
We load the FashionMNIST Dataset with the following parameters:
- root is the path where the train/test data is stored
- train specifies training or test dataset (True or False)
- download=True downloads the data from the internet if it's not available at root
- transform and target_transform specify the feature and label transformations
We can index Datasets manually like a list: training_data[index]. We use matplotlib to visualize some samples in our training data. Don't forget that training_data[index] is a tuple: (img, label) = training_data[index].
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
rand_idx = 4534
img, label = training_data[rand_idx]
label_title = labels_map[label]  # labels_map must be defined before this lookup
# plot figure
plt.imshow(img.squeeze())
plt.title(label_title)
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
sample_idx = torch.randint(len(training_data), size=(1,)).item() # size takes a tuple
img, label = training_data[sample_idx]
figure.add_subplot(rows, cols, i)
plt.title(labels_map[label])
plt.axis("off")
plt.imshow(img.squeeze(), cmap="gray") # need to squeeze the image
plt.show()
The Dataset retrieves our dataset’s features and labels one sample at a time. While training a model, we typically want to pass samples in “minibatches”, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval. DataLoader is an iterable that abstracts this complexity for us in an easy API.
from torch.utils.data import DataLoader
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
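The multiprocess data loading mentioned above is enabled with the num_workers argument of DataLoader; a sketch (the worker count of 2 is an arbitrary choice):
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True, num_workers=2)  # 2 background worker processes load and transform batches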
We have loaded that dataset into the DataLoader and can iterate through the dataset as needed. Each iteration below returns a batch of train_features and train_labels (containing batch_size=64 features and labels respectively). Because we specified shuffle=True, after we iterate over all batches the data is shuffled (for finer-grained control over the data loading order, take a look at Samplers).
# Display image and label
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")
We use transforms to perform some manipulation of the data and make it suitable for training.
All TorchVision datasets have two parameters: transform to modify the features and target_transform to modify the labels.
For instance, the FashionMNIST features are in PIL Image format, and the labels are integers. For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors. To make these transformations, we use ToTensor and Lambda (Lambda is used to transform an integer label into a one-hot tensor).
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
ds = datasets.FashionMNIST(
root="data",
train=True, # get training data
download=True,
transform=ToTensor(), # ToTensor converts a PIL image or NumPy ndarray into a `FloatTensor` and scales the image’s pixel intensity values in the range [0., 1.]
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))  # transform int labels to one-hot encoded tensors: scatter_ writes value=1 at the index given by the label y
)
The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses nn.Module. A neural network is a module itself that consists of other modules (layers).
# import libraries
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# Get device for training: use hardware accelerator like the GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')
We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__ (the architecture of the network lives in __init__). Every nn.Module subclass implements the operations on input data in the forward method.
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10),
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
Then we create an instance of NeuralNetwork, move it to the device, and print its structure.
model = NeuralNetwork().to(device)
print(model)
Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.
- nn.Softmax: The last linear layer of the neural network returns logits - raw values in (-infty, infty) - which are passed to the nn.Softmax module. The logits are scaled to values in [0, 1] representing the model’s predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1 (then apply argmax to find the class with the highest probability).
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.
To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradients for any computational graph.
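A minimal sketch of torch.autograd in action, using the toy one-layer example from the official autograd tutorial:
import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # parameters whose gradients we want
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b                 # forward pass builds the computational graph
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()                            # autograd computes d(loss)/dw and d(loss)/db
print(w.grad)
print(b.grad)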