Deep Learning with PyTorch

First contact with PyTorch for beginners

This post introduces the basic features of PyTorch, which enables us to implement Deep Learning models in Python. It is not meant to be a complete PyTorch manual; it covers only the minimum needed to start coding neural networks in PyTorch, and we will introduce new features as we need them throughout the series. Enjoy it!


The clear leaders in the Deep Learning framework arena are now the Google-developed TensorFlow and the Facebook-developed PyTorch, and they are pulling away from the rest of the market in usage, share, and momentum.

The first version of PyTorch appeared three years ago and, without question, it is gaining great momentum. Initially incubated by Facebook, PyTorch rapidly developed a reputation for being an ideal, flexible framework for rapid experimentation and prototyping, gaining thousands of fans within the Deep Learning community. For instance, PhD students in my research team prefer PyTorch because it allows them to write native-looking Python code and still get all the benefits of a good framework, such as auto-differentiation and built-in optimization. This is the reason I decided to use PyTorch in this series.

Though PyTorch has gained momentum in the marketplace thanks to Facebook (and AWS), TensorFlow continues to be ahead in all aspects and is currently the most used framework in industry. You can read the brief post “TensorFlow vs PyTorch: The battle continues” for more detail about both environments.

Environment set up

I suggest using Colaboratory (Colab), offered by Google, to execute the code described in this post. It is essentially a Jupyter notebook environment that requires no configuration and runs entirely in the Cloud, allowing the use of different Deep Learning libraries such as PyTorch and TensorFlow. One important feature of Colab is that it provides free access to GPUs (and TPUs). Detailed information about the service can be found on the FAQ page.

By default, Colab notebooks run on CPU. You can switch your notebook to run with GPU (or TPU). To get access to a GPU, open the “Runtime” menu and select “Change runtime type”, as shown in the following figure:

When the pop-up window appears, make sure “Hardware accelerator” is set to GPU (the default is CPU). Afterwards, check that you are connected to the runtime (there is a green check next to “CONNECTED” in the menu ribbon):

Now you are able to run the code presented in this post. I suggest copying and pasting the code of this post into a Colab notebook so that you can see it execute while you read. Ready?

The entire code of this post can be found on GitHub and can be run as a Google Colab notebook using this link.

Handwritten digits example

In this post we will program a neural network model that classifies the handwritten digits presented in the previous post. Remember that we created a mathematical model that, given an image, identifies the number it represents by returning a vector with 10 positions indicating the likelihood of each of the ten possible digits.


In order to guide the explanation, we will follow a list of steps to be taken to program a neural network:

  1. Import required libraries
  2. Load and Preprocess the Data
  3. Define the Model
  4. Define the Optimizer and the Loss Function
  5. Train the Model
  6. Evaluate the Model

Let’s go for it!

1. Import required libraries

We always need to import torch , the core Python library for PyTorch. For our example we will also import the torchvision package, as well as the usual libraries numpy and matplotlib .

import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt

For clarity, we can define here some of the hyperparameters that we will need for training:

EPOCH = 10
BATCH_SIZE = 64  # batch size used by the DataLoader below (assumed value)

2. Load and Preprocess the Data

Load Data

The next step is to load the data that will be used to train our neural network. We will use the MNIST dataset already introduced in the previous post, which can be downloaded from The MNIST database page using torchvision.datasets. PyTorch Datasets are objects that return a single datapoint on request. The Dataset is then passed to a DataLoader, which handles batching of datapoints and parallelism. This is the code for our example:

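# Download the MNIST training set and convert each image to a tensor
xy_trainPT = torchvision.datasets.MNIST(root='./data',
             train=True, download=True, transform=
             torchvision.transforms.Compose(
             [torchvision.transforms.ToTensor()]))

# Wrap the dataset in a DataLoader that serves batches of BATCH_SIZE samples
xy_trainPT_loader = torch.utils.data.DataLoader(
                    xy_trainPT, batch_size=BATCH_SIZE)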

Because the data is usually too large to fit into CPU or GPU memory at once, it is split into batches of equal size (the last batch may be smaller if the dataset size is not a multiple of the batch size). Every batch includes data samples and target labels, and both of them have to be tensors (which we will present below). The BATCH_SIZE argument indicates the number of samples that we will use for each update of the model parameters.
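To make the batching behaviour concrete, here is a minimal sketch that uses synthetic data instead of MNIST, so it runs without any download. The names `fake_images`, `fake_labels`, `demo_dataset` and `demo_loader`, as well as the batch size of 64, are assumptions made only for this illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 64  # assumed value for illustration

# 200 synthetic "images" shaped like MNIST samples (1 channel, 28x28 pixels)
fake_images = torch.rand(200, 1, 28, 28)
fake_labels = torch.randint(0, 10, (200,))

# A Dataset returns one (sample, label) pair per index...
demo_dataset = TensorDataset(fake_images, fake_labels)
sample_image, sample_label = demo_dataset[0]

# ...and a DataLoader groups those pairs into batches
demo_loader = DataLoader(demo_dataset, batch_size=BATCH_SIZE)

batch_shapes = []
for images, labels in demo_loader:
    batch_shapes.append((tuple(images.shape), tuple(labels.shape)))

print(len(batch_shapes))   # number of batches
print(batch_shapes[0])     # shape of a full batch
print(batch_shapes[-1])    # shape of the last (possibly smaller) batch
```

With 200 samples and a batch size of 64, the loader yields three full batches and a final batch with the remaining 8 samples. Passing `drop_last=True` to the `DataLoader` would discard that incomplete batch.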

This dataset contains 60,000 images of handwritten digits for training the model. It is ideal for a first contact with pattern recognition techniques because it requires little time spent on preprocessing and formatting data, two important and expensive steps in data analysis that are especially complex when working with images.

We can verify that the previous code has loaded the expected data with the matplotlib.pyplot library:

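fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    image, label = xy_trainPT[idx]
    # add_subplot needs integer grid sizes, hence 20 // 2
    ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
    # drop the channel dimension: (1, 28, 28) -> (28, 28); the colormap is an assumed choice
    ax.imshow(torch.squeeze(image, dim=0).numpy(), cmap="gray")
    ax.set_title(str(label))  # assumed: show the label above each digit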

Preprocessing Data

Remember that in the previous post we explained that to facilitate the entry of data into our neural network we make a transformation of the input (image) from 2 dimensions (2D) to a vector of 1 dimension (1D). That is, the matrix of 28×28 numbers can be represented by a vector (array) of 784 numbers (concatenating row by row).

We will apply this transformation when we feed the data to the neural network. For example, applied to the first image:

image, _ = xy_trainPT[0]
print(image.size())
image_flatten = image.view(image.shape[0], -1)
print(image_flatten.size())

torch.Size([1, 28, 28])
torch.Size([1, 784])
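The same call also works on a whole batch at once, which is how it is typically used inside a training loop. The sketch below uses a random tensor in place of real images; the name `batch` and the batch size of 64 are assumptions for illustration.

```python
import torch

# A stand-in for one batch of MNIST images: (batch, channel, height, width)
batch = torch.rand(64, 1, 28, 28)

# Keep the first dimension (the batch) and flatten everything else
flat = batch.view(batch.shape[0], -1)

print(flat.shape)
```

Each of the 64 rows of `flat` is one image laid out as a vector of 784 values. Note that `view` does not copy the data; `flat` and `batch` share the same memory.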


A Tensor is a multi-dimensional array that stores a collection of numbers. It is the fundamental building block of PyTorch and is analogous to a NumPy array:

a = torch.randn(2, 3)
print(a)

tensor([[ 1.1049, 0.2676, -0.4528],
        [ 0.0105, -0.5095, 0.7777]])

We can get its size and its number of dimensions with:

print(a.size())
print(a.dim())

torch.Size([2, 3])
2

Apart from dimensions, a tensor is characterized by the type of its elements. For this we have the dtype argument that is deliberately similar to the standard NumPy argument type of the same name:

matrix = torch.zeros([2, 4], dtype=torch.int32)
print(matrix)

tensor([[0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=torch.int32)
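As a small illustration of how element types behave, the following sketch converts a tensor to another dtype and mixes two dtypes in one operation. The variable names are assumptions chosen for this example.

```python
import torch

ints = torch.zeros([2, 4], dtype=torch.int32)

# .to() returns a new tensor with the requested element type
floats = ints.to(torch.float32)

# Mixing an integer tensor with a float tensor promotes the result to float
mixed = ints + floats

print(ints.dtype, floats.dtype, mixed.dtype)
```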

Torch defines nine types of CPU tensor and nine types of GPU tensor:

As you can see, there are specific types for GPU tensors. PyTorch transparently supports CUDA GPUs, which means that all operations have two versions, CPU and GPU, that are automatically selected. The decision is made based on the type of tensors that you are operating on.
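In practice, a common way to choose where a tensor lives is to move it to a device explicitly. The sketch below falls back to the CPU when no CUDA GPU is available, so it runs in either case; the variable names are assumptions for illustration.

```python
import torch

# Use the GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 3)
x_dev = x.to(device)   # copy of x on the chosen device (x itself if already there)

# The operation runs on whichever device its operands live on
y_dev = x_dev * 2

print(y_dev.device)
```

In Colab with the GPU runtime enabled, the printed device should be a CUDA device; with the default runtime it is the CPU.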

There are different ways to create a tensor in PyTorch: calling a constructor of the required type, converting a NumPy array (or a Python list) into a tensor, or asking PyTorch to create a tensor with specific data. For example, we can use the torch.zeros() and torch.ones() functions to create tensors filled with zeros or ones:

b = torch.zeros(2, 3)
print(b)

tensor([[0., 0., 0.],
        [0., 0., 0.]])

c = torch.ones(2, 3)
print(c)

tensor([[1., 1., 1.],
        [1., 1., 1.]])
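The other creation routes mentioned above can be sketched as follows. The variable names are assumptions for illustration, and `torch.full` is shown as one more example of asking PyTorch to fill a tensor with specific data.

```python
import numpy as np
import torch

# From a (nested) Python list: the dtype is inferred from the values
from_list = torch.tensor([[1, 2, 3], [4, 5, 6]])

# From a NumPy array: the array's dtype (float64 here) is preserved
from_array = torch.as_tensor(np.array([1.5, 2.5]))

# Filled with a given constant value
filled = torch.full((2, 2), 7.0)

print(from_list.dtype, from_array.dtype, filled.dtype)
```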

An element of a tensor can be accessed using its index (which starts at 0):

c[0, 0] = 222
print(c)

tensor([[222.,   1.,   1.],
        [  1.,   1.,   1.]])

Furthermore, just as with the usual data structures in Python, we can use range notation in indexing to select and manipulate portions of the tensor with the help of the “:” character. Indexes start at 0, and we can use negative values for the indexes, where -1 is the last element and so on. Let’s look at the following code for examples:

x = torch.Tensor([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(x)

tensor([[ 1., 2., 3., 4.],
        [ 5., 6., 7., 8.],
        [ 9., 10., 11., 12.]])

print("x column 1: ", x[:, 1])
print("x row 0: ", x[0, :])
print("x rows 0,1 & cols 1,2: \n", x[0:2, 1:3])

x column 1: tensor([ 2., 6., 10.])
x row 0: tensor([1., 2., 3., 4.])
x rows 0,1 & cols 1,2:
tensor([[2., 3.],
        [6., 7.]])
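Negative indexes and assignment through a slice follow the same rules. Here is a minimal sketch on the same values; the variable names are assumptions for illustration, and the assignment is done on a clone so that the original tensor stays unchanged.

```python
import torch

x = torch.tensor([[1., 2., 3., 4.],
                  [5., 6., 7., 8.],
                  [9., 10., 11., 12.]])

last_row = x[-1]      # the last row
last_col = x[:, -1]   # the last column of every row

# Slices can also be assigned to: zero the last two entries of row 0
y = x.clone()
y[0, -2:] = 0

print(last_row)
print(last_col)
print(y[0])
```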

PyTorch tensors can be converted to NumPy arrays and vice versa very efficiently. By doing so, we can take advantage of the tremendous amount of functionality in the Python ecosystem that has evolved around the NumPy array type. Let’s see how it works with some simple code:

x = np.array([[1,2], [3,4], [5,6]])
print (x)
[[1 2]
 [3 4]
 [5 6]]

This array x can be easily converted to a tensor as follows:

y = torch.from_numpy(x)
print(y)

tensor([[1, 2],
        [3, 4],
        [5, 6]])

We can see that the second print indicates that it is a tensor. Conversely, if we want to transform a tensor into a NumPy array, we can do it as follows:

z = y.numpy()
print(z)

[[1 2]
 [3 4]
 [5 6]]
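One reason these conversions are so efficient is that `torch.from_numpy()` does not copy the data: the tensor and the array share the same memory. The sketch below contrasts this with `torch.tensor()`, which makes a copy. The variable names are assumptions for illustration.

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # independent copy of the data

# Modify the NumPy array in place
arr[0, 0] = 100

print(shared[0, 0].item())   # reflects the change
print(copied[0, 0].item())   # keeps the old value
```

This sharing is convenient, but it also means that an in-place change on one side silently affects the other.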

We will also use the reshape() function, which returns a tensor with the same data and number of elements as the input, but with the specified shape. When possible, the returned tensor will be a view of the input. Otherwise, it will be a copy (in memory):

one_d = torch.arange(0, 16)
print(one_d)

two_d = one_d.reshape(4, 4)
print(two_d)

print(two_d.size())

tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])

tensor([[ 0, 1, 2, 3],
        [ 4, 5, 6, 7],
        [ 8, 9, 10, 11],
        [12, 13, 14, 15]])

torch.Size([4, 4])
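The difference between a view and a copy can be observed directly. In this sketch (variable names are assumptions for illustration), reshaping a contiguous tensor gives a view, while reshaping a transposed tensor forces a copy.

```python
import torch

base = torch.arange(0, 16)

# base is contiguous, so reshape can return a view on the same memory
as_view = base.reshape(4, 4)
as_view[0, 0] = -1           # also changes base[0]

# A transposed tensor is not contiguous; flattening it requires a copy
transposed = as_view.t()
as_copy = transposed.reshape(16)
as_copy[0] = 99              # does not change base

print(base[0].item())
print(as_copy[0].item())
```

Because of this, code that relies on in-place changes propagating (or not propagating) should not assume which of the two behaviours `reshape()` will pick.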

3. Define the Model

In the torch.nn package, you can find many predefined classes providing the basic building blocks required for programming neural networks. The model presented in the previous post can be defined with the Sequential class from this package:

modelPT= torch.nn.Sequential(            

The code defines a neural network composed of two dense (linear) layers of 10 neurons each, one with a Sigmoid activation function and the other with a Softmax activation function. As we advance through the series we will introduce other activation functions, such as ReLU, which we will use in a later post.

I would like to highlight that the previous code makes a small change to the neural network presented in the previous post: it additionally applies a logarithm to each of the outputs of the last layer. Specifically, it uses the LogSoftmax function, which can be written as:

LogSoftmax(x_i) = log(Softmax(x_i))

where Softmax is calculated as defined in the previous post. There are a number of practical and theoretical advantages of LogSoftmax over Softmax that motivate its use in building neural networks, which we will discuss in a later section.
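As a minimal sketch consistent with the description above (784 inputs from the flattened image, two linear layers of 10 neurons, a Sigmoid activation and a final LogSoftmax), the model could look like the code below. The name `sketch_model`, the `dim=1` argument and the random `dummy_batch` and `scores` tensors are assumptions made for this illustration.

```python
import torch

torch.manual_seed(0)  # only to make the sketch reproducible

sketch_model = torch.nn.Sequential(
    torch.nn.Linear(784, 10),    # 784 inputs -> 10 neurons
    torch.nn.Sigmoid(),
    torch.nn.Linear(10, 10),     # 10 -> 10 neurons, one per digit
    torch.nn.LogSoftmax(dim=1),  # log-probabilities over the 10 classes
)

# Five random "flattened images" stand in for real data
dummy_batch = torch.rand(5, 784)
log_probs = sketch_model(dummy_batch)

# Exponentiating the outputs recovers probabilities that sum to 1 per row
row_sums = log_probs.exp().sum(dim=1)

# LogSoftmax gives the same values as taking the log of Softmax
scores = torch.randn(5, 10)
direct = torch.nn.LogSoftmax(dim=1)(scores)
two_step = torch.log(torch.nn.Softmax(dim=1)(scores))

print(log_probs.shape)
print(row_sums)
print(torch.allclose(direct, two_step, atol=1e-5))
```

The `dim=1` argument tells LogSoftmax to normalize across the 10 class scores of each sample rather than across the samples of the batch.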

In summary, the network that we have defined can be visually represented as shown in the following figure: