Productive NLP Experimentation with Python using Pytorch Lightning and Torchtext

Why Use Pytorch Lightning

Reduce Boilerplate

You no longer write the training routine yourself unless you really have to. You can define your training as

from pytorch_lightning import Trainer
trainer = Trainer(

The job of a Trainer is to do your training routine.

  • No more writing loops. As you can see, there is no for loop of the kind usually seen in PyTorch tutorials.
  • No moving your model to the GPU. You do not have to worry about forgetting to move your model to cuda .
  • No custom printing of your loss. See the logger variable there? You can use Tensorboard to manage your logs, and I recommend that you do. Run pip install tensorboard before using it locally.
Sample of Tensorboard Generated by Pytorch Lightning

In this screenshot, I defined the logger variable as

from pytorch_lightning.loggers import TensorBoardLogger
logger = TensorBoardLogger('tb_logs', name='my_model')

Pytorch Lightning will create a log directory named tb_logs , and you can point Tensorboard at that directory (if you are running Tensorboard separately from a Jupyter notebook).

tensorboard --logdir tb_logs/

Organize Code

Besides the constructor and forward , you can define more functions:

  • configure_optimizers . Expected to return a PyTorch optimizer from the torch.optim package.
def configure_optimizers(self):
 return Adam(self.parameters(), lr=0.01)
  • training_step . Given a batch and its index, define how the input is fed to the model.
def training_step(self, batch, batch_idx):
 x, y = batch.text[0].T, batch.label
 y_hat = self(x)
 loss = self.loss_function(y_hat, y)
 return dict(

In this example, notice that I do a small transformation using a transpose. You can do all kinds of transformations before feeding the input into the model, but I suggest doing the heavy ones outside this function so that it stays clean.

I have also defined loss_function as part of the model and “hardcoded” it as cross entropy. If you do not want that, you can import torch.nn.functional as F and call a functional loss, such as F.cross_entropy() (or F.nll_loss() on the output of F.log_softmax() ). Another option is to let the model constructor accept the loss function as a parameter.

  • train_dataloader . Define how you want to load your training data.

A PyTorch DataLoader is an API that helps you batch the input. To my knowledge, though, Pytorch Lightning runs something like for batch_idx, batch in enumerate(train_dataloader) (not exactly, but similar). This means you are free to return anything iterable here.

  • test_step . Given a batch and its index, define how the input is fed to the model for testing. Note that we do not have to compute the loss in this step, because we are running without gradients.
  • test_dataloader . Define how you want to load your test data.
  • test_epoch_end . Given all the test outputs, define what you want to do with them. You can leave this undefined, but a warning will be shown if you have defined test_step and test_dataloader , because you would then be doing nothing with your test results.
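To make the division of labor between these hooks concrete, here is a minimal plain-Python sketch of how a trainer might call them. It is a simplified picture, not Pytorch Lightning's actual implementation: TinyModule , predict , run_epoch and run_test are hypothetical names, the "loss" is a toy error rate, and no gradients or optimizers are involved.

```python
class TinyModule:
    """Hypothetical stand-in for a LightningModule; no real training happens."""

    def train_dataloader(self):
        # Any iterable of batches works; each batch here is (inputs, labels).
        return [([1, 2], [0, 1]), ([3, 4], [1, 1])]

    def test_dataloader(self):
        return [([5, 6], [1, 0]), ([7, 8], [1, 1])]

    def predict(self, x):
        # Toy rule standing in for a forward pass: odd input -> class 1.
        return [value % 2 for value in x]

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.predict(x)
        # Toy "loss": fraction of wrong predictions in the batch.
        loss = sum(p != t for p, t in zip(y_hat, y)) / len(y)
        return {"loss": loss}

    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.predict(x)
        correct = sum(p == t for p, t in zip(y_hat, y))
        return {"correct": correct, "total": len(y)}

    def test_epoch_end(self, outputs):
        # Aggregate the per-batch results returned by test_step.
        correct = sum(o["correct"] for o in outputs)
        total = sum(o["total"] for o in outputs)
        return {"test_acc": correct / total}


def run_epoch(module):
    """Simplified picture of the loop a trainer runs for one epoch."""
    losses = []
    for batch_idx, batch in enumerate(module.train_dataloader()):
        losses.append(module.training_step(batch, batch_idx)["loss"])
    return losses


def run_test(module):
    """Call test_step on every batch, then hand all outputs to test_epoch_end."""
    outputs = [
        module.test_step(batch, batch_idx)
        for batch_idx, batch in enumerate(module.test_dataloader())
    ]
    return module.test_epoch_end(outputs)


tiny = TinyModule()
train_losses = run_epoch(tiny)
test_result = run_test(tiny)
```

Because the loops only call enumerate on whatever the dataloader hooks return, a plain list of batches is enough for this sketch.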

Using Pytorch Lightning with Torchtext

Previously, I described my exploration of torchtext [4]. Now I want to improve my productivity even further on the experiment side, which includes training, testing, validating, and metric logging. All of these can be achieved with Pytorch Lightning.

I will use the IMDB sentiment classification dataset, which is available in the Torchtext package.

Loading Dataset

The IMDB sentiment classification dataset is a text classification task: given a review text, predict whether it is a positive or a negative review. There is a short official tutorial from torchtext [5], but it does not cover the training part. I will use some of the tutorial code and connect it with training using Pytorch Lightning.

The label vocabulary for this dataset has 3 entries: unknown ( <unk> ), positive (labeled “pos”), and negative (labeled “neg”). So the model needs an output that can predict 3 classes. Because this is a classification task, I will use cross-entropy loss.
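As a worked example of what cross-entropy over 3 classes computes, here is a small plain-Python sketch for a single example. The cross_entropy helper and the index order in LABELS are assumptions for illustration only; in practice the loss comes from PyTorch and the index order from the label vocabulary.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: -log(softmax(logits)[target])."""
    # Subtract the max logit for numerical stability.
    top = max(logits)
    shifted = [z - top for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return -(shifted[target] - log_sum_exp)


LABELS = ["<unk>", "pos", "neg"]  # assumed index order, for illustration only

# Equal scores for all 3 classes: the loss is ln(3).
uniform_loss = cross_entropy([0.0, 0.0, 0.0], target=LABELS.index("pos"))

# A high score on the correct class gives a much smaller loss.
confident_loss = cross_entropy([0.0, 5.0, 0.0], target=LABELS.index("pos"))
```

An untrained model that scores all three classes equally starts at a loss of ln(3), about 1.10, which is a handy sanity check for the first logged values.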

Now to load the data you can do

from torchtext.data import Field
from torchtext.datasets import IMDB
text_field = Field(sequential=True, include_lengths=True, fix_length=200)
label_field = Field(sequential=False)
train, test = IMDB.splits(text_field, label_field)

Since IMDB reviews are not of uniform length, setting the fix_length parameter pads or trims every sequence to the same length.

You can inspect individual samples with train.examples[i] to see what is inside the train and test variables.
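The effect of a fixed length can be sketched in a few lines of plain Python. This is an illustration of the idea rather than torchtext's own code; pad_or_trim and the sample reviews are hypothetical, and it assumes padding and trimming both happen at the end of the sequence.

```python
def pad_or_trim(tokens, fix_length, pad_token="<pad>"):
    """Return a list of exactly fix_length tokens."""
    trimmed = tokens[:fix_length]  # drop anything beyond fix_length
    return trimmed + [pad_token] * (fix_length - len(trimmed))


short_review = "a great movie".split()
long_review = "one of the best films i have seen this year".split()

padded = pad_or_trim(short_review, fix_length=5)
trimmed = pad_or_trim(long_review, fix_length=5)
```

Every review ends up with the same number of tokens, which is what allows a batch to be stacked into one rectangular tensor.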

Building Vocabulary

Pre-trained word embeddings are usually trained on different data from ours, so their token-to-integer “encoding” differs from the one built from our dataset. build_vocab re-maps the integer encoding that comes from the current dataset (here, IMDB) onto the pre-trained encoding. For example, if token 2 in our vocabulary is eat , but eat is token number 15 in the pre-trained word embedding, it will automatically be mapped to the correct entry.

from torchtext.vocab import FastText
text_field.build_vocab(train, vectors=FastText('simple'))
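The re-mapping idea can be pictured with a toy sketch in plain Python. It is not torchtext's implementation: build_toy_vocab , align_vectors , the 2-dimensional vectors, and the choice of zeros for tokens missing from the pre-trained set are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical pre-trained vectors keyed by token (real ones are much larger).
pretrained = {"eat": [0.1, 0.2], "movie": [0.3, 0.4], "good": [0.5, 0.6]}


def build_toy_vocab(tokenized_texts, specials=("<unk>", "<pad>")):
    """Assign an integer index to every token seen in the dataset."""
    counts = Counter(token for text in tokenized_texts for token in text)
    # Specials first, then tokens by descending frequency (ties alphabetical).
    itos = list(specials) + sorted(counts, key=lambda t: (-counts[t], t))
    stoi = {token: index for index, token in enumerate(itos)}
    return itos, stoi


def align_vectors(itos, vectors_by_token, dim):
    """Row i holds the pre-trained vector of token i; unseen tokens get zeros."""
    return [vectors_by_token.get(token, [0.0] * dim) for token in itos]


texts = [["good", "movie"], ["good", "plot"]]
itos, stoi = build_toy_vocab(texts)
vectors = align_vectors(itos, pretrained, dim=2)
```

After alignment, looking up row stoi["good"] returns the pre-trained vector for good , whatever index that token had in the original embedding file.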

The label field in the IMDB dataset takes the values pos , neg , and <unk> , so it still needs its own vocabulary, but without word embeddings.

Splitting and Making Iterator

An Iterator works a bit like a DataLoader: it helps with batching and iterating over the data for one epoch. We can use BucketIterator to iterate with a specific batch size and to move all of the resulting tensors to a device, where the device can be cpu or cuda .

import torch
from torchtext.data import BucketIterator
device = 'cuda' if torch.cuda.is_available() else 'cpu'
batch_size = 32
train_iter, test_iter = BucketIterator.splits(
 (train, test), 

Now we are ready to define our model.

Model Definition

Defining the model with Pytorch Lightning is as easy as William has explained [2].


Before doing the full training, it is a good idea to check that your model accepts the input correctly, like this.

sample_batch = next(iter(train_iter))

Let me explain why I did the transformations.

Each batch object from an iterator has text and label fields. The text field is actually a tuple of the word index tensor and a tensor holding the actual length of each review. The word index tensor has size fix_length x batch_size , while the length tensor has size batch_size . To feed the model with the word indices, I need to take the first element of the tuple and transpose it, which produces batch_size x fix_length .
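The shape change can be seen on a tiny example. In this sketch nested lists stand in for tensors so that it runs without any dependencies; word_ids and lengths are hypothetical stand-ins for batch.text[0] and batch.text[1] , and the index 1 used for padding is an assumption.

```python
fix_length, batch_size = 4, 2

# Stand-in for batch.text[0]: fix_length rows, one column per review.
word_ids = [
    [11, 21],
    [12, 22],
    [13, 23],
    [1, 24],  # 1 plays the role of the padding index in this sketch
]

# Stand-in for batch.text[1]: the real length of each review.
lengths = [3, 4]

# Transpose so that each row is one review: batch_size x fix_length.
batch_first = [list(review) for review in zip(*word_ids)]
```

With real tensors, the same step is the .T used in training_step .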

We are now ready to train our model!

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger
model = MyModel(text_field.vocab.vectors)
logger = TensorBoardLogger('tb_logs', name='my_model')
trainer = Trainer(

And it’s done! A progress bar is shown automatically, so you no longer need to wrap your loop in tqdm like this:

for batch_idx, batch in tqdm(enumerate(train_loader)):

After training, you can run the test with a single line.


If you are wondering why this test method returns only one object, you are probably thinking of scikit-learn’s train/test split. In PyTorch, the “test” part is usually called “validation”, so you might want to define validation_step and val_dataloader instead of test_* .


In my opinion, using Pytorch Lightning and Torchtext does improve my productivity when experimenting with NLP deep learning models. The aspects that I think make this library compelling are its backward compatibility with PyTorch, its friendliness to Torchtext, and the way it leverages Tensorboard.

Backward Compatibility with Pytorch

If you are hesitant because you think a new library will add overhead, do not worry! You can install it first, use LightningModule instead of nn.Module , and write your usual PyTorch code. It will still work, because a LightningModule is still an nn.Module underneath.

Torchtext Friendly

It was fairly easy to use Torchtext along with Pytorch Lightning. Both libraries run on PyTorch and are highly compatible with native PyTorch. Their additional features do not overlap but complement each other. For example, Torchtext has easy interfaces for loading datasets like IMDB or YelpReview. You can then use Pytorch Lightning to train whatever model you want to define and log to Tensorboard or MLFlow.

Leverage Tensorboard Usage

Using Tensorboard instead of manually printing losses and other metrics helps me avoid unnecessary mistakes when printing losses inside the training loop. It also removes the need to plot loss against epoch by hand at the end of training.


[1] Pytorch Lightning Documentation.

[2] Falcon, W. From PyTorch to PyTorch Lightning — A gentle introduction.

[3] Falcon, W. Pytorch Lightning vs PyTorch Ignite vs

[4] Sutiono, Arie P. Deep Learning For NLP with PyTorch and Torchtext.

[5] Torchtext Datasets Documentation.