In Chapter 1, we used Transformer models for different tasks using the high-level pipeline API. Although this API is powerful and convenient, it’s important to understand how it works under the hood so we have the flexibility to solve other problems.

In this chapter, you will learn:

  • How to use tokenizers and models to replicate the pipeline API’s behavior
  • How to load and save models and tokenizers
  • Different tokenization approaches, such as word-based, character-based, and subword-based
  • How to handle multiple sentences of varying lengths


As you saw in Chapter 1, Transformer models are usually very large. With millions to tens of billions of parameters, training and deploying these models is a complicated undertaking. Furthermore, with new models being released on a near-daily basis and each having its own implementation, trying them all out is no easy task.

The 🤗 Transformers library was created to solve this problem. Its goal is to provide a single API through which any Transformer model can be loaded, trained, and saved. The library’s main features are:

  • Ease of use: Downloading, loading, and using a state-of-the-art NLP model for inference can be done in just two lines of code.
  • Flexibility: At their core, all models are simple PyTorch nn.Module or TensorFlow tf.keras.Model classes and can be handled like any other models in their respective machine learning (ML) frameworks.
  • Simplicity: Hardly any abstractions are made across the library. The “All in one file” is a core concept: a model’s forward pass is entirely defined in a single file, so that the code itself is understandable and hackable.

This last feature makes 🤗 Transformers quite different from other ML libraries. The models are not built on modules that are shared across files; instead, each model has its own layers. In addition to making the models more approachable and understandable, this allows you to easily experiment on one model without affecting others.

This chapter will begin with an end-to-end example where we use a model and a tokenizer together to replicate the pipeline API introduced in Chapter 1. Next, we’ll discuss the model API: we’ll dive into the model and configuration classes, and show you how to load a model and how it processes numerical inputs to output predictions.

Then we’ll look at the tokenizer API, which is the other main component of the pipeline. Tokenizers take care of the first and last processing steps, handling the conversion from text to numerical inputs for the neural network, and the conversion back to text when it is needed. Finally, we’ll show you how to handle sending multiple sentences through a model in a prepared batch, then wrap it all up with a closer look at the high-level tokenizer function.

⚠️ In order to benefit from all features available with the Model Hub and 🤗 Transformers, we recommend creating an account.