Tokenizers, check!

Great job finishing this chapter!

After this deep dive into tokenizers, you should:

  • Be able to train a new tokenizer using an old one as a template
  • Understand how to use offsets to map tokens’ positions to their original span of text
  • Know the differences between BPE, WordPiece, and Unigram
  • Be able to mix and match the blocks provided by the 🤗 Tokenizers library to build your own tokenizer
  • Be able to use that tokenizer inside the 🤗 Transformers library
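
As a quick refresher on the BPE versus WordPiece point above, here is a minimal pure-Python sketch of how the two algorithms pick the next pair to merge during training. It does not use the 🤗 Tokenizers library: the word-frequency table and the helper names (`pair_stats`, `best_pair_bpe`, `best_pair_wordpiece`) are made up for illustration, and the `##` continuation prefix that WordPiece normally adds is left out to keep the comparison short. BPE takes the most frequent adjacent pair, while WordPiece divides the pair frequency by the product of the frequencies of its two parts. Unigram is not shown, because it works in the opposite direction: it starts from a large vocabulary and prunes tokens instead of merging pairs.

```python
from collections import Counter

# Hypothetical toy corpus: each word mapped to how often it occurs.
word_freqs = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}

# Start from a character-level split of every word.
splits = {word: list(word) for word in word_freqs}


def pair_stats(word_freqs, splits):
    """Count symbol and adjacent-pair frequencies, weighted by word frequency."""
    symbol_freqs, pair_freqs = Counter(), Counter()
    for word, freq in word_freqs.items():
        symbols = splits[word]
        for symbol in symbols:
            symbol_freqs[symbol] += freq
        for left, right in zip(symbols, symbols[1:]):
            pair_freqs[(left, right)] += freq
    return symbol_freqs, pair_freqs


def best_pair_bpe(pair_freqs):
    """BPE rule: merge the most frequent adjacent pair."""
    return max(pair_freqs, key=pair_freqs.get)


def best_pair_wordpiece(symbol_freqs, pair_freqs):
    """WordPiece rule: merge the pair with the highest
    freq(pair) / (freq(left) * freq(right)) score."""
    def score(pair):
        left, right = pair
        return pair_freqs[pair] / (symbol_freqs[left] * symbol_freqs[right])
    return max(pair_freqs, key=score)


symbol_freqs, pair_freqs = pair_stats(word_freqs, splits)
bpe_choice = best_pair_bpe(pair_freqs)
wordpiece_choice = best_pair_wordpiece(symbol_freqs, pair_freqs)

print(bpe_choice)        # ('u', 'g'): seen 20 times, the most frequent pair
print(wordpiece_choice)  # ('g', 's'): rarer, but 'u' is too common on its own
```

On this toy corpus the two rules disagree on the very first merge, which is the core difference to keep in mind: WordPiece favors pairs whose parts rarely occur apart, even when the pair itself is not the most frequent one.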