🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
This is the documentation of our repository transformers.
As easy to use as pytorch-transformers
As powerful and concise as Keras
High performance on NLU and NLG tasks
Low barrier to entry for educators and practitioners
State-of-the-art NLP for everyone:
Deep learning researchers
AI/ML/NLP teachers and educators
Lower compute costs, smaller carbon footprint:
Researchers can share trained models instead of always retraining
Practitioners can reduce compute time and production costs
8 architectures with over 30 pretrained models, some in more than 100 languages
Choose the right framework for every part of a model’s lifetime:
Train state-of-the-art models in 3 lines of code
Deep interoperability between TensorFlow 2.0 and PyTorch models
Move a single model between TF2.0/PyTorch frameworks at will
Seamlessly pick the right framework for training, evaluation, production
The library currently contains PyTorch and Tensorflow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
RoBERTa (from Facebook), released together with the paper a Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
DistilBERT (from HuggingFace) released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2.
- Pretrained models
- Loading Google AI or OpenAI pre-trained weights or PyTorch dump
- Serialization best-practices
- Converting Tensorflow Checkpoints
- Migrating from pytorch-pretrained-bert
- Multi-lingual models
- OpenAI GPT
- Transformer XL
- OpenAI GPT2