Transformers
=======================================================================================================================

🤗 Transformers (formerly known as ``pytorch-transformers`` and ``pytorch-pretrained-bert``) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.

This is the documentation of our repository `transformers `__.

Features
-----------------------------------------------------------------------------------------------------------------------

- As easy to use as pytorch-transformers
- As powerful and concise as Keras
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will (see the sketch below)
- Seamlessly pick the right framework for training, evaluation, production
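The interoperability bullets above can be illustrated with a minimal sketch; the ``bert-base-uncased`` checkpoint and the local ``./my-bert`` directory are illustrative choices, not requirements. The same pretrained weights are loaded in PyTorch, saved, and reloaded as a TensorFlow 2.0 model:

.. code-block:: python

    import os

    from transformers import AutoModel, AutoTokenizer, TFAutoModel

    # Load the tokenizer and the PyTorch model from a pretrained checkpoint.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    pt_model = AutoModel.from_pretrained("bert-base-uncased")

    # Save the PyTorch weights, then reload them as a TensorFlow 2.0 model.
    # from_pt=True converts the PyTorch checkpoint on the fly.
    os.makedirs("./my-bert", exist_ok=True)
    pt_model.save_pretrained("./my-bert")
    tf_model = TFAutoModel.from_pretrained("./my-bert", from_pt=True)

Either copy of the model can then be fine-tuned, evaluated or exported with the usual tooling of its framework.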
Contents
-----------------------------------------------------------------------------------------------------------------------

The library currently contains PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and
conversion utilities for the following models:

1. `BERT `_ (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding `_ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2. `GPT `_ (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training `_ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3. `GPT-2 `_ (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners `_ by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
4. `Transformer-XL `_ (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context `_ by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
5. `XLNet `_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding `_ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6. `XLM `_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining `_ by Guillaume Lample and Alexis Conneau.
7. `RoBERTa `_ (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT Pretraining Approach `_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8. `DistilBERT `_ (from HuggingFace) released together with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter `_ by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT-2 into `DistilGPT2 `_.
9. `CTRL `_ (from Salesforce), released together with the paper `CTRL: A Conditional Transformer Language Model for Controllable Generation `_ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
10. `CamemBERT `_ (from FAIR, Inria, Sorbonne Université) released together with the paper `CamemBERT: a Tasty French Language Model `_ by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djame Seddah, and Benoît Sagot.
11. `ALBERT `_ (from Google Research), released together with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations `_ by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
12. `XLM-RoBERTa `_ (from Facebook AI), released together with the paper `Unsupervised Cross-lingual Representation Learning at Scale `_ by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
13. `FlauBERT `_ (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model Pre-training for French `_ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.

.. toctree::
    :maxdepth: 2
    :caption: Notes

    installation
    quickstart
    glossary
    pretrained_models
    usage
    model_sharing
    examples
    notebooks
    serialization
    converting_tensorflow_models
    migration
    bertology
    torchscript
    multilingual
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main classes

    main_classes/configuration
    main_classes/model
    main_classes/tokenizer
    main_classes/pipelines
    main_classes/optimizer_schedules
    main_classes/processors

.. toctree::
    :maxdepth: 2
    :caption: Package Reference

    model_doc/auto
    model_doc/encoderdecoder
    model_doc/bert
    model_doc/gpt
    model_doc/transformerxl
    model_doc/gpt2
    model_doc/xlm
    model_doc/xlnet
    model_doc/roberta
    model_doc/distilbert
    model_doc/ctrl
    model_doc/camembert
    model_doc/albert
    model_doc/xlmroberta
    model_doc/flaubert
    model_doc/bart
    model_doc/t5
    model_doc/electra
    model_doc/dialogpt
    model_doc/reformer
    model_doc/marian