Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

šŸ¤— Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.

This is the documentation of our repository `transformers `_.

Features
-----------------------------------------------------------------------------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- More than 30 architectures with pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will (see the short example below)
- Seamlessly pick the right framework for training, evaluation, production
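As a minimal sketch of that last point (the ``bert-base-uncased`` checkpoint and the sequence-classification auto
classes below are illustrative choices, not the only ones available), the same pretrained checkpoint can be loaded as
either a PyTorch or a TensorFlow 2.0 model, and weights saved from one framework can be reloaded in the other:

.. code-block:: python

    from transformers import (
        AutoTokenizer,
        AutoModelForSequenceClassification,
        TFAutoModelForSequenceClassification,
    )

    # Illustrative checkpoint; any pretrained checkpoint on the model hub works the same way.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    pt_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

    # Load the same checkpoint as a TensorFlow 2.0 model; from_pt=True converts
    # the PyTorch weights on the fly.
    tf_model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", from_pt=True)

    # Weights saved from one framework can be reloaded in the other.
    tf_model.save_pretrained("./my-bert")
    pt_model = AutoModelForSequenceClassification.from_pretrained("./my-bert", from_tf=True)

The quicktour in the Get started section below walks through this kind of workflow in more detail.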
Contents
-----------------------------------------------------------------------------------------------------------------------

The documentation is organized in five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy and a glossary.
- **USING šŸ¤— TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more with general research in transformer models.
- The last three sections contain the documentation of each public class and function, grouped in:

  - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
  - **MODELS** for the classes and functions related to each model implemented in the library.
  - **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and
conversion utilities for the following models:

.. This list is updated automatically from the README with `make fix-copies`. Do not update manually!

1. :doc:`ALBERT ` (from Google Research and the Toyota Technological Institute at Chicago) released with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations `__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
2. :doc:`BART ` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension `__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BERT ` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding `__ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
4. :doc:`BERT For Sequence Generation ` (from Google) released with the paper `Leveraging Pre-trained Checkpoints for Sequence Generation Tasks `__ by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
5. :doc:`Blenderbot ` (from Facebook) released with the paper `Recipes for building an open-domain chatbot `__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
6. :doc:`CamemBERT ` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty French Language Model `__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz SuĆ”rez*, Yoann Dupont, Laurent Romary, Ɖric Villemonte de la Clergerie, DjamĆ© Seddah and BenoĆ®t Sagot.
7. :doc:`CTRL ` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language Model for Controllable Generation `__ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
8. :doc:`DeBERTa ` (from Microsoft Research) released with the paper `DeBERTa: Decoding-enhanced BERT with Disentangled Attention `__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
9. :doc:`DialoGPT ` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation `__ by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
10. :doc:`DistilBERT ` (from HuggingFace), released together with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter `__ by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2 `__, RoBERTa into `DistilRoBERTa `__, Multilingual BERT into `DistilmBERT `__ and a German version of DistilBERT.
11. :doc:`DPR ` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain Question Answering `__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
12. :doc:`ELECTRA ` (from Google Research/Stanford University) released with the paper `ELECTRA: Pre-training text encoders as discriminators rather than generators `__ by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
13. :doc:`FlauBERT ` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model Pre-training for French `__ by Hang Le, LoĆÆc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, BenoĆ®t CrabbĆ©, Laurent Besacier, Didier Schwab.
14. :doc:`Funnel Transformer ` (from CMU/Google Brain) released with the paper `Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing `__ by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
15. :doc:`GPT ` (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training `__ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
16. :doc:`GPT-2 ` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners `__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
17. :doc:`LayoutLM ` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training of Text and Layout for Document Image Understanding `__ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
18. :doc:`Longformer ` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer `__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
19. :doc:`LXMERT ` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering `__ by Hao Tan and Mohit Bansal.
20. :doc:`MarianMT ` Machine translation models trained using `OPUS `__ data by Jƶrg Tiedemann. The `Marian Framework `__ is being developed by the Microsoft Translator Team.
21. :doc:`MBart ` (from Facebook) released with the paper `Multilingual Denoising Pre-training for Neural Machine Translation `__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
22. :doc:`Pegasus ` (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization `__ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
23. :doc:`ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training `__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
24. :doc:`Reformer ` (from Google Research) released with the paper `Reformer: The Efficient Transformer `__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
25. :doc:`RoBERTa ` (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT Pretraining Approach `__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
26. :doc:`SqueezeBert ` released with the paper `SqueezeBERT: What can computer vision teach NLP about efficient neural networks? `__ by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
27. :doc:`T5 ` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer `__ by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
28. :doc:`Transformer-XL ` (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context `__ by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
29. :doc:`XLM ` (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining `__ by Guillaume Lample and Alexis Conneau.
30. :doc:`XLM-ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training `__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
31. :doc:`XLM-RoBERTa ` (from Facebook AI), released together with the paper `Unsupervised Cross-lingual Representation Learning at Scale `__ by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco GuzmĆ”n, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
32. :doc:`XLNet ` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding `__ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
33. `Other community models `__, contributed by the `community `__.

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using šŸ¤— Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    custom_datasets
    notebooks
    converting_tensorflow_models
    migration
    contributing
    testing
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/callback
    main_classes/configuration
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/bert
    model_doc/bertgeneration
    model_doc/blenderbot
    model_doc/camembert
    model_doc/ctrl
    model_doc/deberta
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fsmt
    model_doc/funnel
    model_doc/layoutlm
    model_doc/longformer
    model_doc/lxmert
    model_doc/marian
    model_doc/mbart
    model_doc/mobilebert
    model_doc/gpt
    model_doc/gpt2
    model_doc/pegasus
    model_doc/prophetnet
    model_doc/rag
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
    model_doc/squeezebert
    model_doc/t5
    model_doc/transformerxl
    model_doc/xlm
    model_doc/xlmprophetnet
    model_doc/xlmroberta
    model_doc/xlnet

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils
    internal/trainer_utils
    internal/generation_utils