Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.

This is the documentation of our repository `transformers `_.

Features
-----------------------------------------------------------------------------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models, as illustrated below
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation and production
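For instance, the high-level ``pipeline`` helper gives access to a pretrained model in a couple of lines. The sketch
below assumes an internet connection to download the default checkpoint for the task; that checkpoint can change
between library versions, so the exact score is illustrative:

.. code-block:: python

    from transformers import pipeline

    # Download a default pretrained model and tokenizer for the task, then run inference.
    classifier = pipeline("sentiment-analysis")
    print(classifier("We are very happy to show you the 🤗 Transformers library."))
    # -> a list with one dict per input, e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

Likewise, the same pretrained checkpoint can be loaded in PyTorch or in TensorFlow 2.0 through the corresponding
``Auto`` classes. The checkpoint name below is only an example; ``from_pt=True`` (or ``from_tf=True``) can be passed
when a checkpoint only ships weights for the other framework:

.. code-block:: python

    from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification

    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    pt_model = AutoModelForSequenceClassification.from_pretrained(checkpoint)    # PyTorch
    tf_model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)  # TensorFlow 2.0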
Contents
-----------------------------------------------------------------------------------------------------------------------

The documentation is organized in five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more with general research on
  transformer models.
- The last three sections contain the documentation of each public class and function, grouped in:

    - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
    - **MODELS** for the classes and functions related to each model implemented in the library.
    - **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and
conversion utilities for the following models:

1. `BERT `_ (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language
   Understanding `_ by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
2. `GPT `_ (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training `_ by
   Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever.
3. `GPT-2 `_ (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners `_ by Alec
   Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.
4. `Transformer-XL `_ (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a
   Fixed-Length Context `_ by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan
   Salakhutdinov.
5. `XLNet `_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language
   Understanding `_ by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le.
6. `XLM `_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining `_ by Guillaume
   Lample and Alexis Conneau.
7. `RoBERTa `_ (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT Pretraining
   Approach `_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis,
   Luke Zettlemoyer, and Veselin Stoyanov.
8. `DistilBERT `_ (from HuggingFace) released together with the paper `DistilBERT, a distilled version of BERT:
   smaller, faster, cheaper and lighter `_ by Victor Sanh, Lysandre Debut, and Thomas Wolf. The same method has been
   applied to compress GPT2 into `DistilGPT2 `_.
9. `CTRL `_ (from Salesforce), released together with the paper `CTRL: A Conditional Transformer Language Model for
   Controllable Generation `_ by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard
   Socher.
10. `CamemBERT `_ (from FAIR, Inria, Sorbonne Université) released together with the paper `CamemBERT: a Tasty French
    Language Model `_ by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric
    Villemonte de la Clergerie, Djame Seddah, and Benoît Sagot.
11. `ALBERT `_ (from Google Research), released together with the paper `ALBERT: A Lite BERT for Self-supervised
    Learning of Language Representations `_ by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
    Sharma, and Radu Soricut.
12. `T5 `_ (from Google) released with the paper `Exploring the Limits of Transfer Learning with a Unified
    Text-to-Text Transformer `_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael
    Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
13. `XLM-RoBERTa `_ (from Facebook AI), released together with the paper `Unsupervised Cross-lingual Representation
    Learning at Scale `_ by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek,
    Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov.
14. `MMBT `_ (from Facebook), released together with the paper `Supervised Multimodal Bitransformers for Classifying
    Images and Text `_ by Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, and Davide Testuggine.
15. `FlauBERT `_ (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model Pre-training for French `_
    by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen,
    Benoît Crabbé, Laurent Besacier, and Didier Schwab.
16. `BART `_ (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence Pre-training for Natural
    Language Generation, Translation, and Comprehension `_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
    Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer.
17. `ELECTRA `_ (from Google Research/Stanford University) released with the paper `ELECTRA: Pre-training text
    encoders as discriminators rather than generators `_ by Kevin Clark, Minh-Thang Luong, Quoc V. Le, and
    Christopher D. Manning.
18. `DialoGPT `_ (from Microsoft Research) released with the paper `DialoGPT: Large-Scale Generative Pre-training for
    Conversational Response Generation `_ by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett,
    Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan.
19. `Reformer `_ (from Google Research) released with the paper `Reformer: The Efficient Transformer `_ by Nikita
    Kitaev, Łukasz Kaiser, and Anselm Levskaya.
20. `MarianMT `_ (developed by the Microsoft Translator Team) machine translation models trained using `OPUS `_ data
    by Jörg Tiedemann.
21. `Longformer `_ (from AllenAI) released with the paper `Longformer: The Long-Document Transformer `_ by Iz Beltagy,
    Matthew E. Peters, and Arman Cohan.
22. `DPR `_ (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain Question Answering `_ by
    Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
23. `Pegasus `_ (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for
    Abstractive Summarization `_ by Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu.
24. `MBart `_ (from Facebook) released with the paper `Multilingual Denoising Pre-training for Neural Machine
    Translation `_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis,
    and Luke Zettlemoyer.
25. `LXMERT `_ (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality Encoder Representations
    from Transformers for Open-Domain Question Answering `_ by Hao Tan and Mohit Bansal.
26. `Funnel Transformer `_ (from CMU/Google Brain) released with the paper `Funnel-Transformer: Filtering out
    Sequential Redundancy for Efficient Language Processing `_ by Zihang Dai, Guokun Lai, Yiming Yang, and Quoc V. Le.
27. `Bert For Sequence Generation `_ (from Google) released with the paper `Leveraging Pre-trained Checkpoints for
    Sequence Generation Tasks `_ by Sascha Rothe, Shashi Narayan, and Aliaksei Severyn.
28. `LayoutLM `_ (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training of Text and Layout for
    Document Image Understanding `_ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou.
29. `Other community models `_, contributed by the `community `_.
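Most of these checkpoints can be loaded through the generic ``AutoTokenizer``/``AutoModel`` interface. The short
sketch below uses a BERT checkpoint purely as an example; other pretrained identifiers from the list above can be
substituted where supported:

.. code-block:: python

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Download (and cache) the tokenizer and the pretrained weights for the chosen checkpoint.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # Tokenize a sentence into PyTorch tensors and run a forward pass without building gradients.
    inputs = tokenizer("Hello, 🤗 Transformers!", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # The first element of the output is the last hidden state, of shape
    # (batch_size, sequence_length, hidden_size).
    last_hidden_state = outputs[0]
    print(last_hidden_state.shape)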
.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    custom_datasets
    notebooks
    converting_tensorflow_models
    migration
    contributing
    testing
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/configuration
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/bert
    model_doc/bertgeneration
    model_doc/camembert
    model_doc/ctrl
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fsmt
    model_doc/funnel
    model_doc/layoutlm
    model_doc/longformer
    model_doc/lxmert
    model_doc/marian
    model_doc/mbart
    model_doc/mobilebert
    model_doc/gpt
    model_doc/gpt2
    model_doc/pegasus
    model_doc/rag
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
    model_doc/t5
    model_doc/transformerxl
    model_doc/xlm
    model_doc/xlmroberta
    model_doc/xlnet

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils