Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow

🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32 pretrained models in 100+ languages and deep interoperability between Jax,
PyTorch and TensorFlow.

This is the documentation of our repository `transformers `__. You can also follow our `online course `__ that teaches
how to use this library, as well as the other libraries developed by Hugging Face and the Hub.

If you are looking for custom support from the Hugging Face team
-----------------------------------------------------------------------------------------------------------------------

.. raw:: html

    HuggingFace Expert Acceleration Program

Features
-----------------------------------------------------------------------------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between Jax, PyTorch and TensorFlow models
- Move a single model between Jax/PyTorch/TensorFlow frameworks at will
- Seamlessly pick the right framework for training, evaluation, production

Support for Jax is still experimental (only a few models right now); expect to see it grow in the coming months!

`All the model checkpoints `__ are seamlessly integrated from the huggingface.co `model hub `__ where they are uploaded
directly by `users `__ and `organizations `__.

Current number of checkpoints: |checkpoints|

.. |checkpoints| image:: https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen

Contents
-----------------------------------------------------------------------------------------------------------------------

The documentation is organized in five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more to do with general
  research on transformer models.
- The last three sections contain the documentation of each public class and function, grouped in:

    - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
    - **MODELS** for the classes and functions related to each model implemented in the library.
    - **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains Jax, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and
conversion utilities for the following models.

Supported models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

..
    This list is updated automatically from the README with `make fix-copies`. Do not update manually!

1. :doc:`ALBERT ` (from Google Research and the Toyota Technological Institute at Chicago) released with the paper
   `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations `__, by Zhenzhong Lan, Mingda Chen,
   Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
2. :doc:`BART ` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence Pre-training for Natural
   Language Generation, Translation, and Comprehension `__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
   Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BARThez ` (from École polytechnique) released with the paper `BARThez: a Skilled Pretrained French
   Sequence-to-Sequence Model `__ by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
4. :doc:`BERT ` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for
   Language Understanding `__ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
5. :doc:`BERT For Sequence Generation ` (from Google) released with the paper `Leveraging Pre-trained Checkpoints for
   Sequence Generation Tasks `__ by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
6. :doc:`BigBird-RoBERTa ` (from Google Research) released with the paper `Big Bird: Transformers for Longer
   Sequences `__ by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon,
   Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
7. :doc:`BigBird-Pegasus ` (from Google Research) released with the paper `Big Bird: Transformers for Longer
   Sequences `__ by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon,
   Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
8. :doc:`Blenderbot ` (from Facebook) released with the paper `Recipes for building an open-domain chatbot `__ by
   Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric
   M. Smith, Y-Lan Boureau, Jason Weston.
9. :doc:`BlenderbotSmall ` (from Facebook) released with the paper `Recipes for building an open-domain chatbot `__ by
   Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric
   M. Smith, Y-Lan Boureau, Jason Weston.
10. :doc:`BORT ` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT `__ by Adrian de
    Wynter and Daniel J. Perry.
11. :doc:`ByT5 ` (from Google Research) released with the paper `ByT5: Towards a token-free future with pre-trained
    byte-to-byte models `__ by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam
    Roberts, Colin Raffel.
12. :doc:`CamemBERT ` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty French Language
    Model `__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric
    Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
13. :doc:`CLIP ` (from OpenAI) released with the paper `Learning Transferable Visual Models From Natural Language
    Supervision `__ by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal,
    Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
14. :doc:`ConvBERT ` (from YituTech) released with the paper `ConvBERT: Improving BERT with Span-based Dynamic
    Convolution `__ by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
15. :doc:`CPM ` (from Tsinghua University) released with the paper `CPM: A Large-scale Generative Chinese Pre-trained
    Language Model `__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su,
    Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li,
    Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
16. :doc:`CTRL ` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language Model for
    Controllable Generation `__ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard
    Socher.
17. :doc:`DeBERTa ` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT with Disentangled
    Attention `__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
18. :doc:`DeBERTa-v2 ` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT with Disentangled
    Attention `__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
19. :doc:`DeiT ` (from Facebook) released with the paper `Training data-efficient image transformers & distillation
    through attention `__ by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles,
    Hervé Jégou.
20. :doc:`DETR ` (from Facebook) released with the paper `End-to-End Object Detection with Transformers `__ by Nicolas
    Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
21. :doc:`DialoGPT ` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale Generative Pre-training
    for Conversational Response Generation `__ by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett,
    Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
22. :doc:`DistilBERT ` (from HuggingFace), released together with the paper `DistilBERT, a distilled version of BERT:
    smaller, faster, cheaper and lighter `__ by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been
    applied to compress GPT2 into `DistilGPT2 `__, RoBERTa into `DistilRoBERTa `__, Multilingual BERT into
    `DistilmBERT `__ and a German version of DistilBERT.
23. :doc:`DPR ` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain Question
    Answering `__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen,
    and Wen-tau Yih.
24. :doc:`ELECTRA ` (from Google Research/Stanford University) released with the paper `ELECTRA: Pre-training text
    encoders as discriminators rather than generators `__ by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D.
    Manning.
25. :doc:`FlauBERT ` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model Pre-training for
    French `__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre
    Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
26. :doc:`Funnel Transformer ` (from CMU/Google Brain) released with the paper `Funnel-Transformer: Filtering out
    Sequential Redundancy for Efficient Language Processing `__ by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
27. :doc:`GPT ` (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training `__
    by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
28. :doc:`GPT-2 ` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners `__ by
    Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
29. :doc:`GPT Neo ` (from EleutherAI) released in the repository `EleutherAI/gpt-neo `__ by Sid Black, Stella
    Biderman, Leo Gao, Phil Wang and Connor Leahy.
30. :doc:`Hubert ` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech Representation Learning by
    Masked Prediction of Hidden Units `__ by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia,
    Ruslan Salakhutdinov, Abdelrahman Mohamed.
31. :doc:`I-BERT ` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization `__ by Sehoon Kim,
    Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
32. :doc:`LayoutLM ` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training of Text and Layout
    for Document Image Understanding `__ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
33. :doc:`LED ` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer `__ by Iz Beltagy,
    Matthew E. Peters, Arman Cohan.
34. :doc:`Longformer ` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer `__ by Iz
    Beltagy, Matthew E. Peters, Arman Cohan.
35. :doc:`LUKE ` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity Representations with
    Entity-aware Self-attention `__ by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
36. :doc:`LXMERT ` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality Encoder
    Representations from Transformers for Open-Domain Question Answering `__ by Hao Tan and Mohit Bansal.
37. :doc:`M2M100 ` (from Facebook) released with the paper `Beyond English-Centric Multilingual Machine Translation `__
    by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur
    Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard
    Grave, Michael Auli, Armand Joulin.
38. :doc:`MarianMT ` Machine translation models trained using `OPUS `__ data by Jörg Tiedemann. The `Marian
    Framework `__ is being developed by the Microsoft Translator Team.
39. :doc:`MBart ` (from Facebook) released with the paper `Multilingual Denoising Pre-training for Neural Machine
    Translation `__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis,
    Luke Zettlemoyer.
40. :doc:`MBart-50 ` (from Facebook) released with the paper `Multilingual Translation with Extensible Multilingual
    Pretraining and Finetuning `__ by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary,
    Jiatao Gu, Angela Fan.
41. :doc:`Megatron-BERT ` (from NVIDIA) released with the paper `Megatron-LM: Training Multi-Billion Parameter Language
    Models Using Model Parallelism `__ by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared
    Casper and Bryan Catanzaro.
42. :doc:`Megatron-GPT2 ` (from NVIDIA) released with the paper `Megatron-LM: Training Multi-Billion Parameter Language
    Models Using Model Parallelism `__ by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared
    Casper and Bryan Catanzaro.
43. :doc:`MPNet ` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted Pre-training for
    Language Understanding `__ by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
44. :doc:`MT5 ` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained text-to-text
    transformer `__ by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya
    Barua, Colin Raffel.
45. :doc:`Pegasus ` (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for
    Abstractive Summarization `__ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
46. :doc:`ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: Predicting Future N-gram for
    Sequence-to-Sequence Pre-training `__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen,
    Ruofei Zhang and Ming Zhou.
47. :doc:`Reformer ` (from Google Research) released with the paper `Reformer: The Efficient Transformer `__ by Nikita
    Kitaev, Łukasz Kaiser, Anselm Levskaya.
48. :doc:`RoBERTa ` (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT Pretraining
    Approach `__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis,
    Luke Zettlemoyer, Veselin Stoyanov.
49. :doc:`RoFormer ` (from ZhuiyiTechnology), released together with the paper `RoFormer: Enhanced Transformer with
    Rotary Position Embedding `__ by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
50. :doc:`SpeechToTextTransformer ` (from Facebook), released together with the paper `fairseq S2T: Fast Speech-to-Text
    Modeling with fairseq `__ by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
51. :doc:`SqueezeBert ` released with the paper `SqueezeBERT: What can computer vision teach NLP about efficient neural
    networks? `__ by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
52. :doc:`T5 ` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a Unified
    Text-to-Text Transformer `__ by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang
    and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
53. :doc:`TAPAS ` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via Pre-training `__
    by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
54. :doc:`Transformer-XL ` (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond
    a Fixed-Length Context `__ by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan
    Salakhutdinov.
55. :doc:`Vision Transformer (ViT) ` (from Google AI) released with the paper `An Image is Worth 16x16 Words:
    Transformers for Image Recognition at Scale `__ by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk
    Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly,
    Jakob Uszkoreit, Neil Houlsby.
56. :doc:`VisualBERT ` (from UCLA NLP) released with the paper `VisualBERT: A Simple and Performant Baseline for Vision
    and Language `__ by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
57. :doc:`Wav2Vec2 ` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for Self-Supervised Learning
    of Speech Representations `__ by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
58. :doc:`XLM ` (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining `__ by
    Guillaume Lample and Alexis Conneau.
59. :doc:`XLM-ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: Predicting Future N-gram for
    Sequence-to-Sequence Pre-training `__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen,
    Ruofei Zhang and Ming Zhou.
60. :doc:`XLM-RoBERTa ` (from Facebook AI), released together with the paper `Unsupervised Cross-lingual Representation
    Learning at Scale `__ by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek,
    Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
61. :doc:`XLNet ` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language
    Understanding `__ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
62. :doc:`XLSR-Wav2Vec2 ` (from Facebook AI) released with the paper `Unsupervised Cross-Lingual Representation
    Learning For Speech Recognition `__ by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed,
    Michael Auli.

Supported frameworks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The table below shows the current support in the library for each of those models: whether they have a Python
tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they have support in Jax
(via Flax), PyTorch, and/or TensorFlow.

..
    This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!

.. rst-class:: center-aligned-table

+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            Model            | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
+=============================+================+================+=================+====================+==============+
|           ALBERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BART             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BERT             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Bert Generation       |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           BigBird           |       ✅       |       ✅       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       BigBirdPegasus        |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Blenderbot          |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       BlenderbotSmall       |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            CLIP             |       ✅       |       ✅       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            CTRL             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          CamemBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          ConvBERT           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            DETR             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             DPR             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           DeBERTa           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         DeBERTa-v2          |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            DeiT             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         DistilBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           ELECTRA           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Encoder decoder       |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FairSeq Machine-Translation |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          FlauBERT           |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|     Funnel Transformer      |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           GPT Neo           |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Hubert            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           I-BERT            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             LED             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            LUKE             |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           LXMERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          LayoutLM           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Longformer          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           M2M100            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            MPNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Marian            |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        MegatronBert         |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         MobileBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         OpenAI GPT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        OpenAI GPT-2         |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Pegasus           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         ProphetNet          |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             RAG             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Reformer           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          RetriBERT          |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           RoBERTa           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          RoFormer           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Speech2Text         |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         SqueezeBERT         |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             T5              |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            TAPAS            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Transformer-XL        |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             ViT             |       ❌       |       ❌       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         VisualBert          |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Wav2Vec2           |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             XLM             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         XLM-RoBERTa         |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        XLMProphetNet        |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            XLNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            mBART            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             mT5             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    troubleshooting
    custom_datasets
    notebooks
    sagemaker
    community
    converting_tensorflow_models
    migration
    contributing
    add_new_model
    fast_tokenizers
    performance
    testing
    debugging
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/callback
    main_classes/configuration
    main_classes/data_collator
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer
    main_classes/deepspeed
    main_classes/feature_extractor

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/barthez
    model_doc/bert
    model_doc/bertweet
    model_doc/bertgeneration
    model_doc/bert_japanese
    model_doc/bigbird
    model_doc/bigbird_pegasus
    model_doc/blenderbot
    model_doc/blenderbot_small
    model_doc/bort
    model_doc/byt5
    model_doc/camembert
    model_doc/clip
    model_doc/convbert
    model_doc/cpm
    model_doc/ctrl
    model_doc/deberta
    model_doc/deberta_v2
    model_doc/deit
    model_doc/detr
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fsmt
    model_doc/funnel
    model_doc/herbert
    model_doc/ibert
    model_doc/layoutlm
    model_doc/led
    model_doc/longformer
    model_doc/luke
    model_doc/lxmert
    model_doc/marian
    model_doc/m2m_100
    model_doc/mbart
    model_doc/megatron_bert
    model_doc/megatron_gpt2
    model_doc/mobilebert
    model_doc/mpnet
    model_doc/mt5
    model_doc/gpt
    model_doc/gpt2
    model_doc/gpt_neo
    model_doc/hubert
    model_doc/pegasus
    model_doc/phobert
    model_doc/prophetnet
    model_doc/rag
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
    model_doc/roformer
    model_doc/speech_to_text
    model_doc/squeezebert
    model_doc/t5
    model_doc/tapas
    model_doc/transformerxl
    model_doc/vit
    model_doc/visual_bert
    model_doc/wav2vec2
    model_doc/xlm
    model_doc/xlmprophetnet
    model_doc/xlmroberta
    model_doc/xlnet
    model_doc/xlsr_wav2vec2

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils
    internal/trainer_utils
    internal/generation_utils
    internal/file_utils