<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# 🤗 Transformers
State-of-the-art machine learning for [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), and [JAX](https://jax.readthedocs.io/en/latest/).
🤗 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as:
📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.<br>
🖼️ **Computer Vision**: image classification, object detection, and segmentation.<br>
🗣️ **Audio**: automatic speech recognition and audio classification.<br>
🐙 **Multimodal**: table question answering, optical character recognition (OCR), information extraction from scanned documents, video classification, and visual question answering.
🤗 Transformers supports framework interoperability between PyTorch, TensorFlow, and JAX. This gives you the flexibility to use a different framework at each stage of a model's life: train a model in three lines of code in one framework, then load it for inference in another. Models can also be exported to formats such as ONNX and TorchScript for deployment in production environments.
Join the growing community on the [Hub](https://huggingface.co/models), [forum](https://discuss.huggingface.co/), or [Discord](https://discord.com/invite/JfAtkvEtRb) today!
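The cross-framework workflow described above could look roughly like the following sketch, which saves a PyTorch model and reloads the same weights as a TensorFlow model. The function name `reload_pytorch_checkpoint_in_tensorflow` and its parameters are hypothetical. The sketch assumes a `transformers` release that still ships TensorFlow classes, with both PyTorch and TensorFlow installed.

```python
def reload_pytorch_checkpoint_in_tensorflow(checkpoint: str, save_dir: str):
    """Load a checkpoint in PyTorch, save it, and reload it in TensorFlow.

    Hypothetical helper for illustration only; it is not part of the library.
    """
    # Lazy imports: both frameworks must be installed when this is called.
    from transformers import AutoModel, TFAutoModel

    # Load (or fine-tune) the model in PyTorch, then write it to disk.
    pt_model = AutoModel.from_pretrained(checkpoint)
    pt_model.save_pretrained(save_dir)

    # `from_pt=True` asks the TensorFlow class to convert the PyTorch weights.
    tf_model = TFAutoModel.from_pretrained(save_dir, from_pt=True)
    return tf_model
```

The checkpoint name and output directory are left to the caller, for example a model identifier from the Hub and a local folder.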
## If you are looking for custom support from the Hugging Face team
<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="width: 100%; max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a>
## Contents
The documentation is organized into five sections:
- **GET STARTED** provides a quick tour of the library and installation instructions to get up and running.
- **TUTORIALS** are a great place to start if you're a beginner. This section will help you gain the basic skills you need to start using the library.
- **HOW-TO GUIDES** show you how to achieve a specific goal, such as fine-tuning a pretrained model for language modeling or creating and sharing a custom model.
- **CONCEPTUAL GUIDES** offer more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes all classes and functions:
  - **MAIN CLASSES** details the most important classes, such as configuration, model, tokenizer, and pipeline.
  - **MODELS** details the classes and functions related to each model implemented in the library.
  - **INTERNAL HELPERS** details utility classes and functions used internally.
### Supported models
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! -->
1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[AltCLIP](https://huggingface.co/docs/transformers/main/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BioGpt](https://huggingface.co/docs/transformers/main/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
1. **[BiT](https://huggingface.co/docs/transformers/main/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT)](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLIP](https://huggingface.co/docs/transformers/main/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Göttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXTV2](model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace) released with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2, RoBERTa, and Multilingual BERT; the compressed models are named [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), and [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), respectively.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER) released with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Research) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan and Quoc V. Le.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** and **ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GIT](https://huggingface.co/docs/transformers/main/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.
1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GPT-Sw3](https://huggingface.co/docs/transformers/main/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (Microsoft Research Asia ใ‹ใ‚‰) Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836)
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (AllenAI ใ‹ใ‚‰) Iz Beltagy, Matthew E. Peters, Arman Cohan ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150)
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (Meta AI ใ‹ใ‚‰) Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervรฉ Jรฉgou, Matthijs Douze ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136)
1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (South China University of Technology ใ‹ใ‚‰) Jiapeng Wang, Lianwen Jin, Kai Ding ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669)
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (AllenAI ใ‹ใ‚‰) Iz Beltagy, Matthew E. Peters, Arman Cohan ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150)
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (Google AI ใ‹ใ‚‰) Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916)
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (Studio Ousia ใ‹ใ‚‰) Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057)
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (UNC Chapel Hill ใ‹ใ‚‰) Hao Tan and Mohit Bansal ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490)
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (Facebook ใ‹ใ‚‰) Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161)
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (Facebook ใ‹ใ‚‰) Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125)
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is currently being developed by the Microsoft Translator Team.
1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
1. **[Mask2Former](https://huggingface.co/docs/transformers/main/model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noah’s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OneFormer](https://huggingface.co/docs/transformers/main/model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from DeepMind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from Meta Platforms) released with the paper [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook) released with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/main/model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
1. **[RoCBert](https://huggingface.co/docs/transformers/main/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology) released with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook) released with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook) released with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University) released with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[Swin2SR](https://huggingface.co/docs/transformers/main/model_doc/swin2sr)** (from University of Würzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
1. **[SwitchTransformers](https://huggingface.co/docs/transformers/main/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace).
1. **[TimeSformer](https://huggingface.co/docs/transformers/main/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani.
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine.
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[UPerNet](https://huggingface.co/docs/transformers/main/model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViT Hybrid](https://huggingface.co/docs/transformers/main/model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (from Meta AI) released with the paper [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (from OpenAI) released with the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (from Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI) released with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI) released with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
### Supported frameworks
The table below shows which libraries are supported for each model. For each model it indicates whether each of the following five is available: a Python tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and support in PyTorch, TensorFlow, and Flax.
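As a rough illustration of how to read the table, the sketch below parses a few rows transcribed from it into a lookup of booleans. The helper name `parse_support_table` and the `ROWS` sample are hypothetical and are not part of the Transformers API; the column names are taken from the table header.

```python
# Hypothetical helper for illustration only; not part of the Transformers API.
COLUMNS = [
    "Tokenizer slow",
    "Tokenizer fast",
    "PyTorch support",
    "TensorFlow support",
    "Flax Support",
]

# A few rows transcribed from the support table (assumed sample, not the full table).
ROWS = """
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| BLOOM | ❌ | ✅ | ✅ | ❌ | ❌ |
| CANINE | ✅ | ❌ | ✅ | ❌ | ❌ |
"""


def parse_support_table(rows):
    """Map each model name to a {column name: bool} dictionary."""
    support = {}
    for line in rows.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        name, marks = cells[0], cells[1:]
        support[name] = {col: mark == "✅" for col, mark in zip(COLUMNS, marks)}
    return support


support = parse_support_table(ROWS)
print(support["BLOOM"]["Tokenizer fast"])  # True
print(support["BLOOM"]["Tokenizer slow"])  # False
```

Read this way, the BLOOM row says that only a "fast" tokenizer is listed and that PyTorch is the only supported framework in this version of the table.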
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!-->
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
|:-----------------------------:|:--------------:|:--------------:|:---------------:|:------------------:|:------------:|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| AltCLIP | ❌ | ❌ | ✅ | ❌ | ❌ |
| Audio Spectrogram Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ |
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
| BigBird-Pegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
| BioGpt | ✅ | ❌ | ✅ | ❌ | ❌ |
| BiT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
| BLIP | ❌ | ❌ | ✅ | ❌ | ❌ |
| BLOOM | ❌ | ✅ | ✅ | ❌ | ❌ |
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| CANINE | ✅ | ❌ | ✅ | ❌ | ❌ |
| Chinese-CLIP | ❌ | ❌ | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ |
| CLIPSeg | ❌ | ❌ | ✅ | ❌ | ❌ |
| CodeGen | ✅ | ✅ | ✅ | ❌ | ❌ |
| Conditional DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ConvNeXT | ❌ | ❌ | ✅ | ✅ | ❌ |
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
| CvT | ❌ | ❌ | ✅ | ✅ | ❌ |
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecVision | ❌ | ❌ | ✅ | ✅ | ❌ |
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | ✅ | ✅ | ✅ | ❌ |
| Decision Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| Deformable DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeiT | ❌ | ❌ | ✅ | ✅ | ❌ |
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
| DiNAT | ❌ | ❌ | ✅ | ❌ | ❌ |
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| DonutSwin | ❌ | ❌ | ✅ | ❌ | ❌ |
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
| DPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| ERNIE | ❌ | ❌ | ✅ | ❌ | ❌ |
| ESM | ✅ | ❌ | ✅ | ✅ | ❌ |
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
| FLAVA | ❌ | ❌ | ✅ | ❌ | ❌ |
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ |
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| GIT | ❌ | ❌ | ✅ | ❌ | ❌ |
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ |
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
| GPT NeoX | ❌ | ✅ | ✅ | ❌ | ❌ |
| GPT NeoX Japanese | ✅ | ❌ | ✅ | ❌ | ❌ |
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ |
| GPT-Sw3 | ✅ | ✅ | ✅ | ✅ | ✅ |
| GroupViT | ❌ | ❌ | ✅ | ✅ | ❌ |
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Jukebox | ✅ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LayoutLMv3 | ✅ | ✅ | ✅ | ✅ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| LeViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LiLT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| LongT5 | ❌ | ❌ | ✅ | ❌ | ✅ |
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| M-CTC-T | ❌ | ❌ | ✅ | ❌ | ❌ |
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
| MarkupLM | ✅ | ✅ | ✅ | ❌ | ❌ |
| Mask2Former | ❌ | ❌ | ✅ | ❌ | ❌ |
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| MaskFormerSwin | ❌ | ❌ | ❌ | ❌ | ❌ |
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
| Megatron-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| MobileNetV1 | ❌ | ❌ | ✅ | ❌ | ❌ |
| MobileNetV2 | ❌ | ❌ | ✅ | ❌ | ❌ |
| MobileViT | ❌ | ❌ | ✅ | ✅ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| MT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| MVP | ✅ | ✅ | ✅ | ❌ | ❌ |
| NAT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Nezha | ❌ | ❌ | ✅ | ❌ | ❌ |
| Nyströmformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| OPT | ❌ | ❌ | ✅ | ✅ | ✅ |
| OWL-ViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
| PEGASUS-X | ❌ | ❌ | ✅ | ❌ | ❌ |
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ |
| PoolFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ |
| REALM | ✅ | ✅ | ✅ | ❌ | ❌ |
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
| RegNet | ❌ | ❌ | ✅ | ✅ | ✅ |
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ResNet | ❌ | ❌ | ✅ | ✅ | ✅ |
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoBERTa-PreLayerNorm | ❌ | ❌ | ✅ | ✅ | ✅ |
| RoCBert | ✅ | ❌ | ✅ | ❌ | ❌ |
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ |
| SegFormer | ❌ | ❌ | ✅ | ✅ | ❌ |
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| Speech2Text | ✅ | ❌ | ✅ | ✅ | ❌ |
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| Swin Transformer | ❌ | ❌ | ✅ | ✅ | ❌ |
| Swin Transformer V2 | ❌ | ❌ | ✅ | ❌ | ❌ |
| Swin2SR | ❌ | ❌ | ✅ | ❌ | ❌ |
| SwitchTransformers | ❌ | ❌ | ✅ | ❌ | ❌ |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Table Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ |
| Time Series Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| TimeSformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| Trajectory Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ |
| UPerNet | ❌ | ❌ | ✅ | ❌ | ❌ |
| VAN | ❌ | ❌ | ✅ | ❌ | ❌ |
| VideoMAE | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViLT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Vision Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| VisualBERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ |
| ViT Hybrid | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViTMAE | ❌ | ❌ | ✅ | ✅ | ❌ |
| ViTMSN | ❌ | ❌ | ✅ | ❌ | ❌ |
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
| Wav2Vec2-Conformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| WavLM | ❌ | ❌ | ✅ | ❌ | ❌ |
| Whisper | ✅ | ❌ | ✅ | ✅ | ❌ |
| X-CLIP | ❌ | ❌ | ✅ | ❌ | ❌ |
| XGLM | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
| XLM-ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ |
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| YOLOS | ❌ | ❌ | ✅ | ❌ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
<!-- End table-->
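
For readers who want to query the support table programmatically, the following is a minimal sketch in plain Python. It assumes the table is available as Markdown text with ✅ and ❌ marks in the five support columns. The helper `parse_support_table` and the short column keys in `COLUMNS` are hypothetical names chosen for illustration; they are not part of the 🤗 Transformers API. Only a few rows from the table are included to keep the example short.

```python
# A few rows copied from the support table; the full table could be pasted the same way.
TABLE = """\
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
|:-----:|:--------------:|:--------------:|:---------------:|:------------------:|:------------:|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| BLOOM | ❌ | ✅ | ✅ | ❌ | ❌ |
| CANINE | ✅ | ❌ | ✅ | ❌ | ❌ |
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ |
"""

# Hypothetical short keys for the five support columns, in table order.
COLUMNS = ("slow", "fast", "pytorch", "tensorflow", "flax")


def parse_support_table(markdown):
    """Map each model name to a dict of {column key: bool}."""
    support = {}
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header row, the alignment row, and anything malformed.
        if len(cells) != 6 or cells[0] == "Model" or set(cells[0]) <= set(":-"):
            continue
        support[cells[0]] = {
            key: mark == "✅" for key, mark in zip(COLUMNS, cells[1:])
        }
    return support


support = parse_support_table(TABLE)

# Example query: models in the sample that have a Flax implementation.
flax_models = sorted(name for name, row in support.items() if row["flax"])
print(flax_models)
```

The same dictionary can answer other questions, for example which models only ship a "slow" tokenizer, by filtering on `row["slow"] and not row["fast"]`.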