facebook/mms-300m

+---
+tags:
+- mms
+language:
+- ab
+- af
+- ak
+- am
+- ar
+- as
+- av
+- ay
+- az
+- ba
+- bm
+- be
+- bn
+- bi
+- bo
+- sh
+- br
+- bg
+- ca
+- cs
+- ce
+- cv
+- ku
+- cy
+- da
+- de
+- dv
+- dz
+- el
+- en
+- eo
+- et
+- eu
+- ee
+- fo
+- fa
+- fj
+- fi
+- fr
+- fy
+- ff
+- ga
+- gl
+- gn
+- gu
+- zh
+- ht
+- ha
+- he
+- hi
+- hu
+- hy
+- ig
+- ia
+- ms
+- is
+- it
+- jv
+- ja
+- kn
+- ka
+- kk
+- kr
+- km
+- ki
+- rw
+- ky
+- ko
+- kv
+- lo
+- la
+- lv
+- ln
+- lt
+- lb
+- lg
+- mh
+- ml
+- mr
+- mk
+- mg
+- mt
+- mn
+- mi
+- my
+- nl
+- 'no'
+- ne
+- ny
+- oc
+- om
+- or
+- os
+- pa
+- pl
+- pt
+- ps
+- qu
+- ro
+- rn
+- ru
+- sg
+- sk
+- sl
+- sm
+- sn
+- sd
+- so
+- es
+- sq
+- su
+- sv
+- sw
+- ta
+- tt
+- te
+- tg
+- tl
+- th
+- ti
+- ts
+- tr
+- uk
+- vi
+- wo
+- xh
+- yo
+- zu
+- za
+license: cc-by-nc-4.0
+datasets:
+- google/fleurs
+metrics:
+- wer
+---
+# Massively Multilingual Speech (MMS) - 300m
+Facebook's MMS checkpoint with *300m* parameters.
+MMS (Massively Multilingual Speech) is Facebook AI's massively multilingual pretrained model for speech.
+It is pretrained with [Wav2Vec2's self-supervised training objective](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) on about 500,000 hours of speech data in over 1,400 languages.
+When using the model, make sure that your speech input is sampled at 16 kHz.
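
The 16 kHz requirement can be checked before any audio reaches the model. The following is a minimal sketch that uses only Python's standard `wave` module. The helper names (`wav_sample_rate`, `needs_resampling`, `make_silent_wav`) are illustrative and not part of MMS or Transformers, and the sketch assumes the input is an uncompressed WAV file.

```python
import io
import wave

TARGET_SAMPLE_RATE = 16_000  # Hz, the rate this model card asks for


def wav_sample_rate(source):
    """Return the sample rate (Hz) of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(source, target_rate=TARGET_SAMPLE_RATE):
    """Return True if the audio must be resampled before it is passed to the model."""
    return wav_sample_rate(source) != target_rate


def make_silent_wav(sample_rate, num_frames=160):
    """Build a tiny in-memory mono 16-bit WAV, used here only to demonstrate the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for rate in (16_000, 44_100):
        print(rate, "needs resampling:", needs_resampling(make_silent_wav(rate)))
```

If `needs_resampling` returns `True`, resample the audio to 16 kHz with an audio library of your choice before feature extraction.
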
+**Note**: This model should be fine-tuned on a downstream task, such as automatic speech recognition, translation, or classification. Check out the [**How to finetune**](#how-to-finetune) section or [**this blog**](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for more information about ASR.
+## Table of Contents
+- [How to finetune](#how-to-finetune)
+- [Model details](#model-details)
+- [Additional Links](#additional-links)
+## How to finetune
+Coming soon...
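
Until official instructions are published, the CTC recipe from the blog post linked in the note above can plausibly be adapted to this checkpoint. The following is a rough sketch under stated assumptions, not the authors' procedure. It assumes the checkpoint loads with the `Wav2Vec2ForCTC` class from Transformers, that a character-level vocabulary is wanted, and that `|` does not occur in the transcripts. The helper names `build_ctc_vocab` and `load_for_ctc` are hypothetical.

```python
MODEL_ID = "facebook/mms-300m"


def build_ctc_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id."""
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Assumption (convention from the linked blog): "|" replaces " " as word delimiter.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Special tokens for unknown characters and for CTC padding/blank.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def load_for_ctc(vocab):
    """Load the pretrained encoder with a freshly initialised CTC output layer."""
    # Imported lazily so the vocabulary helper works without transformers installed.
    from transformers import Wav2Vec2ForCTC

    model = Wav2Vec2ForCTC.from_pretrained(
        MODEL_ID,
        vocab_size=len(vocab),
        pad_token_id=vocab["[PAD]"],
        ctc_loss_reduction="mean",
    )
    # The convolutional feature encoder is commonly kept frozen during fine-tuning.
    model.freeze_feature_encoder()
    return model
```

Calling `load_for_ctc(build_ctc_vocab(train_transcripts))` downloads the checkpoint and returns a model whose output layer still has to be trained on labelled 16 kHz audio, for example with the `Trainer` setup described in the blog.
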
+## Model details
+- **Developed by:** Vineel Pratap et al.
+- **Model type:** Multi-Lingual Automatic Speech Recognition model
+- **Language(s):** 1000+ languages, see [supported languages](#supported-languages)
+- **License:** CC-BY-NC 4.0
+- **Num parameters:** 300 million
+- **Cite as:**
+      @article{pratap2023mms,
+        title={Scaling Speech Technology to 1,000+ Languages},
+        author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
+        journal={arXiv},
+        year={2023}
+      }
+## Additional Links
+- [Blog post]( )
+- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
+- [Paper](https://arxiv.org/abs/2305.13516)
+- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
+- [Other **MMS** checkpoints](https://huggingface.co/models?other=mms)
+- MMS ASR fine-tuned checkpoints:
+  - [facebook/mms-1b-all](https://huggingface.co/facebook/mms-1b-all)
+  - [facebook/mms-1b-l1107](https://huggingface.co/facebook/mms-1b-l1107)
+  - [facebook/mms-1b-fl102](https://huggingface.co/facebook/mms-1b-fl102)
+- [Official Space](https://huggingface.co/spaces/facebook/MMS)