mms
mms-cclms / README.md
vineelpratap's picture
Update README.md
92a7c8a
|
raw
history blame
3.13 kB
---
license: cc-by-nc-4.0
tags:
- mms
---
# Massively Multilingual Speech (MMS) - Common Crawl Language Models
This repository consists of the n-gram language models trained on Common Crawl data ([Conneau et al. 2020b](https://aclanthology.org/2020.acl-main.747/), [NLLB_Team et al. 2022](https://arxiv.org/abs/2207.04672)) using [KenLM library](https://github.com/kpu/kenlm).
## Table Of Content
- [Example](#example)
- [Supported Languages](#supported-languages)
- [Model details](#model-details)
- [Additional links](#additional-links)
## Example
```py
TODO
```
## Supported Languages
We support language models in 102 languages. Unclick the following to toogle all supported languages of this checkpoint in [ISO 639-3 code](https://en.wikipedia.org/wiki/ISO_639-3).
You can find more details about the languages and their ISO 639-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
<details>
<summary>Click to toggle</summary>
- afr
- amh
- ara
- asm
- ast
- azj
- bel
- ben
- bos
- bul
- cat
- ceb
- ces
- ckb
- cmn
- cym
- dan
- deu
- ell
- eng
- est
- fas
- fin
- fra
- ful
- gle
- glg
- guj
- hau
- heb
- hin
- hrv
- hun
- hye
- ibo
- ind
- isl
- ita
- jav
- jpn
- kam
- kan
- kat
- kaz
- kea
- khm
- kir
- kor
- lao
- lav
- lin
- lit
- ltz
- lug
- luo
- mal
- mar
- mkd
- mlt
- mon
- mri
- mya
- nld
- nob
- npi
- nso
- nya
- oci
- orm
- ory
- pan
- pol
- por
- pus
- ron
- rus
- slk
- slv
- sna
- snd
- som
- spa
- srp
- swe
- swh
- tam
- tel
- tgk
- tgl
- tha
- tur
- ukr
- umb
- urd
- uzb
- vie
- wol
- xho
- yor
- yue
- zlm
- zul
</details>
## Model details
- **Developed by:** Vineel Pratap et al.
- **Model type:** Multi-Lingual Automatic Speech Recognition model
- **Language(s):** 126 languages, see [supported languages](#supported-languages)
- **License:** CC-BY-NC 4.0 license
- **Num parameters**: 1 billion
- **Audio sampling rate**: 16,000 kHz
- **Cite as:**
@article{pratap2023mms,
title={Scaling Speech Technology to 1,000+ Languages},
author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
journal={arXiv},
year={2023}
}
## Additional Links
- [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/)
- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms).
- [Paper](https://arxiv.org/abs/2305.13516)
- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
- [Other **MMS** checkpoints](https://huggingface.co/models?other=mms)
- MMS base checkpoints:
- [facebook/mms-1b](https://huggingface.co/facebook/mms-1b)
- [facebook/mms-300m](https://huggingface.co/facebook/mms-300m)
- [Official Space](https://huggingface.co/spaces/facebook/MMS)