	---
	license: cc-by-nc-4.0
	tags:
	- mms
	---

	# Massively Multilingual Speech (MMS) - Common Crawl Language Models

	This repository contains n-gram language models trained on Common Crawl data ([Conneau et al. 2020b](https://aclanthology.org/2020.acl-main.747/), [NLLB Team et al. 2022](https://arxiv.org/abs/2207.04672)) using the [KenLM library](https://github.com/kpu/kenlm).

	## Table of Contents

	- [Example](#example)
	- [Supported Languages](#supported-languages)
	- [Model details](#model-details)
	- [Additional links](#additional-links)

	## Example

	```py

	TODO
	```
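As a minimal sketch of how an n-gram model like these might be used, the snippet below scores sentences with the `kenlm` Python bindings and uses those scores to re-rank candidate transcriptions. It assumes `kenlm` is installed and that a model file has been downloaded locally. The helper names (`load_kenlm_scorer`, `rescore`), the placeholder path, and the `lm_weight` value are hypothetical illustrations, not part of this repository or of the MMS decoding recipe.

```python
from typing import Callable, List, Tuple


def load_kenlm_scorer(model_path: str) -> Callable[[str], float]:
    """Return a function mapping a sentence to its log10 probability.

    Assumption: the `kenlm` Python bindings are installed. The import is
    done lazily so the rest of this sketch runs without them.
    """
    import kenlm

    model = kenlm.Model(model_path)
    # bos/eos wrap the sentence in begin/end-of-sentence tokens.
    return lambda sentence: model.score(sentence, bos=True, eos=True)


def rescore(
    hypotheses: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,  # hypothetical weight, tune per language
) -> List[Tuple[str, float]]:
    """Re-rank (text, acoustic_score) pairs by acoustic + weighted LM score."""
    combined = [(text, am + lm_weight * lm_score(text)) for text, am in hypotheses]
    # Highest combined score first.
    return sorted(combined, key=lambda pair: pair[1], reverse=True)
```

With a downloaded model, usage could look like `rescore(candidates, load_kenlm_scorer("path/to/model.bin"))`, where the path is a placeholder for whichever language model file you fetched from this repository.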

	## Supported Languages

	We provide language models for 102 languages. Click the following to toggle the list of supported languages, given as [ISO 639-3 codes](https://en.wikipedia.org/wiki/ISO_639-3).
	You can find more details about the languages and their ISO 639-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
	<details>
	<summary>Click to toggle</summary>

	- afr
	- amh
	- ara
	- asm
	- ast
	- azj
	- bel
	- ben
	- bos
	- bul
	- cat
	- ceb
	- ces
	- ckb
	- cmn
	- cym
	- dan
	- deu
	- ell
	- eng
	- est
	- fas
	- fin
	- fra
	- ful
	- gle
	- glg
	- guj
	- hau
	- heb
	- hin
	- hrv
	- hun
	- hye
	- ibo
	- ind
	- isl
	- ita
	- jav
	- jpn
	- kam
	- kan
	- kat
	- kaz
	- kea
	- khm
	- kir
	- kor
	- lao
	- lav
	- lin
	- lit
	- ltz
	- lug
	- luo
	- mal
	- mar
	- mkd
	- mlt
	- mon
	- mri
	- mya
	- nld
	- nob
	- npi
	- nso
	- nya
	- oci
	- orm
	- ory
	- pan
	- pol
	- por
	- pus
	- ron
	- rus
	- slk
	- slv
	- sna
	- snd
	- som
	- spa
	- srp
	- swe
	- swh
	- tam
	- tel
	- tgk
	- tgl
	- tha
	- tur
	- ukr
	- umb
	- urd
	- uzb
	- vie
	- wol
	- xho
	- yor
	- yue
	- zlm
	- zul
	</details>

	## Model details

	- Developed by: Vineel Pratap et al.
	- Model type: n-gram language models (KenLM) for multilingual automatic speech recognition
	- Language(s): 102 languages, see [supported languages](#supported-languages)
	- License: CC-BY-NC 4.0
	- Num parameters: 1 billion
	- Audio sampling rate: 16,000 Hz (16 kHz)
	- Cite as:

	@article{pratap2023mms,
	title={Scaling Speech Technology to 1,000+ Languages},
	author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
	journal={arXiv},
	year={2023}
	}

	## Additional Links

	- [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/)
	- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
	- [Paper](https://arxiv.org/abs/2305.13516)
	- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
	- [Other MMS checkpoints](https://huggingface.co/models?other=mms)
	- MMS base checkpoints:
	  - [facebook/mms-1b](https://huggingface.co/facebook/mms-1b)
	  - [facebook/mms-300m](https://huggingface.co/facebook/mms-300m)
	- [Official Space](https://huggingface.co/spaces/facebook/MMS)