mms-1b-all / README.md
vineelpratap's picture
Update README.md
3d33597
---
tags:
- mms
language:
- ab
- af
- ak
- am
- ar
- as
- av
- ay
- az
- ba
- bm
- be
- bn
- bi
- bo
- sh
- br
- bg
- ca
- cs
- ce
- cv
- ku
- cy
- da
- de
- dv
- dz
- el
- en
- eo
- et
- eu
- ee
- fo
- fa
- fj
- fi
- fr
- fy
- ff
- ga
- gl
- gn
- gu
- zh
- ht
- ha
- he
- hi
- sh
- hu
- hy
- ig
- ia
- ms
- is
- it
- jv
- ja
- kn
- ka
- kk
- kr
- km
- ki
- rw
- ky
- ko
- kv
- lo
- la
- lv
- ln
- lt
- lb
- lg
- mh
- ml
- mr
- ms
- mk
- mg
- mt
- mn
- mi
- my
- zh
- nl
- 'no'
- 'no'
- ne
- ny
- oc
- om
- or
- os
- pa
- pl
- pt
- ms
- ps
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- ro
- rn
- ru
- sg
- sk
- sl
- sm
- sn
- sd
- so
- es
- sq
- su
- sv
- sw
- ta
- tt
- te
- tg
- tl
- th
- ti
- ts
- tr
- uk
- ms
- vi
- wo
- xh
- ms
- yo
- ms
- zu
- za
license: cc-by-nc-4.0
datasets:
- google/fleurs
metrics:
- wer
---
# Massively Multilingual Speech (MMS) - Finetuned ASR - ALL
This checkpoint is a model fine-tuned for multi-lingual ASR and part of Facebook's [Massive Multilingual Speech project](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/).
This checkpoint is based on the [Wav2Vec2 architecture](https://huggingface.co/docs/transformers/model_doc/wav2vec2) and makes use of adapter models to transcribe 1000+ languages.
The checkpoint consists of **1 billion parameters** and has been fine-tuned from [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) on 1162 languages.
## Table Of Content
- [Example](#example)
- [Supported Languages](#supported-languages)
- [Model details](#model-details)
- [Additional links](#additional-links)
## Example
This MMS checkpoint can be used with [Transformers](https://github.com/huggingface/transformers) to transcribe audio of 1107 different
languages. Let's look at a simple example.
First, we install transformers and some other libraries
```
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
````
**Note**: In order to use MMS you need to have at least `transformers >= 4.30` installed. If the `4.30` version
is not yet available [on PyPI](https://pypi.org/project/transformers/) make sure to install `transformers` from
source:
```
pip install git+https://github.com/huggingface/transformers.git
```
Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled to 16000 kHz.
```py
from datasets import load_dataset, Audio
# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# French
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]
```
Next, we load the model and processor
```py
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch
model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
```
Now we process the audio data, pass the processed audio data to the model and transcribe the model output, just like we usually do for Wav2Vec2 models such as [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h)
```py
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# 'joe keton disapproved of films and buster also had reservations about the media'
```
We can now keep the same model in memory and simply switch out the language adapters by calling the convenient [`load_adapter()`]() function for the model and [`set_target_lang()`]() for the tokenizer. We pass the target language as an input - "fra" for French.
```py
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")
inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# "ce dernier est volé tout au long de l'histoire romaine"
```
In the same way the language can be switched out for all other supported languages. Please have a look at:
```py
processor.tokenizer.vocab.keys()
```
For more details, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).
## Supported Languages
This model supports 1162 languages. Unclick the following to toogle all supported languages of this checkpoint in [ISO 639-3 code](https://en.wikipedia.org/wiki/ISO_639-3).
You can find more details about the languages and their ISO 649-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
<details>
<summary>Click to toggle</summary>
- abi
- abk
- abp
- aca
- acd
- ace
- acf
- ach
- acn
- acr
- acu
- ade
- adh
- adj
- adx
- aeu
- afr
- agd
- agg
- agn
- agr
- agu
- agx
- aha
- ahk
- aia
- aka
- akb
- ake
- akp
- alj
- alp
- alt
- alz
- ame
- amf
- amh
- ami
- amk
- ann
- any
- aoz
- apb
- apr
- ara
- arl
- asa
- asg
- asm
- ast
- ata
- atb
- atg
- ati
- atq
- ava
- avn
- avu
- awa
- awb
- ayo
- ayr
- ayz
- azb
- azg
- azj-script_cyrillic
- azj-script_latin
- azz
- bak
- bam
- ban
- bao
- bas
- bav
- bba
- bbb
- bbc
- bbo
- bcc-script_arabic
- bcc-script_latin
- bcl
- bcw
- bdg
- bdh
- bdq
- bdu
- bdv
- beh
- bel
- bem
- ben
- bep
- bex
- bfa
- bfo
- bfy
- bfz
- bgc
- bgq
- bgr
- bgt
- bgw
- bha
- bht
- bhz
- bib
- bim
- bis
- biv
- bjr
- bjv
- bjw
- bjz
- bkd
- bkv
- blh
- blt
- blx
- blz
- bmq
- bmr
- bmu
- bmv
- bng
- bno
- bnp
- boa
- bod
- boj
- bom
- bor
- bos
- bov
- box
- bpr
- bps
- bqc
- bqi
- bqj
- bqp
- bre
- bru
- bsc
- bsq
- bss
- btd
- bts
- btt
- btx
- bud
- bul
- bus
- bvc
- bvz
- bwq
- bwu
- byr
- bzh
- bzi
- bzj
- caa
- cab
- cac-dialect_sanmateoixtatan
- cac-dialect_sansebastiancoatan
- cak-dialect_central
- cak-dialect_santamariadejesus
- cak-dialect_santodomingoxenacoj
- cak-dialect_southcentral
- cak-dialect_western
- cak-dialect_yepocapa
- cap
- car
- cas
- cat
- cax
- cbc
- cbi
- cbr
- cbs
- cbt
- cbu
- cbv
- cce
- cco
- cdj
- ceb
- ceg
- cek
- ces
- cfm
- cgc
- che
- chf
- chv
- chz
- cjo
- cjp
- cjs
- ckb
- cko
- ckt
- cla
- cle
- cly
- cme
- cmn-script_simplified
- cmo-script_khmer
- cmo-script_latin
- cmr
- cnh
- cni
- cnl
- cnt
- coe
- cof
- cok
- con
- cot
- cou
- cpa
- cpb
- cpu
- crh
- crk-script_latin
- crk-script_syllabics
- crn
- crq
- crs
- crt
- csk
- cso
- ctd
- ctg
- cto
- ctu
- cuc
- cui
- cuk
- cul
- cwa
- cwe
- cwt
- cya
- cym
- daa
- dah
- dan
- dar
- dbj
- dbq
- ddn
- ded
- des
- deu
- dga
- dgi
- dgk
- dgo
- dgr
- dhi
- did
- dig
- dik
- dip
- div
- djk
- dnj-dialect_blowowest
- dnj-dialect_gweetaawueast
- dnt
- dnw
- dop
- dos
- dsh
- dso
- dtp
- dts
- dug
- dwr
- dyi
- dyo
- dyu
- dzo
- eip
- eka
- ell
- emp
- enb
- eng
- enx
- epo
- ese
- ess
- est
- eus
- evn
- ewe
- eza
- fal
- fao
- far
- fas
- fij
- fin
- flr
- fmu
- fon
- fra
- frd
- fry
- ful
- gag-script_cyrillic
- gag-script_latin
- gai
- gam
- gau
- gbi
- gbk
- gbm
- gbo
- gde
- geb
- gej
- gil
- gjn
- gkn
- gld
- gle
- glg
- glk
- gmv
- gna
- gnd
- gng
- gof-script_latin
- gog
- gor
- gqr
- grc
- gri
- grn
- grt
- gso
- gub
- guc
- gud
- guh
- guj
- guk
- gum
- guo
- guq
- guu
- gux
- gvc
- gvl
- gwi
- gwr
- gym
- gyr
- had
- hag
- hak
- hap
- hat
- hau
- hay
- heb
- heh
- hif
- hig
- hil
- hin
- hlb
- hlt
- hne
- hnn
- hns
- hoc
- hoy
- hrv
- hsb
- hto
- hub
- hui
- hun
- hus-dialect_centralveracruz
- hus-dialect_westernpotosino
- huu
- huv
- hvn
- hwc
- hye
- hyw
- iba
- ibo
- icr
- idd
- ifa
- ifb
- ife
- ifk
- ifu
- ify
- ign
- ikk
- ilb
- ilo
- imo
- ina
- inb
- ind
- iou
- ipi
- iqw
- iri
- irk
- isl
- ita
- itl
- itv
- ixl-dialect_sangasparchajul
- ixl-dialect_sanjuancotzal
- ixl-dialect_santamarianebaj
- izr
- izz
- jac
- jam
- jav
- jbu
- jen
- jic
- jiv
- jmc
- jmd
- jpn
- jun
- juy
- jvn
- kaa
- kab
- kac
- kak
- kam
- kan
- kao
- kaq
- kat
- kay
- kaz
- kbo
- kbp
- kbq
- kbr
- kby
- kca
- kcg
- kdc
- kde
- kdh
- kdi
- kdj
- kdl
- kdn
- kdt
- kea
- kek
- ken
- keo
- ker
- key
- kez
- kfb
- kff-script_telugu
- kfw
- kfx
- khg
- khm
- khq
- kia
- kij
- kik
- kin
- kir
- kjb
- kje
- kjg
- kjh
- kki
- kkj
- kle
- klu
- klv
- klw
- kma
- kmd
- kml
- kmr-script_arabic
- kmr-script_cyrillic
- kmr-script_latin
- kmu
- knb
- kne
- knf
- knj
- knk
- kno
- kog
- kor
- kpq
- kps
- kpv
- kpy
- kpz
- kqe
- kqp
- kqr
- kqy
- krc
- kri
- krj
- krl
- krr
- krs
- kru
- ksb
- ksr
- kss
- ktb
- ktj
- kub
- kue
- kum
- kus
- kvn
- kvw
- kwd
- kwf
- kwi
- kxc
- kxf
- kxm
- kxv
- kyb
- kyc
- kyf
- kyg
- kyo
- kyq
- kyu
- kyz
- kzf
- lac
- laj
- lam
- lao
- las
- lat
- lav
- law
- lbj
- lbw
- lcp
- lee
- lef
- lem
- lew
- lex
- lgg
- lgl
- lhu
- lia
- lid
- lif
- lin
- lip
- lis
- lit
- lje
- ljp
- llg
- lln
- lme
- lnd
- lns
- lob
- lok
- lom
- lon
- loq
- lsi
- lsm
- ltz
- luc
- lug
- luo
- lwo
- lww
- lzz
- maa-dialect_sanantonio
- maa-dialect_sanjeronimo
- mad
- mag
- mah
- mai
- maj
- mak
- mal
- mam-dialect_central
- mam-dialect_northern
- mam-dialect_southern
- mam-dialect_western
- maq
- mar
- maw
- maz
- mbb
- mbc
- mbh
- mbj
- mbt
- mbu
- mbz
- mca
- mcb
- mcd
- mco
- mcp
- mcq
- mcu
- mda
- mdf
- mdv
- mdy
- med
- mee
- mej
- men
- meq
- met
- mev
- mfe
- mfh
- mfi
- mfk
- mfq
- mfy
- mfz
- mgd
- mge
- mgh
- mgo
- mhi
- mhr
- mhu
- mhx
- mhy
- mib
- mie
- mif
- mih
- mil
- mim
- min
- mio
- mip
- miq
- mit
- miy
- miz
- mjl
- mjv
- mkd
- mkl
- mkn
- mlg
- mlt
- mmg
- mnb
- mnf
- mnk
- mnw
- mnx
- moa
- mog
- mon
- mop
- mor
- mos
- mox
- moz
- mpg
- mpm
- mpp
- mpx
- mqb
- mqf
- mqj
- mqn
- mri
- mrw
- msy
- mtd
- mtj
- mto
- muh
- mup
- mur
- muv
- muy
- mvp
- mwq
- mwv
- mxb
- mxq
- mxt
- mxv
- mya
- myb
- myk
- myl
- myv
- myx
- myy
- mza
- mzi
- mzj
- mzk
- mzm
- mzw
- nab
- nag
- nan
- nas
- naw
- nca
- nch
- ncj
- ncl
- ncu
- ndj
- ndp
- ndv
- ndy
- ndz
- neb
- new
- nfa
- nfr
- nga
- ngl
- ngp
- ngu
- nhe
- nhi
- nhu
- nhw
- nhx
- nhy
- nia
- nij
- nim
- nin
- nko
- nlc
- nld
- nlg
- nlk
- nmz
- nnb
- nno
- nnq
- nnw
- noa
- nob
- nod
- nog
- not
- npi
- npl
- npy
- nso
- nst
- nsu
- ntm
- ntr
- nuj
- nus
- nuz
- nwb
- nxq
- nya
- nyf
- nyn
- nyo
- nyy
- nzi
- obo
- oci
- ojb-script_latin
- ojb-script_syllabics
- oku
- old
- omw
- onb
- ood
- orm
- ory
- oss
- ote
- otq
- ozm
- pab
- pad
- pag
- pam
- pan
- pao
- pap
- pau
- pbb
- pbc
- pbi
- pce
- pcm
- peg
- pez
- pib
- pil
- pir
- pis
- pjt
- pkb
- pls
- plw
- pmf
- pny
- poh-dialect_eastern
- poh-dialect_western
- poi
- pol
- por
- poy
- ppk
- pps
- prf
- prk
- prt
- pse
- pss
- ptu
- pui
- pus
- pwg
- pww
- pxm
- qub
- quc-dialect_central
- quc-dialect_east
- quc-dialect_north
- quf
- quh
- qul
- quw
- quy
- quz
- qvc
- qve
- qvh
- qvm
- qvn
- qvo
- qvs
- qvw
- qvz
- qwh
- qxh
- qxl
- qxn
- qxo
- qxr
- rah
- rai
- rap
- rav
- raw
- rej
- rel
- rgu
- rhg
- rif-script_arabic
- rif-script_latin
- ril
- rim
- rjs
- rkt
- rmc-script_cyrillic
- rmc-script_latin
- rmo
- rmy-script_cyrillic
- rmy-script_latin
- rng
- rnl
- roh-dialect_sursilv
- roh-dialect_vallader
- rol
- ron
- rop
- rro
- rub
- ruf
- rug
- run
- rus
- sab
- sag
- sah
- saj
- saq
- sas
- sat
- sba
- sbd
- sbl
- sbp
- sch
- sck
- sda
- sea
- seh
- ses
- sey
- sgb
- sgj
- sgw
- shi
- shk
- shn
- sho
- shp
- sid
- sig
- sil
- sja
- sjm
- sld
- slk
- slu
- slv
- sml
- smo
- sna
- snd
- sne
- snn
- snp
- snw
- som
- soy
- spa
- spp
- spy
- sqi
- sri
- srm
- srn
- srp-script_cyrillic
- srp-script_latin
- srx
- stn
- stp
- suc
- suk
- sun
- sur
- sus
- suv
- suz
- swe
- swh
- sxb
- sxn
- sya
- syl
- sza
- tac
- taj
- tam
- tao
- tap
- taq
- tat
- tav
- tbc
- tbg
- tbk
- tbl
- tby
- tbz
- tca
- tcc
- tcs
- tcz
- tdj
- ted
- tee
- tel
- tem
- teo
- ter
- tes
- tew
- tex
- tfr
- tgj
- tgk
- tgl
- tgo
- tgp
- tha
- thk
- thl
- tih
- tik
- tir
- tkr
- tlb
- tlj
- tly
- tmc
- tmf
- tna
- tng
- tnk
- tnn
- tnp
- tnr
- tnt
- tob
- toc
- toh
- tom
- tos
- tpi
- tpm
- tpp
- tpt
- trc
- tri
- trn
- trs
- tso
- tsz
- ttc
- tte
- ttq-script_tifinagh
- tue
- tuf
- tuk-script_arabic
- tuk-script_latin
- tuo
- tur
- tvw
- twb
- twe
- twu
- txa
- txq
- txu
- tye
- tzh-dialect_bachajon
- tzh-dialect_tenejapa
- tzj-dialect_eastern
- tzj-dialect_western
- tzo-dialect_chamula
- tzo-dialect_chenalho
- ubl
- ubu
- udm
- udu
- uig-script_arabic
- uig-script_cyrillic
- ukr
- umb
- unr
- upv
- ura
- urb
- urd-script_arabic
- urd-script_devanagari
- urd-script_latin
- urk
- urt
- ury
- usp
- uzb-script_cyrillic
- uzb-script_latin
- vag
- vid
- vie
- vif
- vmw
- vmy
- vot
- vun
- vut
- wal-script_ethiopic
- wal-script_latin
- wap
- war
- waw
- way
- wba
- wlo
- wlx
- wmw
- wob
- wol
- wsg
- wwa
- xal
- xdy
- xed
- xer
- xho
- xmm
- xnj
- xnr
- xog
- xon
- xrb
- xsb
- xsm
- xsr
- xsu
- xta
- xtd
- xte
- xtm
- xtn
- xua
- xuo
- yaa
- yad
- yal
- yam
- yao
- yas
- yat
- yaz
- yba
- ybb
- ycl
- ycn
- yea
- yka
- yli
- yor
- yre
- yua
- yue-script_traditional
- yuz
- yva
- zaa
- zab
- zac
- zad
- zae
- zai
- zam
- zao
- zaq
- zar
- zas
- zav
- zaw
- zca
- zga
- zim
- ziw
- zlm
- zmz
- zne
- zos
- zpc
- zpg
- zpi
- zpl
- zpm
- zpo
- zpt
- zpu
- zpz
- ztq
- zty
- zul
- zyb
- zyp
- zza
</details>
## Model details
- **Developed by:** Vineel Pratap et al.
- **Model type:** Multi-Lingual Automatic Speech Recognition model
- **Language(s):** 1000+ languages, see [supported languages](#supported-languages)
- **License:** CC-BY-NC 4.0 license
- **Num parameters**: 1 billion
- **Audio sampling rate**: 16,000 kHz
- **Cite as:**
@article{pratap2023mms,
title={Scaling Speech Technology to 1,000+ Languages},
author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
journal={arXiv},
year={2023}
}
## Additional Links
- [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/)
- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms).
- [Paper](https://arxiv.org/abs/2305.13516)
- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
- [Other **MMS** checkpoints](https://huggingface.co/models?other=mms)
- MMS base checkpoints:
- [facebook/mms-1b](https://huggingface.co/facebook/mms-1b)
- [facebook/mms-300m](https://huggingface.co/facebook/mms-300m)
- [Official Space](https://huggingface.co/spaces/facebook/MMS)