--- tags: - mms language: - ab - af - ak - am - ar - as - av - ay - az - ba - bm - be - bn - bi - bo - sh - br - bg - ca - cs - ce - cv - ku - cy - da - de - dv - dz - el - en - eo - et - eu - ee - fo - fa - fj - fi - fr - fy - ff - ga - gl - gn - gu - zh - ht - ha - he - hi - sh - hu - hy - ig - ia - ms - is - it - jv - ja - kn - ka - kk - kr - km - ki - rw - ky - ko - kv - lo - la - lv - ln - lt - lb - lg - mh - ml - mr - ms - mk - mg - mt - mn - mi - my - zh - nl - 'no' - 'no' - ne - ny - oc - om - or - os - pa - pl - pt - ms - ps - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - ro - rn - ru - sg - sk - sl - sm - sn - sd - so - es - sq - su - sv - sw - ta - tt - te - tg - tl - th - ti - ts - tr - uk - ms - vi - wo - xh - ms - yo - ms - zu - za license: cc-by-sa-4.0 datasets: - google/fleurs metrics: - wer --- # Massively Multilingual Speech (MMS) - Finetuned ASR - ALL This checkpoint is a model fine-tuned for multi-lingual ASR and part of Facebook's [Massive Multilingual Speech project](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/). This checkpoint is based on the [Wav2Vec2 architecture](https://huggingface.co/docs/transformers/model_doc/wav2vec2) and makes use of adapter models to transcribe 1000+ languages. The checkpoint consists of **1 billion parameters** and has been fine-tuned from [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) on 1162 languages. ## Table Of Content - [Example](#example) - [Supported Languages](#supported-languages) - [Model details](#model-details) - [Additional links](#additional-links) ## Example This MMS checkpoint can be used with [Transformers](https://github.com/huggingface/transformers) to transcribe audio of 1107 different languages. Let's look at a simple example. First, we install transformers and some other libraries ``` pip install torch accelerate torchaudio datasets pip install --upgrade transformers ```` **Note**: In order to use MMS you need to have at least `transformers >= 4.30` installed. If the `4.30` version is not yet available [on PyPI](https://pypi.org/project/transformers/) make sure to install `transformers` from source: ``` pip install git+https://github.com/huggingface/transformers.git ``` Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled to 16000 kHz. ```py from datasets import load_dataset, Audio # English stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True) stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000)) en_sample = next(iter(stream_data))["audio"]["array"] # French stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True) stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000)) fr_sample = next(iter(stream_data))["audio"]["array"] ``` Next, we load the model and processor ```py from transformers import Wav2Vec2ForCTC, AutoProcessor import torch model_id = "facebook/mms-1b-all" processor = AutoProcessor.from_pretrained(model_id) model = Wav2Vec2ForCTC.from_pretrained(model_id) ``` Now we process the audio data, pass the processed audio data to the model and transcribe the model output, just like we usually do for Wav2Vec2 models such as [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h) ```py inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt") with torch.no_grad(): outputs = model(**inputs).logits ids = torch.argmax(outputs, dim=-1)[0] transcription = processor.decode(ids) # 'joe keton disapproved of films and buster also had reservations about the media' ``` We can now keep the same model in memory and simply switch out the language adapters by calling the convenient [`load_adapter()`]() function for the model and [`set_target_lang()`]() for the tokenizer. We pass the target language as an input - "fra" for French. ```py processor.tokenizer.set_target_lang("fra") model.load_adapter("fra") inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt") with torch.no_grad(): outputs = model(**inputs).logits ids = torch.argmax(outputs, dim=-1)[0] transcription = processor.decode(ids) # "ce dernier est volé tout au long de l'histoire romaine" ``` In the same way the language can be switched out for all other supported languages. Please have a look at: ```py processor.tokenizer.vocab.keys() ``` For more details, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms). ## Supported Languages This model supports 1162 languages. Unclick the following to toogle all supported languages of this checkpoint in [ISO 639-3 code](https://en.wikipedia.org/wiki/ISO_639-3). You can find more details about the languages and their ISO 649-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
Click to toggle - abi - abk - abp - aca - acd - ace - acf - ach - acn - acr - acu - ade - adh - adj - adx - aeu - afr - agd - agg - agn - agr - agu - agx - aha - ahk - aia - aka - akb - ake - akp - alj - alp - alt - alz - ame - amf - amh - ami - amk - ann - any - aoz - apb - apr - ara - arl - asa - asg - asm - ast - ata - atb - atg - ati - atq - ava - avn - avu - awa - awb - ayo - ayr - ayz - azb - azg - azj-script_cyrillic - azj-script_latin - azz - bak - bam - ban - bao - bas - bav - bba - bbb - bbc - bbo - bcc-script_arabic - bcc-script_latin - bcl - bcw - bdg - bdh - bdq - bdu - bdv - beh - bel - bem - ben - bep - bex - bfa - bfo - bfy - bfz - bgc - bgq - bgr - bgt - bgw - bha - bht - bhz - bib - bim - bis - biv - bjr - bjv - bjw - bjz - bkd - bkv - blh - blt - blx - blz - bmq - bmr - bmu - bmv - bng - bno - bnp - boa - bod - boj - bom - bor - bos - bov - box - bpr - bps - bqc - bqi - bqj - bqp - bre - bru - bsc - bsq - bss - btd - bts - btt - btx - bud - bul - bus - bvc - bvz - bwq - bwu - byr - bzh - bzi - bzj - caa - cab - cac-dialect_sanmateoixtatan - cac-dialect_sansebastiancoatan - cak-dialect_central - cak-dialect_santamariadejesus - cak-dialect_santodomingoxenacoj - cak-dialect_southcentral - cak-dialect_western - cak-dialect_yepocapa - cap - car - cas - cat - cax - cbc - cbi - cbr - cbs - cbt - cbu - cbv - cce - cco - cdj - ceb - ceg - cek - ces - cfm - cgc - che - chf - chv - chz - cjo - cjp - cjs - ckb - cko - ckt - cla - cle - cly - cme - cmn-script_simplified - cmo-script_khmer - cmo-script_latin - cmr - cnh - cni - cnl - cnt - coe - cof - cok - con - cot - cou - cpa - cpb - cpu - crh - crk-script_latin - crk-script_syllabics - crn - crq - crs - crt - csk - cso - ctd - ctg - cto - ctu - cuc - cui - cuk - cul - cwa - cwe - cwt - cya - cym - daa - dah - dan - dar - dbj - dbq - ddn - ded - des - deu - dga - dgi - dgk - dgo - dgr - dhi - did - dig - dik - dip - div - djk - dnj-dialect_blowowest - dnj-dialect_gweetaawueast - dnt - dnw - dop - dos - dsh - dso - dtp - dts - dug - dwr - dyi - dyo - dyu - dzo - eip - eka - ell - emp - enb - eng - enx - epo - ese - ess - est - eus - evn - ewe - eza - fal - fao - far - fas - fij - fin - flr - fmu - fon - fra - frd - fry - ful - gag-script_cyrillic - gag-script_latin - gai - gam - gau - gbi - gbk - gbm - gbo - gde - geb - gej - gil - gjn - gkn - gld - gle - glg - glk - gmv - gna - gnd - gng - gof-script_latin - gog - gor - gqr - grc - gri - grn - grt - gso - gub - guc - gud - guh - guj - guk - gum - guo - guq - guu - gux - gvc - gvl - gwi - gwr - gym - gyr - had - hag - hak - hap - hat - hau - hay - heb - heh - hif - hig - hil - hin - hlb - hlt - hne - hnn - hns - hoc - hoy - hrv - hsb - hto - hub - hui - hun - hus-dialect_centralveracruz - hus-dialect_westernpotosino - huu - huv - hvn - hwc - hye - hyw - iba - ibo - icr - idd - ifa - ifb - ife - ifk - ifu - ify - ign - ikk - ilb - ilo - imo - ina - inb - ind - iou - ipi - iqw - iri - irk - isl - ita - itl - itv - ixl-dialect_sangasparchajul - ixl-dialect_sanjuancotzal - ixl-dialect_santamarianebaj - izr - izz - jac - jam - jav - jbu - jen - jic - jiv - jmc - jmd - jpn - jun - juy - jvn - kaa - kab - kac - kak - kam - kan - kao - kaq - kat - kay - kaz - kbo - kbp - kbq - kbr - kby - kca - kcg - kdc - kde - kdh - kdi - kdj - kdl - kdn - kdt - kea - kek - ken - keo - ker - key - kez - kfb - kff-script_telugu - kfw - kfx - khg - khm - khq - kia - kij - kik - kin - kir - kjb - kje - kjg - kjh - kki - kkj - kle - klu - klv - klw - kma - kmd - kml - kmr-script_arabic - kmr-script_cyrillic - kmr-script_latin - kmu - knb - kne - knf - knj - knk - kno - kog - kor - kpq - kps - kpv - kpy - kpz - kqe - kqp - kqr - kqy - krc - kri - krj - krl - krr - krs - kru - ksb - ksr - kss - ktb - ktj - kub - kue - kum - kus - kvn - kvw - kwd - kwf - kwi - kxc - kxf - kxm - kxv - kyb - kyc - kyf - kyg - kyo - kyq - kyu - kyz - kzf - lac - laj - lam - lao - las - lat - lav - law - lbj - lbw - lcp - lee - lef - lem - lew - lex - lgg - lgl - lhu - lia - lid - lif - lin - lip - lis - lit - lje - ljp - llg - lln - lme - lnd - lns - lob - lok - lom - lon - loq - lsi - lsm - ltz - luc - lug - luo - lwo - lww - lzz - maa-dialect_sanantonio - maa-dialect_sanjeronimo - mad - mag - mah - mai - maj - mak - mal - mam-dialect_central - mam-dialect_northern - mam-dialect_southern - mam-dialect_western - maq - mar - maw - maz - mbb - mbc - mbh - mbj - mbt - mbu - mbz - mca - mcb - mcd - mco - mcp - mcq - mcu - mda - mdf - mdv - mdy - med - mee - mej - men - meq - met - mev - mfe - mfh - mfi - mfk - mfq - mfy - mfz - mgd - mge - mgh - mgo - mhi - mhr - mhu - mhx - mhy - mib - mie - mif - mih - mil - mim - min - mio - mip - miq - mit - miy - miz - mjl - mjv - mkd - mkl - mkn - mlg - mlt - mmg - mnb - mnf - mnk - mnw - mnx - moa - mog - mon - mop - mor - mos - mox - moz - mpg - mpm - mpp - mpx - mqb - mqf - mqj - mqn - mri - mrw - msy - mtd - mtj - mto - muh - mup - mur - muv - muy - mvp - mwq - mwv - mxb - mxq - mxt - mxv - mya - myb - myk - myl - myv - myx - myy - mza - mzi - mzj - mzk - mzm - mzw - nab - nag - nan - nas - naw - nca - nch - ncj - ncl - ncu - ndj - ndp - ndv - ndy - ndz - neb - new - nfa - nfr - nga - ngl - ngp - ngu - nhe - nhi - nhu - nhw - nhx - nhy - nia - nij - nim - nin - nko - nlc - nld - nlg - nlk - nmz - nnb - nno - nnq - nnw - noa - nob - nod - nog - not - npi - npl - npy - nso - nst - nsu - ntm - ntr - nuj - nus - nuz - nwb - nxq - nya - nyf - nyn - nyo - nyy - nzi - obo - oci - ojb-script_latin - ojb-script_syllabics - oku - old - omw - onb - ood - orm - ory - oss - ote - otq - ozm - pab - pad - pag - pam - pan - pao - pap - pau - pbb - pbc - pbi - pce - pcm - peg - pez - pib - pil - pir - pis - pjt - pkb - pls - plw - pmf - pny - poh-dialect_eastern - poh-dialect_western - poi - pol - por - poy - ppk - pps - prf - prk - prt - pse - pss - ptu - pui - pus - pwg - pww - pxm - qub - quc-dialect_central - quc-dialect_east - quc-dialect_north - quf - quh - qul - quw - quy - quz - qvc - qve - qvh - qvm - qvn - qvo - qvs - qvw - qvz - qwh - qxh - qxl - qxn - qxo - qxr - rah - rai - rap - rav - raw - rej - rel - rgu - rhg - rif-script_arabic - rif-script_latin - ril - rim - rjs - rkt - rmc-script_cyrillic - rmc-script_latin - rmo - rmy-script_cyrillic - rmy-script_latin - rng - rnl - roh-dialect_sursilv - roh-dialect_vallader - rol - ron - rop - rro - rub - ruf - rug - run - rus - sab - sag - sah - saj - saq - sas - sat - sba - sbd - sbl - sbp - sch - sck - sda - sea - seh - ses - sey - sgb - sgj - sgw - shi - shk - shn - sho - shp - sid - sig - sil - sja - sjm - sld - slk - slu - slv - sml - smo - sna - snd - sne - snn - snp - snw - som - soy - spa - spp - spy - sqi - sri - srm - srn - srp-script_cyrillic - srp-script_latin - srx - stn - stp - suc - suk - sun - sur - sus - suv - suz - swe - swh - sxb - sxn - sya - syl - sza - tac - taj - tam - tao - tap - taq - tat - tav - tbc - tbg - tbk - tbl - tby - tbz - tca - tcc - tcs - tcz - tdj - ted - tee - tel - tem - teo - ter - tes - tew - tex - tfr - tgj - tgk - tgl - tgo - tgp - tha - thk - thl - tih - tik - tir - tkr - tlb - tlj - tly - tmc - tmf - tna - tng - tnk - tnn - tnp - tnr - tnt - tob - toc - toh - tom - tos - tpi - tpm - tpp - tpt - trc - tri - trn - trs - tso - tsz - ttc - tte - ttq-script_tifinagh - tue - tuf - tuk-script_arabic - tuk-script_latin - tuo - tur - tvw - twb - twe - twu - txa - txq - txu - tye - tzh-dialect_bachajon - tzh-dialect_tenejapa - tzj-dialect_eastern - tzj-dialect_western - tzo-dialect_chamula - tzo-dialect_chenalho - ubl - ubu - udm - udu - uig-script_arabic - uig-script_cyrillic - ukr - umb - unr - upv - ura - urb - urd-script_arabic - urd-script_devanagari - urd-script_latin - urk - urt - ury - usp - uzb-script_cyrillic - uzb-script_latin - vag - vid - vie - vif - vmw - vmy - vot - vun - vut - wal-script_ethiopic - wal-script_latin - wap - war - waw - way - wba - wlo - wlx - wmw - wob - wol - wsg - wwa - xal - xdy - xed - xer - xho - xmm - xnj - xnr - xog - xon - xrb - xsb - xsm - xsr - xsu - xta - xtd - xte - xtm - xtn - xua - xuo - yaa - yad - yal - yam - yao - yas - yat - yaz - yba - ybb - ycl - ycn - yea - yka - yli - yor - yre - yua - yue-script_traditional - yuz - yva - zaa - zab - zac - zad - zae - zai - zam - zao - zaq - zar - zas - zav - zaw - zca - zga - zim - ziw - zlm - zmz - zne - zos - zpc - zpg - zpi - zpl - zpm - zpo - zpt - zpu - zpz - ztq - zty - zul - zyb - zyp - zza
## Model details - **Developed by:** Vineel Pratap et al. - **Model type:** Multi-Lingual Automatic Speech Recognition model - **Language(s):** 1000+ languages, see [supported languages](#supported-languages) - **License:** CC-BY-NC 4.0 license - **Num parameters**: 1 billion - **Audio sampling rate**: 16,000 kHz - **Cite as:** @article{pratap2023mms, title={Scaling Speech Technology to 1,000+ Languages}, author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli}, journal={arXiv}, year={2023} } ## Additional Links - [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/) - [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms). - [Paper](https://arxiv.org/abs/2305.13516) - [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr) - [Other **MMS** checkpoints](https://huggingface.co/models?other=mms) - MMS base checkpoints: - [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) - [facebook/mms-300m](https://huggingface.co/facebook/mms-300m) - [Official Space](https://huggingface.co/spaces/facebook/MMS)