---
tags:
- mms
language:
- ab
- af
- ak
- am
- ar
- as
- av
- ay
- az
- ba
- bm
- be
- bn
- bi
- bo
- sh
- br
- bg
- ca
- cs
- ce
- cv
- ku
- cy
- da
- de
- dv
- dz
- el
- en
- eo
- et
- eu
- ee
- fo
- fa
- fj
- fi
- fr
- fy
- ff
- ga
- gl
- gn
- gu
- zh
- ht
- ha
- he
- hi
- sh
- hu
- hy
- ig
- ia
- ms
- is
- it
- jv
- ja
- kn
- ka
- kk
- kr
- km
- ki
- rw
- ky
- ko
- kv
- lo
- la
- lv
- ln
- lt
- lb
- lg
- mh
- ml
- mr
- ms
- mk
- mg
- mt
- mn
- mi
- my
- zh
- nl
- 'no'
- 'no'
- ne
- ny
- oc
- om
- or
- os
- pa
- pl
- pt
- ms
- ps
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- ro
- rn
- ru
- sg
- sk
- sl
- sm
- sn
- sd
- so
- es
- sq
- su
- sv
- sw
- ta
- tt
- te
- tg
- tl
- th
- ti
- ts
- tr
- uk
- ms
- vi
- wo
- xh
- ms
- yo
- ms
- zu
- za
license: cc-by-nc-4.0
datasets:
- google/fleurs
metrics:
- acc
---

# Massively Multilingual Speech (MMS) - Finetuned LID

This checkpoint is a model fine-tuned for speech language identification (LID) and is part of Facebook's [Massively Multilingual Speech project](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/).
It is based on the [Wav2Vec2 architecture](https://huggingface.co/docs/transformers/model_doc/wav2vec2) and maps raw audio input to a probability distribution over 1024 output classes, each class representing a language.
The checkpoint consists of **1 billion parameters** and has been fine-tuned from [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) on 1024 languages.

## Table of Contents

- [Example](#example)
- [Supported Languages](#supported-languages)
- [Model details](#model-details)
- [Additional links](#additional-links)

## Example

This MMS checkpoint can be used with [Transformers](https://github.com/huggingface/transformers) to identify the spoken language of an audio clip. It can recognize the [following 1024 languages](#supported-languages).

Let's look at a simple example.

First, we install transformers and some other libraries:

```
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
```

**Note**: In order to use MMS you need to have at least `transformers >= 4.30` installed. If the `4.30` version is not yet available [on PyPI](https://pypi.org/project/transformers/), make sure to install `transformers` from source:

```
pip install git+https://github.com/huggingface/transformers.git
```

Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled at 16,000 Hz (16 kHz).

```py
from datasets import load_dataset, Audio

# English: stream the test split and resample the audio to 16 kHz
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]

# Arabic: same procedure
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "ar", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
ar_sample = next(iter(stream_data))["audio"]["array"]
```

Next, we load the model and processor:

```py
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

model_id = "facebook/mms-lid-1024"

processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
```

Now we process the audio data and pass it to the model to classify it into a language, just like we usually do for Wav2Vec2 audio classification models such as [ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition](https://huggingface.co/harshit345/xlsr-wav2vec-speech-emotion-recognition):

```py
# English
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

# Index of the highest-scoring class, mapped to its ISO 639-3 code
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# 'eng'

# Arabic
inputs = processor(ar_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# 'ara'
```

To see all the supported languages of a checkpoint, you can print out the language ids as follows:

```py
# The label mapping lives on the model config, not on the feature extractor
model.config.id2label.values()
```

For more details about the architecture, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).

## Supported Languages

This model supports 1024 languages. Click the following to toggle all supported languages of this checkpoint, listed by their [ISO 639-3 code](https://en.wikipedia.org/wiki/ISO_639-3).
You can find more details about the languages and their ISO 639-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).

Click to toggle - ara - cmn - eng - spa - fra - mlg - swe - por - vie - ful - sun - asm - ben - zlm - kor - ind - hin - tuk - urd - aze - slv - mon - hau - tel - swh - bod - rus - tur - heb - mar - som - tgl - tat - tha - cat - ron - mal - bel - pol - yor - nld - bul - hat - afr - isl - amh - tam - hun - hrv - lit - cym - fas - mkd - ell - bos - deu - sqi - jav - kmr - nob - uzb - snd - lat - nya - grn - mya - orm - lin - hye - yue - pan - jpn - kaz - npi - kik - kat - guj - kan - tgk - ukr - ces - lav - bak - khm - cak - fao - glg - ltz - xog - lao - mlt - sin - aka - sna - che - mam - ita - quc - aiw - srp - mri - tuv - nno - pus - eus - kbp - gur - ory - lug - crh - bre - luo - nhx - slk - ewe - xsm - fin - rif - dan - saq - yid - yao - mos - quh - hne - xon - new - dtp - quy - est - ddn - dyu - ttq - bam - pse - uig - sck - ngl - tso - mup - dga - seh - lis - wal - ctg - mip - bfz - bxk - ceb - kru - war - khg - bbc - thl - nzi - vmw - mzi - ycl - zne - sid - asa - tpi - bmq - box - zpu - gof - nym - cla - bgq - bfy - hlb - qxl - teo - fon - sda - kfx - bfa - mag - tzh - pil - maj - maa - kdt - ksb - lns - btd - rej - pap - ayr - any - mnk - adx - gud - krc - onb - xal - ctd - nxq - ava - blt - lbw - hyw - udm - zar - tzo - kpv - san - xnj - kek - chv - kcg - kri - ati - bgw - mxt - ybb - btx - dgi - nhy - dnj - zpz - yba - lon - smo - men - ium - mgd - taq - nga - nsu - zaj - tly - prk - zpt - akb - mhr - mxb - nuj - obo - kir - bom - run - zpg - hwc - mnw - ubl - kin - xtm - hnj - mpm - rkt - miy - luc - mih - kne - mib - flr - myv - xmm - knk - iba - gux - pis - zmz - ses - dav - lif - qxr - dig - kdj - wsg - tir - gbm - mai - zpc - kus - nyy - mim - nan - nyn - gog - ngu - tbz - hoc - nyf - sus - guk - gwr - yaz - bcc - sbd - spp - hak - grt - kno - oss - suk - spy - nij - lsm - kaa - bem - rmy - kqn - nim - ztq - nus - bib - xtd - ach - mil - keo - mpg - gjn - zaq - kdh - dug - sah - awa - kff - dip - rim - nhe - pcm - kde - tem - quz - mfq - las - bba - 
kbr - taj - dyo - zao - lom - shk - dik - dgo - zpo - fij - bgc - xnr - bud - kac - laj - mev - maw - quw - kao - dag - ktb - lhu - zab - mgh - shn - otq - lob - pbb - oci - zyb - bsq - mhi - dzo - zas - guc - alz - ctu - wol - guw - mnb - nia - zaw - mxv - bci - sba - kab - dwr - nnb - ilo - mfe - srx - ruf - srn - zad - xpe - pce - ahk - bcl - myk - haw - mad - ljp - bky - gmv - nag - nav - nyo - kxm - nod - sag - zpl - sas - myx - sgw - old - irk - acf - mak - kfy - zai - mie - zpm - zpi - ote - jam - kpz - lgg - lia - nhi - mzm - bdq - xtn - mey - mjl - sgj - kdi - kxc - miz - adh - tap - hay - kss - pam - gor - heh - nhw - ziw - gej - yua - itv - shi - qvw - mrw - hil - mbt - pag - vmy - lwo - cce - kum - klu - ann - mbb - npl - zca - pww - toc - ace - mio - izz - kam - zaa - krj - bts - eza - zty - hns - kki - min - led - alw - tll - rng - pko - toi - iqw - ncj - toh - umb - mog - hno - wob - gxx - hig - nyu - kby - ban - syl - bxg - nse - xho - zae - mkw - nch - ibg - mas - qvz - bum - bgd - mww - epo - tzm - zul - bcq - lrc - xdy - tyv - ibo - loz - mza - abk - azz - guz - arn - ksw - lus - tos - gvr - top - ckb - mer - pov - lun - rhg - knc - sfw - bev - tum - lag - nso - bho - ndc - maf - gkp - bax - awn - ijc - qug - lub - srr - mni - zza - ige - dje - mkn - bft - tiv - otn - kck - kqs - gle - lua - pdt - swk - mgw - ebu - ada - lic - skr - gaa - mfa - vmk - mcn - bto - lol - bwr - unr - dzg - hdy - kea - bhi - glk - mua - ast - nup - sat - ktu - bhb - zpq - coh - bkm - gya - sgc - dks - ncl - tui - emk - urh - ego - ogo - tsc - idu - igb - ijn - njz - ngb - tod - jra - mrt - zav - tke - its - ady - bzw - kng - kmb - lue - jmx - tsn - bin - ble - gom - ven - sef - sco - her - iso - trp - glv - haq - toq - okr - kha - wof - rmn - sot - kaj - bbj - sou - mjt - trd - gno - mwn - igl - rag - eyo - div - efi - nde - mfv - mix - rki - kjg - fan - khw - wci - bjn - pmy - bqi - ina - hni - mjx - kuj - aoz - the - tog - tet - nuz - ajg - ccp - mau - ymm - fmu - 
tcz - xmc - nyk - ztg - knx - snk - zac - esg - srb - thq - pht - wes - rah - pnb - ssy - zpv - kpo - phr - atd - eto - xta - mxx - mui - uki - tkt - mgp - xsq - enq - nnh - qxp - zam - bug - bxr - maq - tdt - khb - mrr - kas - zgb - kmw - lir - vah - dar - ssw - hmd - jab - iii - peg - shr - brx - rwr - bmb - kmc - mji - dib - pcc - nbe - mrd - ish - kai - yom - zyn - hea - ewo - bas - hms - twh - kfq - thr - xtl - wbr - bfb - wtm - mjc - blk - lot - dhd - swv - wbm - zzj - kge - mgm - niq - zpj - bwx - bde - mtr - gju - kjp - mbz - haz - lpo - yig - qud - shy - gjk - ztp - nbl - aii - kun - say - mde - sjp - bns - brh - ywq - msi - anr - mrg - mjg - tan - tsg - tcy - kbl - mdr - mks - noe - tyz - zpa - ahr - aar - wuu - khr - kbd - kex - bca - nku - pwr - hsn - ort - ott - swi - kua - tdd - msm - bgp - nbm - mxy - abs - zlj - ebo - lea - dub - sce - xkb - vav - bra - ssb - sss - nhp - kad - kvx - lch - tts - zyj - kxp - lmn - qvi - lez - scl - cqd - ayb - xbr - nqg - dcc - cjk - bfr - zyg - mse - gru - mdv - bew - wti - arg - dso - zdj - pll - mig - qxs - bol - drs - anp - chw - bej - vmc - otx - xty - bjj - vmz - ibb - gby - twx - tig - thz - tku - hmz - pbm - mfn - nut - cyo - mjw - cjm - tlp - naq - rnd - stj - sym - jax - btg - tdg - sng - nlv - kvr - pch - fvr - mxs - wni - mlq - kfr - mdj - osi - nhn - ukw - tji - qvj - nih - bcy - hbb - zpx - hoj - cpx - ogc - cdo - bgn - bfs - vmx - tvn - ior - mxa - btm - anc - jit - mfb - mls - ets - goa - bet - ikw - pem - trf - daq - max - rad - njo - bnx - mxl - mbi - nba - zpn - zts - mut - hnd - mta - hav - hac - ryu - abr - yer - cld - zag - ndo - sop - vmm - gcf - chr - cbk - sbk - bhp - odk - mbd - nap - gbr - mii - czh - xti - vls - gdx - sxw - zaf - wem - mqh - ank - yaf - vmp - otm - sdh - anw - src - mne - wss - meh - kzc - tma - ttj - ots - ilp - zpr - saz - ogb - akl - nhg - pbv - rcf - cgg - mku - bez - mwe - mtb - gul - ifm - mdh - scn - lki - xmf - sgd - aba - cos - luz - zpy - stv - kjt - mbf - kmz - 
nds - mtq - tkq - aee - knn - mbs - mnp - ema - bar - unx - plk - psi - mzn - cja - sro - mdw - ndh - vmj - zpw - kfu - bgx - gsw - fry - zpe - zpd - bta - psh - zat
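
The example in this card keeps only the top-scoring class. Because the model emits one logit per language, the same logits can also be turned into a ranked list of candidates, optionally restricted to a set of languages that are plausible for the audio. The following is a minimal sketch of that post-processing and is not part of the MMS release: `top_k_languages` and its `allowed` parameter are hypothetical names, and the toy logits and four-label mapping are stand-ins for `outputs[0].tolist()` and `model.config.id2label`.

```python
import math


def top_k_languages(logits, id2label, k=3, allowed=None):
    """Return the k most probable (iso_code, probability) pairs.

    logits:   list of floats, one per output class (assumed: outputs[0].tolist())
    id2label: dict mapping class index -> ISO 639-3 code (assumed: model.config.id2label)
    allowed:  optional set of ISO codes; all other classes are dropped and the
              probabilities are renormalized over the remaining ones
    """
    candidates = [
        (idx, logit)
        for idx, logit in enumerate(logits)
        if allowed is None or id2label[idx] in allowed
    ]
    if not candidates:
        raise ValueError("none of the allowed codes is a label of this checkpoint")

    # Numerically stable softmax over the remaining candidates
    max_logit = max(logit for _, logit in candidates)
    exps = [(idx, math.exp(logit - max_logit)) for idx, logit in candidates]
    total = sum(value for _, value in exps)

    ranked = sorted(exps, key=lambda pair: pair[1], reverse=True)[:k]
    return [(id2label[idx], value / total) for idx, value in ranked]


# Toy stand-ins for illustration only (not real model output)
toy_id2label = {0: "ara", 1: "cmn", 2: "eng", 3: "spa"}
toy_logits = [0.5, -1.0, 2.0, 1.0]

print(top_k_languages(toy_logits, toy_id2label, k=2))
print(top_k_languages(toy_logits, toy_id2label, k=2, allowed={"ara", "spa"}))
```

Restricting the candidates only renormalizes the scores the model already produced; it does not make the model more accurate on the remaining languages, so the resulting probabilities should be read as relative rankings.
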

## Model details

- **Developed by:** Vineel Pratap et al.
- **Model type:** Multi-Lingual Speech Language Identification model
- **Language(s):** 1024 languages, see [supported languages](#supported-languages)
- **License:** CC-BY-NC 4.0 license
- **Num parameters**: 1 billion
- **Audio sampling rate**: 16,000 Hz (16 kHz)
- **Cite as:**

      @article{pratap2023mms,
        title={Scaling Speech Technology to 1,000+ Languages},
        author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
        journal={arXiv},
        year={2023}
      }

## Additional Links

- [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/)
- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
- [Paper](https://arxiv.org/abs/2305.13516)
- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
- [Other **MMS** checkpoints](https://huggingface.co/models?other=mms)
- MMS base checkpoints:
  - [facebook/mms-1b](https://huggingface.co/facebook/mms-1b)
  - [facebook/mms-300m](https://huggingface.co/facebook/mms-300m)
- [Official Space](https://huggingface.co/spaces/facebook/MMS)