tags:
  - mms
language:
  - ab
  - af
  - ak
  - am
  - ar
  - as
  - av
  - ay
  - az
  - ba
  - bm
  - be
  - bn
  - bi
  - bo
  - sh
  - br
  - bg
  - ca
  - cs
  - ce
  - cv
  - ku
  - cy
  - da
  - de
  - dv
  - dz
  - el
  - en
  - eo
  - et
  - eu
  - ee
  - fo
  - fa
  - fj
  - fi
  - fr
  - fy
  - ff
  - ga
  - gl
  - gn
  - gu
  - zh
  - ht
  - ha
  - he
  - hi
  - hu
  - hy
  - ig
  - ia
  - ms
  - is
  - it
  - jv
  - ja
  - kn
  - ka
  - kk
  - kr
  - km
  - ki
  - rw
  - ky
  - ko
  - kv
  - lo
  - la
  - lv
  - ln
  - lt
  - lb
  - lg
  - mh
  - ml
  - mr
  - mk
  - mg
  - mt
  - mn
  - mi
  - my
  - nl
  - 'no'
  - ne
  - ny
  - oc
  - om
  - or
  - os
  - pa
  - pl
  - pt
  - ps
  - qu
  - ro
  - rn
  - ru
  - sg
  - sk
  - sl
  - sm
  - sn
  - sd
  - so
  - es
  - sq
  - su
  - sv
  - sw
  - ta
  - tt
  - te
  - tg
  - tl
  - th
  - ti
  - ts
  - tr
  - uk
  - vi
  - wo
  - xh
  - yo
  - zu
  - za
license: cc-by-nc-4.0
datasets:
  - google/fleurs
metrics:
  - wer

Massively Multilingual Speech (MMS) - 300m

Facebook's MMS model with 300 million parameters.

MMS (Massively Multilingual Speech) is Facebook AI's massively multilingual pretrained model for speech. It is pretrained with Wav2Vec2's self-supervised training objective on about 500,000 hours of speech data in over 1,400 languages.

When using the model, make sure that your speech input is sampled at 16 kHz.
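One way to guard against a mismatched sampling rate is to check the file header before passing audio to the model. The sketch below uses only Python's standard `wave` module and assumes uncompressed WAV input. The helper names `read_sampling_rate` and `assert_16khz` and the constant `EXPECTED_RATE` are illustrative and not part of any MMS tooling.

```python
import wave

EXPECTED_RATE = 16_000  # Hz, the sampling rate this model card asks for


def read_sampling_rate(path: str) -> int:
    """Return the sampling rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def assert_16khz(path: str) -> None:
    """Raise ValueError if the WAV file is not sampled at 16 kHz."""
    rate = read_sampling_rate(path)
    if rate != EXPECTED_RATE:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; resample it to {EXPECTED_RATE} Hz first."
        )
```

Files recorded at another rate need to be resampled before use, for example with an audio library such as torchaudio or librosa.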

Note: This model should be fine-tuned on a downstream task, such as Automatic Speech Recognition, Translation, or Classification. Check out the How to finetune section or this blog for more information about ASR.
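As a rough illustration of what attaching a downstream head can look like, the sketch below loads the pretrained checkpoint into a Wav2Vec2 model with a freshly initialised CTC layer for ASR fine-tuning. It is not the authors' fine-tuning recipe. It assumes the `transformers` library is installed, that the checkpoint is published under the Hub id `facebook/mms-300m`, and that `vocab_size` and `pad_token_id` come from a tokenizer you have built for your target language. The function name `load_for_ctc` is hypothetical.

```python
MODEL_ID = "facebook/mms-300m"  # assumed Hub id for this checkpoint


def load_for_ctc(vocab_size: int, pad_token_id: int = 0):
    """Load the pretrained encoder and put a new, untrained CTC head on top."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import Wav2Vec2ForCTC

    model = Wav2Vec2ForCTC.from_pretrained(
        MODEL_ID,
        vocab_size=vocab_size,      # size of your target-language vocabulary
        pad_token_id=pad_token_id,  # tokenizer padding id, used as the CTC blank
        ctc_loss_reduction="mean",
    )
    # The convolutional feature encoder is commonly kept frozen while fine-tuning.
    model.freeze_feature_encoder()
    return model
```

Calling `load_for_ctc(vocab_size=len(tokenizer))` downloads the weights on first use. The CTC layer starts from random weights, so the model is not usable for transcription until it has been trained on labelled data.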

Table of Contents

  • How to finetune

  • Model details

  • Additional Links

How to finetune

Coming soon...

Model details

  • Developed by: Vineel Pratap et al.

  • Model type: Multilingual pretrained speech model (Wav2Vec2), intended to be fine-tuned for tasks such as Automatic Speech Recognition

  • Language(s): 1000+ languages

  • License: CC-BY-NC 4.0

  • Num parameters: 300 million

  • Cite as:

    @article{pratap2023mms,
      title={Scaling Speech Technology to 1,000+ Languages},
      author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
      journal={arXiv},
      year={2023}
    }

Additional Links