---
license: apache-2.0
datasets:
  - mozilla-foundation/common_voice_10_0
base_model:
  - facebook/wav2vec2-xls-r-300m
tags:
  - pytorch
  - phoneme-recognition
pipeline_tag: automatic-speech-recognition
---

# Model Information

Allophant is a multilingual phoneme recognizer trained on spoken sentences in 34 languages, capable of generalizing zero-shot to unseen phoneme inventories.

The model is based on facebook/wav2vec2-xls-r-300m and was pre-trained on a subset of the Common Voice Corpus 10.0 transcribed with eSpeak NG.

| Model Name       | UCLA Phonetic Corpus (PER) | UCLA Phonetic Corpus (AER) | Common Voice (PER) | Common Voice (AER) |
|------------------|---------------------------:|---------------------------:|-------------------:|-------------------:|
| Multitask        | 45.62%                     | 19.44%                     | 34.34%             | 8.36%              |
| Hierarchical     | 46.09%                     | 19.18%                     | 34.35%             | 8.56%              |
| Multitask Shared | 46.05%                     | 19.52%                     | 41.20%             | 8.88%              |
| Baseline Shared  | 48.25%                     | -                          | 45.35%             | -                  |
| Baseline         | 57.01%                     | -                          | 46.95%             | -                  |

Note that our baseline models were trained without phonetic feature classifiers and therefore only support phoneme recognition.
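The phoneme error rates (PER) and attribute error rates (AER) in the table are edit-distance metrics. As a rough illustration (not the evaluation code used for this model), PER can be computed as the Levenshtein distance between the reference and hypothesis phoneme sequences, normalized by the reference length; the phoneme strings below are a hypothetical example:

```python
def levenshtein(ref, hyp):
    # Classic dynamic-programming edit distance over token sequences.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def phoneme_error_rate(ref, hyp):
    # PER = edit distance / reference length, usually reported in percent.
    return levenshtein(ref, hyp) / len(ref)

# Hypothetical example: reference /k æ t/ recognized as /k æ p/
print(f"{phoneme_error_rate(['k', 'æ', 't'], ['k', 'æ', 'p']):.2%}")  # 33.33%
```

AER is computed analogously, but over sequences of articulatory attribute labels rather than phonemes.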

# Citation

```bibtex
@inproceedings{glocker2023allophant,
    title={Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes},
    author={Glocker, Kevin and Herygers, Aaricia and Georges, Munir},
    year={2023},
    booktitle={{Proc. Interspeech 2023}},
    month={8}}
```