---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_10_0
base_model:
- facebook/wav2vec2-xls-r-300m
tags:
- pytorch
- phoneme-recognition
pipeline_tag: automatic-speech-recognition
---
# Model Information
Allophant is a multilingual phoneme recognizer trained on spoken sentences in 34 languages, capable of generalizing zero-shot to unseen phoneme inventories.
The model is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) and was fine-tuned on a subset of the Common Voice Corpus 10.0 transcribed with eSpeak NG.
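
As a minimal usage sketch, the following assumes the checkpoint can be loaded as a standard `transformers` CTC model; the model ID and audio file name are placeholders, and the multitask variants with articulatory attribute classifiers may require the dedicated Allophant code instead:

```python
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC

MODEL_ID = "<this-model-id>"  # placeholder: replace with the actual checkpoint name

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCTC.from_pretrained(MODEL_ID)
model.eval()

# Load an utterance and resample to the 16 kHz rate expected by XLS-R
waveform, sample_rate = torchaudio.load("utterance.wav")  # placeholder audio file
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding to a phoneme sequence
ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(ids)[0])
```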
| Model Name       | UCLA Phonetic Corpus (PER) | UCLA Phonetic Corpus (AER) | Common Voice (PER) | Common Voice (AER) |
|------------------|---------------------------:|---------------------------:|-------------------:|-------------------:|
| Multitask        | 45.62% | 19.44% | 34.34% | 8.36% |
| Hierarchical     | 46.09% | 19.18% | 34.35% | 8.56% |
| Multitask Shared | 46.05% | 19.52% | 41.20% | 8.88% |
| Baseline Shared  | 48.25% | -      | 45.35% | -     |
| Baseline         | 57.01% | -      | 46.95% | -     |
PER denotes phoneme error rate and AER articulatory attribute error rate. Note that our baseline models were trained without phonetic feature classifiers and therefore only support phoneme recognition, which is why no AER is reported for them.
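
For reference, PER is the length-normalized edit distance between predicted and gold phoneme sequences. The helper below is an illustrative sketch of this standard definition, not the evaluation code used in the paper:

```python
def phoneme_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Levenshtein distance between phoneme sequences, normalized by reference length."""
    # Dynamic-programming edit distance over phoneme tokens
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(reference)

# Example: one substituted phoneme in a four-phoneme reference -> 0.25 (25% PER)
print(phoneme_error_rate(["h", "ə", "l", "oʊ"], ["h", "ɛ", "l", "oʊ"]))
```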
# Citation
```bibtex
@inproceedings{glocker2023allophant,
  title={Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes},
  author={Glocker, Kevin and Herygers, Aaricia and Georges, Munir},
  year={2023},
  booktitle={{Proc. Interspeech 2023}},
  month={8}
}
```