---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_10_0
base_model:
- facebook/wav2vec2-xls-r-300m
tags:
- pytorch
- phoneme-recognition
pipeline_tag: automatic-speech-recognition
---
# Model Information
Allophant is a multilingual phoneme recognizer trained on spoken sentences in 34 languages, capable of generalizing zero-shot to unseen phoneme inventories.
The model is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) and was fine-tuned on a subset of the Common Voice Corpus 10.0 transcribed with eSpeak NG.
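
As a minimal usage sketch, the following assumes the checkpoint can be loaded as a standard `transformers` CTC model; the model ID and audio file name are placeholders, and the multitask variants with articulatory attribute classifiers may require the dedicated Allophant code instead:

```python
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC

MODEL_ID = "<this-model-id>"  # placeholder: replace with the actual checkpoint name

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCTC.from_pretrained(MODEL_ID)
model.eval()

# Load an utterance and resample to the 16 kHz rate expected by XLS-R
waveform, sample_rate = torchaudio.load("utterance.wav")  # placeholder audio file
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding to a phoneme sequence
ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(ids)[0])
```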
| Model Name       | UCLA Phonetic Corpus (PER) | UCLA Phonetic Corpus (AER) | Common Voice (PER) | Common Voice (AER) |
|------------------|---------------------------:|---------------------------:|-------------------:|-------------------:|
| Multitask        | 45.62% | 19.44% | 34.34% | 8.36% |
| Hierarchical     | 46.09% | 19.18% | 34.35% | 8.56% |
| Multitask Shared | 46.05% | 19.52% | 41.20% | 8.88% |
| Baseline Shared  | 48.25% | -      | 45.35% | -     |
| Baseline         | 57.01% | -      | 46.95% | -     |
PER denotes phoneme error rate and AER articulatory attribute error rate. Note that our baseline models were trained without phonetic feature classifiers and therefore only support phoneme recognition, which is why no AER is reported for them.
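
For reference, PER is the length-normalized edit distance between predicted and gold phoneme sequences. The helper below is an illustrative sketch of this standard definition, not the evaluation code used in the paper:

```python
def phoneme_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Levenshtein distance between phoneme sequences, normalized by reference length."""
    # Dynamic-programming edit distance over phoneme tokens
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(reference)

# Example: one substituted phoneme in a four-phoneme reference -> 0.25 (25% PER)
print(phoneme_error_rate(["h", "ə", "l", "oʊ"], ["h", "ɛ", "l", "oʊ"]))
```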
# Citation
```bibtex
@inproceedings{glocker2023allophant,
  title={Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes},
  author={Glocker, Kevin and Herygers, Aaricia and Georges, Munir},
  year={2023},
  booktitle={{Proc. Interspeech 2023}},
  month={8}
}
```