---
license: apache-2.0
language:
- en
metrics:
- cer
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
# Model
This model is [Wav2Vec2-Large-XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
fine-tuned on the manually annotated subset of the
[L2-Arctic dataset](https://psi.engr.tamu.edu/l2-arctic-corpus/) to perform automatic
phonetic transcription in the International Phonetic Alphabet (IPA).
Fine-tuning followed a procedure similar to the one used
by [vitouphy](https://huggingface.co/vitouphy/wav2vec2-xls-r-300m-timit-phoneme)
with the TIMIT dataset.
# Usage
To use the model, create a pipeline and call it with
the path to your WAV file, which must be sampled at 16 kHz.
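```python
from transformers import pipeline

# Downloads the fine-tuned checkpoint from the Hugging Face Hub on first use
pipe = pipeline(
    task="automatic-speech-recognition",
    model="mrrubino/wav2vec2-large-xlsr-53-l2-arctic-phoneme",
)
# The input WAV must be sampled at 16 kHz; the IPA transcription is under "text"
transcription = pipe("file.wav")["text"]
```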
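
Because the model expects 16 kHz input, it can help to check a file's sample rate before transcribing it. The helper below, `check_wav_sample_rate`, is a hypothetical convenience function (not part of this model or of `transformers`). It is a minimal sketch built only on Python's standard-library `wave` module, which reads uncompressed PCM WAV headers.

```python
import wave

TARGET_RATE = 16_000  # sample rate required by this model card


def check_wav_sample_rate(path: str, expected_rate: int = TARGET_RATE) -> int:
    """Return the WAV file's sample rate, raising ValueError if it is not expected_rate.

    Hypothetical helper: wave.open only parses uncompressed PCM WAV files.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate != expected_rate:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; resample it to {expected_rate} Hz first"
        )
    return rate
```

For example, calling `check_wav_sample_rate("file.wav")` just before `pipe("file.wav")` fails early with a clear message when a recording needs resampling.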
# Results
The manually annotated subset of L2-Arctic was split
90/10 into training and test sets. Performance on the
test set:
- Word error rate (WER): 0.425
- Character error rate (CER): 0.128
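
The card does not say how WER and CER were computed or how the IPA output was tokenized. For reference, both metrics are normally defined as the Levenshtein (edit) distance between the reference and predicted sequences divided by the reference length, over words for WER and over characters for CER. The minimal sketch below assumes whitespace-separated tokens count as words and that every Unicode code point (including spaces and combining diacritics) counts as a character; `edit_distance`, `wer`, and `cer` are illustrative names, not the evaluation code behind the scores above.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, assuming whitespace-separated tokens are the 'words'."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

A corpus-level score is usually obtained by summing edit counts and reference lengths over all test utterances rather than averaging per-utterance rates.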
# Citation
If you find our model helpful, please feel free to cite us.
```
@article{Bo_Rubino_Xu_2024,
title={A Mispronunciation-Based Voice-Omics Representation Framework for Screening Specific Language Impairments in Children},
DOI={10.1109/ichi61247.2024.00045},
journal={2024 IEEE 12th International Conference on Healthcare Informatics (ICHI)},
author={Bo, Wei and Rubino, Matthew and Xu, Wenyao},
year={2024},
month={Jun},
pages={294--304}
}
```