---
license: apache-2.0
language:
- en
metrics:
- cer
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# Model
This model is [Wav2Vec2-Large-XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
fine-tuned on the manually annotated subset of the
[L2-ARCTIC corpus](https://psi.engr.tamu.edu/l2-arctic-corpus/) to produce
automatic phonetic transcriptions in IPA.
The fine-tuning procedure is similar to the one described
by [vitouphy](https://huggingface.co/vitouphy/wav2vec2-xls-r-300m-timit-phoneme)
for the TIMIT dataset.

# Usage
To use the model, create a pipeline and call it with
the path to your WAV file, which must be sampled at 16 kHz.

```python
from transformers import pipeline

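# Load the fine-tuned checkpoint as a speech-recognition pipeline.
pipe = pipeline("automatic-speech-recognition", model="mrrubino/wav2vec2-large-xlsr-53-l2-arctic-phoneme")
# The pipeline returns a dict; its "text" entry holds the IPA transcription.
transcription = pipe("file.wav")["text"]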
```
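
Because the input is expected at 16 kHz, it can help to check a file's sample rate before transcribing it. The following is a minimal sketch, not part of the model's own tooling: the helper names `wav_sample_rate` and `is_model_ready` are hypothetical, and the code assumes an uncompressed PCM WAV file that Python's standard `wave` module can read.

```python
import wave

EXPECTED_RATE = 16_000  # Hz, the rate stated in the usage notes above


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_model_ready(path, expected_rate=EXPECTED_RATE):
    """Return True when the file is already sampled at the expected rate."""
    return wav_sample_rate(path) == expected_rate
```

If `is_model_ready("file.wav")` returns `False`, resample the audio to 16 kHz with a tool of your choice before passing it to the pipeline.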

# Results
The manually annotated subset of L2-Arctic was divided
into training and testing datasets with a 90/10 split.
The performance metrics for the testing dataset are
included below.

- WER: 0.425
- CER: 0.128
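
This card does not say which tool computed these metrics. As an illustration only, the sketch below uses the common definition of both rates: the edit distance between reference and hypothesis, divided by the reference length, summed over all utterances. The function names are hypothetical, and two details are assumptions: "words" are whitespace-separated tokens, and spaces count as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char")."""
    # Assumption: words are whitespace-separated; characters include spaces.
    split = str.split if unit == "word" else list
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = split(ref), split(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total
```

Under this definition, calling `error_rate(refs, hyps, unit="word")` on the test transcriptions would give a WER, and `unit="char"` a CER. Values from other tools may differ slightly depending on text normalization.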