---
language: ga
datasets:
- common_voice
- living-audio-Irish
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- ga-IE 
- speech
- Irish
- Gaelic
model-index:
- name: Wav2vec 2.0 large 300m XLS-R
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 10.0
      type: common_voice
      args: ga-IE
    metrics:
    - name: Test WER
      type: wer
      value: 25.94
---

# Irish-Gaelic Automatic Speech Recognition

This model performs automatic speech recognition (ASR) for Irish. It was trained on the Common Voice dataset (language code ga-IE) and the Living Audio Irish dataset. All validated Common Voice clips and all Living Audio clips were pooled and split at random: 90% of the data (5156 utterances) was used for training and the remaining 10% (579 utterances) for testing.
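
A random 90/10 split of this kind can be sketched as follows. This is only an illustration: the helper name `split_utterances`, the fixed seed, the rounding of the test-set size, and the placeholder clip names are all assumptions, and the exact procedure used for this model is not documented here.

```python
import random


def split_utterances(utterances, test_fraction=0.1, seed=0):
    """Shuffle a copy of the utterances and split it into (train, test)."""
    shuffled = list(utterances)
    # A seeded generator keeps the split reproducible (the seed is an assumption)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical clip names standing in for the real audio files
clips = [f"clip_{i:04d}.mp3" for i in range(100)]
train, test = split_utterances(clips)
print(len(train), len(test))
```

Splitting at the utterance level, as here, does not keep speakers separate between the two sets; a speaker-disjoint split would need speaker metadata.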

The model was obtained by fine-tuning wav2vec2-large-xls-r-300m on this dataset. It reaches a word error rate (WER) of 25.94% on the test set.

### How to use
The example below transcribes a Common Voice audio clip from the invalidated set, using a GPU if one is available. The model expects 16 kHz audio.

```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model = Wav2Vec2ForCTC.from_pretrained("Aditya3107/wav2vec2-large-xls-r-1b-ga-ie").to(device)
processor = Wav2Vec2Processor.from_pretrained("Aditya3107/wav2vec2-large-xls-r-1b-ga-ie")
model.eval()

# Load the audio clip, resampled to the 16 kHz the model expects
audio, rate = librosa.load("common-voice-irish/common_voice/cv-corpus-10.0-2022-07-04/ga-IE/clips/common_voice_ga-IE_1818627.mp3", sr=16000)

# Convert the waveform into model input values
input_values = processor(audio, sampling_rate=16_000, return_tensors="pt", padding="longest").input_values.to(device)

# Compute logits (non-normalized prediction scores) without tracking gradients
with torch.no_grad():
    logits = model(input_values).logits

# Pick the most likely token id at each time step
prediction = torch.argmax(logits, dim=-1)

# Decode the predicted ids into the transcription
transcription = processor.batch_decode(prediction)[0]
print(transcription)
```
### Results
Example transcriptions of test clips, scored with SCLITE.
```
Speaker sentences   0:     #utts: 1
           
id: (common_voice_ga-IE_17401296.mp3)
Scores: (#C #S #D #I) 4 1 0 0
Attributes: Case_sensitve 
REF:  an bhfuil cóta bán óir 
HYP:  an bhfuil cóta bán air  
Eval:                      S    

id: (common_voice_ga-IE_17410244.mp3)
Scores: (#C #S #D #I) 3 1 0 2
Attributes: Case_sensitve 
REF:  *** ** an bud é sin 
HYP:  cad é an rud é sin 
Eval: I   I     S          

id: (common_voice_ga-IE_17410257.mp3)
Scores: (#C #S #D #I) 9 2 1 2
Attributes: Case_sensitve 
REF:  i gabhaim buíochas libh a chairde ******* ** támindéagtstruth le tuilleadh uaibh ar baá 
HYP:  * gabhaim buíochas libh a chairde táimid ag tsnúth            le tuilleadh uaibh ar ball 
Eval: D                                  I       I  S                                        S    

id: (common_voice_ga-IE_17410401.mp3)
Scores: (#C #S #D #I) 6 1 0 0
Attributes: Case_sensitve 
REF:  níl ach tá peann ina phóca uige 
HYP:  níl ach tá peann ina phóca aige 
Eval:                               S    

id: (common_voice_ga-IE_17410403.mp3)
Scores: (#C #S #D #I) 5 1 0 1
Attributes: Case_sensitve 
REF:  agus *** cadé an dath atá air 
HYP:  agus cad é    an dath atá air 
Eval:      I   S                      

id: (common_voice_ga-IE_17410412.mp3)
Scores: (#C #S #D #I) 6 2 0 0
Attributes: Case_sensitve 
REF:  is lá é seo chun ceiliúradh  a dhéan    
HYP:  is lá é seo chun céiliúradh a dhéanamh 
Eval:                    S              S         

id: (common_voice_ga-IE_17444712.mp3)
Scores: (#C #S #D #I) 4 6 0 0
Attributes: Case_sensitve 
REF:  don chathaoileach  mirín   de brom  don stiúrdhóirat liam ón maoladha  
HYP:  don chathaoirleach máirín de brún don stiúrthóir   liam ó  maolaodha 
Eval:     S              S           S         S                   S   S         

id: (common_voice_ga-IE_17449454.mp3)
Scores: (#C #S #D #I) 4 0 0 0
Attributes: Case_sensitve 
REF:  ceacht a trí déag 
HYP:  ceacht a trí déag 
Eval:                     
```
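
In the output above, `#C`, `#S`, `#D` and `#I` are the numbers of correct, substituted, deleted and inserted words per utterance. Under the standard definition, WER is the number of errors (S + D + I) divided by the number of reference words (C + S + D). The sketch below applies that formula to the eight clips shown; the helper name `word_error_rate` is an assumption for illustration. Because it covers only these eight clips, the result is not expected to match the 25.94% reported for the full test set.

```python
# Per-utterance counts copied from the SCLITE output above:
# (correct, substitutions, deletions, insertions)
counts = [
    (4, 1, 0, 0),
    (3, 1, 0, 2),
    (9, 2, 1, 2),
    (6, 1, 0, 0),
    (5, 1, 0, 1),
    (6, 2, 0, 0),
    (4, 6, 0, 0),
    (4, 0, 0, 0),
]


def word_error_rate(counts):
    """Return (S + D + I) / (C + S + D), pooled over all utterances."""
    correct = sum(c for c, _, _, _ in counts)
    subs = sum(s for _, s, _, _ in counts)
    dels = sum(d for _, _, d, _ in counts)
    ins = sum(i for _, _, _, i in counts)
    reference_words = correct + subs + dels
    return (subs + dels + ins) / reference_words


wer = word_error_rate(counts)
print(f"WER over these {len(counts)} clips: {wer:.2%}")
```

Pooling the counts before dividing weights each utterance by its length, which is how a corpus-level WER is normally reported.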
### Future Tasks
A KenLM language model will be added if a suitable Irish text corpus is found.

### Citation
To cite this model, you can use:

```
@MISC {,
    author       = "Aditya Parikh",
    title        = "Finetuned XLS-R model for Irish (Ga-IE) language for Automatic Speech Recognition",
    howpublished = "{\url{https://huggingface.co/Aditya3107/wav2vec2-large-xls-r-1b-ga-ie}}",
    month        = "aug",
    year         = "2022"
}
```