Aditya3107 committed on
Commit
4453fc2
1 Parent(s): 859bf24

Create README.md

Files changed (1)
  1. README.md +129 -0
README.md ADDED
@@ -0,0 +1,129 @@
---
language: ga
datasets:
- common_voice
- living-audio-Irish
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- ga-IE
- speech
- Irish
- Gaelic
model-index:
- name: Wav2vec 2.0 large 300m XLS-R
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 10.0
      type: common_voice
      args: ga-IE
    metrics:
    - name: Test WER
      type: wer
      value: 25.94
---

# Irish-Gaelic Automatic Speech Recognition

This model performs automatic speech recognition (ASR) for Irish. It was trained on the Common Voice dataset (language code ga-IE) and the living Irish audio dataset. All validated Common Voice clips and all living audio clips were pooled and split at random: 90% of the data (5156 utterances) was used for training and the remaining 10% (579 utterances) for testing.

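The random 90/10 split described above can be sketched in a few lines of plain Python. This is only an illustration: the clip names, the `random_split` helper, and the seed are hypothetical, and because the exact procedure used for this model is not documented here, the resulting counts will not necessarily match 5156 and 579.

```python
import random


def random_split(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of `items` and cut off `test_fraction` of it as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical clip names standing in for the 5156 + 579 = 5735 pooled utterances
clips = [f"clip_{i:04d}.mp3" for i in range(5735)]
train, test = random_split(clips)
print(len(train), len(test))
```

Fixing the seed matters here: without it, every run would produce a different test set, and reported WER figures would not be comparable.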
The model was fine-tuned from wav2vec2-large-xls-r-300m. On the test set it achieves a WER of 25.96%.

### How to use
The example below transcribes a Common Voice audio clip from the invalidated set, using the GPU if one is available. The model expects 16 kHz audio.

```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "Aditya3107/wav2vec2-large-xls-r-1b-ga-ie"

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).to(device)
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)

# Load the audio clip and resample it to the 16 kHz the model expects
audio, rate = librosa.load("common-voice-irish/common_voice/cv-corpus-10.0-2022-07-04/ga-IE/clips/common_voice_ga-IE_1818627.mp3", sr=16_000)

# Convert the waveform into model input values
input_values = processor(audio, sampling_rate=16_000, return_tensors="pt", padding="longest").input_values.to(device)

# Compute the logits (unnormalized prediction scores) without tracking gradients
with torch.no_grad():
    logits = model(input_values).logits

# Take the most likely token id for each audio frame
prediction = torch.argmax(logits, dim=-1)

# Decode the predicted ids into the transcription
transcription = processor.batch_decode(prediction)[0]
print(transcription)
```
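To make the last two steps less opaque, here is a minimal sketch of what greedy CTC decoding does with the per-frame argmax ids: collapse repeated ids, drop the blank token, and turn the word delimiter into a space. The toy vocabulary and frame ids are made up for illustration, and the choice of `<pad>` as the blank and `|` as the word delimiter is an assumption about the tokenizer configuration; in practice `processor.batch_decode` handles all of this.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real model ships its own vocabulary file.
VOCAB = {0: "<pad>", 1: "|", 2: "c", 3: "a", 4: "d", 5: "é"}
BLANK_ID = 0          # assumption: <pad> doubles as the CTC blank token
WORD_DELIMITER = "|"  # assumption: "|" marks word boundaries


def ctc_greedy_decode(ids):
    """Collapse repeats, remove blanks, and map delimiters to spaces."""
    collapsed = [token_id for token_id, _ in groupby(ids)]  # merge consecutive duplicates
    tokens = [VOCAB[i] for i in collapsed if i != BLANK_ID]  # drop CTC blanks
    return "".join(tokens).replace(WORD_DELIMITER, " ").strip()


# Made-up per-frame argmax ids, one per audio frame
frame_ids = [2, 2, 0, 3, 3, 3, 4, 0, 1, 1, 0, 5, 5]
print(ctc_greedy_decode(frame_ids))
```

Repeats are merged before blanks are removed so that a blank between two identical ids still yields a doubled letter.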
### Results
Examples of transcribed audio clips, scored with SCLITE:
```
Speaker sentences 0: #utts: 1
id:
Scores: (#C #S #D #I) 0 1 0 0
Attributes: Case_sensitve
REF: reference_tag
HYP: hypothesis_tag
Eval: S

id: (common_voice_ga-IE_17401296.mp3)
Scores: (#C #S #D #I) 4 1 0 0
Attributes: Case_sensitve
REF: an bhfuil cóta bán óir
HYP: an bhfuil cóta bán air
Eval: S

id: (common_voice_ga-IE_17410244.mp3)
Scores: (#C #S #D #I) 3 1 0 2
Attributes: Case_sensitve
REF: *** ** an bud é sin
HYP: cad é an rud é sin
Eval: I I S

id: (common_voice_ga-IE_17410257.mp3)
Scores: (#C #S #D #I) 9 2 1 2
Attributes: Case_sensitve
REF: i gabhaim buíochas libh a chairde ******* ** támindéagtstruth le tuilleadh uaibh ar baá
HYP: * gabhaim buíochas libh a chairde táimid ag tsnúth le tuilleadh uaibh ar ball
Eval: D I I S S

id: (common_voice_ga-IE_17410401.mp3)
Scores: (#C #S #D #I) 6 1 0 0
Attributes: Case_sensitve
REF: níl ach tá peann ina phóca uige
HYP: níl ach tá peann ina phóca aige
Eval: S

id: (common_voice_ga-IE_17410403.mp3)
Scores: (#C #S #D #I) 5 1 0 1
Attributes: Case_sensitve
REF: agus *** cadé an dath atá air
HYP: agus cad é an dath atá air
Eval: I S

id: (common_voice_ga-IE_17410412.mp3)
Scores: (#C #S #D #I) 6 2 0 0
Attributes: Case_sensitve
REF: is lá é seo chun ceiliúradh a dhéan
HYP: is lá é seo chun céiliúradh a dhéanamh
Eval: S S

id: (common_voice_ga-IE_17444712.mp3)
Scores: (#C #S #D #I) 4 6 0 0
Attributes: Case_sensitve
REF: don chathaoileach mirín de brom don stiúrdhóirat liam ón maoladha
HYP: don chathaoirleach máirín de brún don stiúrthóir liam ó maolaodha
Eval: S S S S S S

id: (common_voice_ga-IE_17449454.mp3)
Scores: (#C #S #D #I) 4 0 0 0
Attributes: Case_sensitve
REF: ceacht a trí déag
HYP: ceacht a trí déag
Eval:
```
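The `(#C #S #D #I)` columns are the counts of correct, substituted, deleted, and inserted words per utterance, so the word error rate follows as WER = (S + D + I) / (C + S + D). The sketch below pools the counts of the eight Common Voice utterances listed above (the placeholder entry with `reference_tag` is left out). It covers only this excerpt, so its result is not expected to match the WER reported for the full test set; the `word_error_rate` helper is an illustrative name, not part of SCLITE.

```python
# (#C, #S, #D, #I) copied from the eight Common Voice utterances listed above
SCORES = [
    (4, 1, 0, 0),
    (3, 1, 0, 2),
    (9, 2, 1, 2),
    (6, 1, 0, 0),
    (5, 1, 0, 1),
    (6, 2, 0, 0),
    (4, 6, 0, 0),
    (4, 0, 0, 0),
]


def word_error_rate(scores):
    """WER = (S + D + I) / (C + S + D), pooled over all utterances."""
    errors = sum(s + d + i for _, s, d, i in scores)
    ref_words = sum(c + s + d for c, s, d, _ in scores)  # reference length
    return errors / ref_words


wer = word_error_rate(SCORES)
print(f"{wer:.2%}")
```

On these eight utterances there are 20 errors against 56 reference words, roughly 35.7%. Pooling the counts before dividing weights long utterances more heavily than averaging per-utterance rates would.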
### Future Tasks
A KenLM language model will be added once a good source of Irish text is found.