DewiBrynJones committed
Commit ff549fc
Parent(s): 8f1b3d0

Update README.md

Files changed (1):
  1. README.md (+15, -40)
README.md CHANGED
@@ -1,42 +1,30 @@
  ---
- language:
- - cy
- - en
+ language:
+ - cy
+ - en
  datasets:
- - common_voice
+ - techiaith/banc-trawsgrifiadau-bangor
+ - techiaith/commonvoice_16_1_en_cy
  metrics:
  - wer
  tags:
  - automatic-speech-recognition
  - speech
  license: apache-2.0
- model-index:
- - name: wav2vec2-xlsr-ft-en-cy
-   results:
-   - task:
-       name: Speech Recognition
-       type: automatic-speech-recognition
-     dataset:
-       name: Common Voice cy
-       type: common_voice
-       args: cy
-     metrics:
-     - name: Test WER
-       type: wer
-       value: 17.70%
+ pipeline_tag: automatic-speech-recognition
  ---
 
- # wav2vec2-xlsr-ft-en-cy
-
- A speech recognition acoustic model for Welsh and English, fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) using English/Welsh balanced data derived from version 11 of their respective Common Voice datasets (https://commonvoice.mozilla.org/cy/datasets). Custom bilingual Common Voice train/dev and test splits were built using the scripts at https://github.com/techiaith/docker-commonvoice-custom-splits-builder#introduction
-
- Source code and scripts for training wav2vec2-xlsr-ft-en-cy can be found at [https://github.com/techiaith/docker-wav2vec2-cy](https://github.com/techiaith/docker-wav2vec2-cy/blob/main/train/fine-tune/python/run_en_cy.sh).
-
+ # wav2vec2-xlsr-ft-cy-en
 
+ An acoustic encoder model for Welsh and English speech recognition, fine-tuned from
+ [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) using transcribed
+ spontaneous speech from
+ [techiaith/banc-trawsgrifiadau-bangor (v24.01)](https://huggingface.co/datasets/techiaith/banc-trawsgrifiadau-bangor/tree/24.01)
+ as well as Welsh and English speech data derived from version 16.1 of the Common Voice datasets: [techiaith/commonvoice_16_1_en_cy](https://huggingface.co/datasets/techiaith/commonvoice_16_1_en_cy).
 
  ## Usage
 
- The wav2vec2-xlsr-ft-en-cy model can be used directly as follows:
+ The wav2vec2-xlsr-ft-cy-en model can be used directly as follows:
 
  ```python
  import torch
@@ -45,8 +33,8 @@ import librosa
 
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 
- processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-ft-en-cy")
- model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-ft-en-cy")
+ processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")
+ model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")
 
  audio, rate = librosa.load(audio_file, sr=16000)
 
@@ -61,16 +49,3 @@ predicted_ids = torch.argmax(logits, dim=-1)
  print("Prediction:", processor.batch_decode(predicted_ids))
 
  ```
-
- ## Evaluation
-
-
- According to a balanced English+Welsh test set derived from Common Voice version 11, the WER of techiaith/wav2vec2-xlsr-ft-en-cy is **17.7%**.
-
- However, when evaluated with language-specific test sets, the model exhibits a bias to perform better with Welsh.
-
- | Common Voice Test Set Language | WER   | CER  |
- | ------------------------------ | ----- | ---- |
- | EN+CY                          | 17.07 | 7.32 |
- | EN                             | 27.54 | 11.6 |
- | CY                             | 7.13  | 2.2  |
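Note: the diff elides the middle of the usage snippet (old lines 53-60), so the feature-extraction and inference steps are not visible above. The sketch below reconstructs the flow end to end; the elided steps are filled in with the standard Transformers wav2vec2 CTC pattern and are an assumption, not necessarily the model card's exact code.

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")
model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")

audio_file = "recording.wav"  # hypothetical path to a Welsh or English recording
audio, rate = librosa.load(audio_file, sr=16000)  # resample to the 16 kHz the model expects

# Assumed middle section: convert the waveform to model inputs and run the CTC head.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the most likely token at each frame, then let the
# processor collapse repeated tokens and blanks into text.
predicted_ids = torch.argmax(logits, dim=-1)
print("Prediction:", processor.batch_decode(predicted_ids))
```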
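The new front matter also sets `pipeline_tag: automatic-speech-recognition`, which tells the Hub to treat the checkpoint as an ASR model; in practice it can then also be loaded through the high-level pipeline API. A minimal sketch, not part of the card itself, with a hypothetical file name:

```python
from transformers import pipeline

# The ASR pipeline wraps the processor and model shown in the README snippet,
# including resampling and CTC decoding.
asr = pipeline("automatic-speech-recognition", model="techiaith/wav2vec2-xlsr-ft-cy-en")

print(asr("recording.wav")["text"])  # accepts a path to an audio file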
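The commit also drops the old Evaluation section with its per-language WER/CER table. For context, WER and CER are edit-distance rates over words and characters respectively; below is a toy sketch of how such figures are commonly computed, assuming the jiwer library (the card does not say which tool produced its numbers).

```python
import jiwer

# Toy reference transcripts and model outputs (one Welsh, one English).
references = ["mae hen wlad fy nhadau", "the quick brown fox"]
hypotheses = ["mae hen wlad fy nhadau", "the quick brown box"]

# jiwer aggregates the edit distance over the whole list of pairs.
print(f"WER: {jiwer.wer(references, hypotheses):.2%}")  # word error rate
print(f"CER: {jiwer.cer(references, hypotheses):.2%}")  # character error rate
```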