techiaith/wav2vec2-xlsr-53-ft-cy-en

@@ -1,42 +1,30 @@
 ---
-language:
- - cy
- - en
 datasets:
-- common_voice
 metrics:
 - wer
 tags:
 - automatic-speech-recognition
 - speech
 license: apache-2.0
-model-index:
-- name: wav2vec2-xlsr-ft-en-cy
-  results:
-  - task:
-      name: Speech Recognition
-      type: automatic-speech-recognition
-    dataset:
-      name: Common Voice cy
-      type: common_voice
-      args: cy
-    metrics:
-    - name: Test WER
-      type: wer
-      value: 17.70%
 ---
-# wav2vec2-xlsr-ft-en-cy
-A speech recognition acoustic model for Welsh and English, fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) using English/Welsh balanced data derived from version 11 of their respective Common Voice datasets (https://commonvoice.mozilla.org/cy/datasets). Custom bilingual Common Voice train/dev and test splits were built using the scripts at https://github.com/techiaith/docker-commonvoice-custom-splits-builder#introduction
-Source code and scripts for training wav2vec2-xlsr-ft-en-cy can be found at [https://github.com/techiaith/docker-wav2vec2-cy](https://github.com/techiaith/docker-wav2vec2-cy/blob/main/train/fine-tune/python/run_en_cy.sh).
 ## Usage
-The wav2vec2-xlsr-ft-en-cy model can be used directly as follows:
 ```python
 import torch
@@ -45,8 +33,8 @@ import librosa
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
-processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-ft-en-cy")
-model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-ft-en-cy")
 audio, rate = librosa.load(audio_file, sr=16000)
@@ -61,16 +49,3 @@ predicted_ids = torch.argmax(logits, dim=-1)
 print("Prediction:", processor.batch_decode(predicted_ids))
 ```
-## Evaluation
-According to a balanced English+Welsh test set derived from Common Voice version 11, the WER of techiaith/wav2vec2-xlsr-ft-en-cy is **17.7%**
-However, when evaluated with language specific test sets, the model exhibits a bias to perform better with Welsh.
-| Common Voice Test Set Language | WER | CER |
-| -------- | --- | --- |
-| EN+CY | 17.07| 7.32  |
-| EN | 27.54  | 11.6  |
-| CY | 7.13  | 2.2  |
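WER and CER, as reported in the table above, are edit-distance metrics: the number of substitutions, insertions and deletions needed to turn the hypothesis into the reference, divided by the reference length in words (WER) or characters (CER). The card does not say exactly how its figures were computed, so the following is only a minimal sketch. It assumes corpus-level aggregation (total edits over total reference length) and no text normalisation, and it uses a made-up Welsh example sentence rather than real evaluation data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each substitution,
    insertion or deletion costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances for an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def wer(refs, hyps):
    """Corpus-level word error rate in percent."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return 100.0 * errors / total


def cer(refs, hyps):
    """Corpus-level character error rate in percent (spaces count as characters)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


refs = ["mae hi'n braf heddiw"]
hyps = ["mae hin braf heddi"]
print(wer(refs, hyps), cer(refs, hyps))  # 50.0 10.0
```

Here two of the four reference words are wrong (WER 50%), but only two of the twenty characters are (CER 10%). This is why CER is usually much lower than WER for the same output, as in every row of the table. Because insertions are counted, WER can exceed 100%.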

 ---
+language:
+- cy
+- en
 datasets:
+- techiaith/banc-trawsgrifiadau-bangor
+- techiaith/commonvoice_16_1_en_cy
 metrics:
 - wer
 tags:
 - automatic-speech-recognition
 - speech
 license: apache-2.0
+pipeline_tag: automatic-speech-recognition
 ---
+# wav2vec2-xlsr-ft-cy-en
+An acoustic encoder model for Welsh and English speech recognition, fine-tuned from
+[facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) using transcribed
+spontaneous speech from
+[techiaith/banc-trawsgrifiadau-bangor (v24.01)](https://huggingface.co/datasets/techiaith/banc-trawsgrifiadau-bangor/tree/24.01)
+as well as Welsh and English speech data derived from version 16.1 of the Common Voice datasets, [techiaith/commonvoice_16_1_en_cy](https://huggingface.co/datasets/techiaith/commonvoice_16_1_en_cy).
 ## Usage
+The wav2vec2-xlsr-ft-cy-en model can be used directly as follows:
 ```python
 import torch
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")
+model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")
 audio, rate = librosa.load(audio_file, sr=16000)
 print("Prediction:", processor.batch_decode(predicted_ids))
 ```
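The usage snippet decodes greedily. `torch.argmax(logits, dim=-1)` picks the most likely token for each audio frame, and `processor.batch_decode` turns those frame-level IDs into text. The stdlib sketch below illustrates that CTC best-path decoding step under stated assumptions. It assumes the padding token `<pad>` acts as the CTC blank and `|` is the word delimiter, which is the usual wav2vec2 CTC vocabulary convention. The tiny vocabulary, the `greedy_ctc_decode` helper and the fake per-frame scores are all hypothetical and are not taken from the model's actual tokenizer.

```python
from itertools import groupby

BLANK = "<pad>"          # assumed CTC blank token
WORD_DELIMITER = "|"     # assumed word-boundary token


def greedy_ctc_decode(frame_scores, vocab):
    """Hypothetical helper: best-path CTC decoding.
    frame_scores: T lists of V scores; vocab: V token strings."""
    # 1. Best path: highest-scoring token per frame (cf. torch.argmax(logits, dim=-1)).
    ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Collapse consecutive repeats of the same token.
    collapsed = [i for i, _ in groupby(ids)]
    # 3. Drop blanks; a blank between two identical tokens keeps both.
    tokens = [vocab[i] for i in collapsed if vocab[i] != BLANK]
    # 4. Turn word delimiters into spaces and tidy surrounding whitespace.
    text = "".join(tokens).replace(WORD_DELIMITER, " ")
    return " ".join(text.split())


vocab = ["<pad>", "|", "h", "e", "d", "i", "w"]
path = ["h", "h", "e", "d", "<pad>", "d", "i", "w", "w", "<pad>"]
# Fake scores: 1.0 for the intended token at each frame, 0.0 elsewhere.
frame_scores = [[1.0 if tok == p else 0.0 for tok in vocab] for p in path]
print(greedy_ctc_decode(frame_scores, vocab))  # heddiw
```

The blank frame between the two `d` frames is what lets the decoder produce the double letter in "heddiw". Without it, the repeats would collapse into a single `d`.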