xekri committed
Commit abb9293
1 Parent(s): 49a6aa5

update model card README.md

Files changed (1):
  1. README.md +15 -47
README.md CHANGED
@@ -1,16 +1,11 @@
 ---
-language:
-- eo
 license: apache-2.0
 tags:
-- automatic-speech-recognition
-- mozilla-foundation/common_voice_13_0
 - generated_from_trainer
 datasets:
 - common_voice_13_0
 metrics:
 - wer
-- cer
 model-index:
 - name: wav2vec2-common_voice_13_0-eo-10
   results:
@@ -18,68 +13,42 @@ model-index:
       name: Automatic Speech Recognition
       type: automatic-speech-recognition
     dataset:
-      name: mozilla-foundation/common_voice_13_0
+      name: common_voice_13_0
       type: common_voice_13_0
       config: eo
       split: validation
-      args: 'Config: eo, Training split: train, Eval split: validation'
+      args: eo
     metrics:
-    - name: WER
+    - name: Wer
       type: wer
-      value: 0.0656526475637132
-    - name: CER
-      type: cer
-      value: 0.0118
+      value: 0.06575168361283507
 ---
 
-# wav2vec2-common_voice_13_0-eo-10, an Esperanto speech recognizer
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 
-This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the [mozilla-foundation/common_voice_13_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0) Esperanto dataset.
+# wav2vec2-common_voice_13_0-eo-10
+
+This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the common_voice_13_0 dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0453
-- Cer: 0.0118
-- Wer: 0.0657
-
-The first 10 examples in the evaluation set:
-
-| Actual<br>Predicted | CER |
-|:--------------------|:----|
-| `la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo`<br>`la orienta parto apud benino kaj niĝerio estis nomita sklafmarbordo` | 0.014925373134328358 |
-| `en la sekva jaro li ricevis premion`<br>`en la sekva jaro li ricevis premion` | 0.0 |
-| `ŝi studis historion ĉe la universitato de brita kolumbio`<br>`ŝi studis historion ĉe la universitato de brita kolumbio` | 0.0 |
-| `larĝaj ŝtupoj kuras al la fasado`<br>`larĝaj ŝtupoj kuras al la fasado` | 0.0 |
-| `la municipo ĝuas duan epokon de etendo kaj disvolviĝo`<br>`la municipo ĝuas duan eepokon de etendo kaj disvolviĝo` | 0.018867924528301886 |
-| `li estis ankaŭ katedrestro kaj dekano`<br>`li estis ankaŭ katedristo kaj dekano` | 0.05405405405405406 |
-| `librovendejo apartenas al la muzeo`<br>`librovendejo apartenas al la muzeo` | 0.0 |
-| `ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵaro de arbaroj`<br>`ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵo de arbaroj` | 0.02702702702702703 |
-| `unue ili estas ruĝaj poste brunaj`<br>`unue ili estas ruĝaj poste brunaj` | 0.0 |
-| `la loĝantaro laboras en la proksima ĉefurbo`<br>`la loĝantaro laboras en la proksima ĉefurbo` | 0.0 |
+- Cer: 0.0119
+- Loss: 0.0454
+- Wer: 0.0658
 
 ## Model description
 
-See [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).
+More information needed
 
 ## Intended uses & limitations
 
-Speech recognition for Esperanto. The base model was pretrained and finetuned on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16kHz.
-
-The output is all lowercase, no punctuation.
+More information needed
 
 ## Training and evaluation data
 
-The training split was set to `train` while the eval split was set to `validation`. Some files were filtered out of the train and validation dataset due to bad data; see [xekri/wav2vec2-common_voice_13_0-eo-3](https://huggingface.co/xekri/wav2vec2-common_voice_13_0-eo-3) for a detailed discussion. In summary, I used `xekri/wav2vec2-common_voice_13_0-eo-3` as a detector to detect bad files, then hardcoded those files into the trainer code to be filtered out.
+More information needed
 
 ## Training procedure
 
-I used a modified version of [`run_speech_recognition_ctc.py`](https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition) for training. See [`run_speech_recognition_ctc.py`](https://huggingface.co/xekri/wav2vec2-common_voice_13_0-eo-10/blob/main/run_speech_recognition_ctc.py) in this repo.
-
-The parameters to the trainer are in [train.json](https://huggingface.co/xekri/wav2vec2-common_voice_13_0-eo-10/blob/main/train.json) in this repo.
-
-The key changes between this training run and `xekri/wav2vec2-common_voice_13_0-eo-3`, aside from the filtering and use of the full training and validation sets, are:
-
-* Layer drop probability is 20%
-* Train only for 5 epochs
-
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -89,7 +58,6 @@ The following hyperparameters were used during training:
 - seed: 42
 - gradient_accumulation_steps: 2
 - total_train_batch_size: 32
-- layerdrop: 0.2
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
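
The usage notes removed above say the model expects speech sampled at 16 kHz and emits lowercase text with no punctuation. A minimal inference sketch under those assumptions: the audio file name is a placeholder, and the use of `torchaudio` for loading and resampling is an assumption rather than anything the card prescribes.

```python
# Minimal inference sketch for xekri/wav2vec2-common_voice_13_0-eo-10.
# Assumes mono audio and the standard wav2vec2 CTC interface inherited from
# the base model facebook/wav2vec2-large-xlsr-53.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "xekri/wav2vec2-common_voice_13_0-eo-10"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# The removed card stresses 16 kHz input; resample anything else.
waveform, sample_rate = torchaudio.load("esperanto_sample.wav")  # hypothetical file
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decode; output is lowercase and unpunctuated, per the old card.
ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(ids)[0])
```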
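The removed training-data section describes dropping known-bad clips from the train and validation splits by hardcoding their names into the trainer. A sketch of that kind of filter with the `datasets` library; the blocklist entry is hypothetical, since the real list lived in the repo's modified `run_speech_recognition_ctc.py`.

```python
# Sketch of the bad-file filtering described in the removed card: known-bad
# clips were hardcoded and dropped before training. Loading Common Voice may
# require accepting its terms on the Hub first.
import os

from datasets import load_dataset

BAD_FILES = {
    "common_voice_eo_12345678.mp3",  # hypothetical entry
}

train = load_dataset("mozilla-foundation/common_voice_13_0", "eo", split="train")
train = train.filter(lambda ex: os.path.basename(ex["path"]) not in BAD_FILES)
```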
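Both versions of the card report WER on the validation split, and the removed one also reports CER. A sketch of how such scores are computed, assuming the Hub-hosted `wer` and `cer` metrics of the `evaluate` library; the two sentence pairs are copied from the removed per-example table, whereas the card's headline numbers aggregate the whole split.

```python
# Sketch of the WER/CER computation behind the card's metrics.
import evaluate

wer = evaluate.load("wer")
cer = evaluate.load("cer")

references = [
    "la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo",
    "en la sekva jaro li ricevis premion",
]
predictions = [
    "la orienta parto apud benino kaj niĝerio estis nomita sklafmarbordo",
    "en la sekva jaro li ricevis premion",
]

print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```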