jonatasgrosman/whisper-large-pt-cv11

@@ -19,7 +19,7 @@ model-index:
       name: mozilla-foundation/common_voice_11_0 pt
       type: mozilla-foundation/common_voice_11_0
       config: pt
-      split: validation[:1000]
       args: pt
     metrics:
     - name: WER
@@ -30,12 +30,51 @@ model-index:
       value: 1.6052355927195898
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # Whisper Large Portuguese
-This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the mozilla-foundation/common_voice_11_0 pt dataset.
-It achieves the following results on the evaluation set:
-- WER: 4.816664144852979
-- CER: 1.6052355927195898

       name: mozilla-foundation/common_voice_11_0 pt
       type: mozilla-foundation/common_voice_11_0
       config: pt
+      split: test
       args: pt
     metrics:
     - name: WER
       value: 1.6052355927195898
 ---
 # Whisper Large Portuguese
+This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) for Portuguese, trained on the train and validation splits of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0). Not all of the validation split was used for training: I held out 1k samples from it for evaluation during fine-tuning. When using this model, make sure that your speech input is sampled at 16 kHz.
+## Usage
+```python
+from transformers import pipeline
+transcriber = pipeline(
+  "automatic-speech-recognition",
+  model="jonatasgrosman/whisper-large-pt-cv11"
+)
+transcriber.model.config.forced_decoder_ids = (
+  transcriber.tokenizer.get_decoder_prompt_ids(
+    language="pt",
+    task="transcribe"
+  )
+)
+transcription = transcriber("path/to/my_audio.wav")
+```
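Since the model expects 16 kHz input, it can help to check a file's sampling rate before transcribing it. The following is a minimal sketch that reads the rate from a PCM WAV header using only the standard library; the helper names `wav_sample_rate` and `is_16khz` are illustrative and not part of this repository, and files in other formats or at other rates would still need to be converted with an audio library of your choice.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sampling rate the model card asks for


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path: str) -> bool:
    """True if the WAV file at `path` is already sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE_HZ
```

For example, `is_16khz("path/to/my_audio.wav")` could be used as a guard before calling `transcriber`, with a resampling step when it returns `False`.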
+## Evaluation
+### Common Voice 11
+| | CER | WER |
+| --- | --- | --- |
+| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) | 2.52 | 9.56 |
+| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization | 1.60 | 4.82 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 4.32 | 13.92 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 2.84 | 7.02 |
+### Fleurs
+| | CER | WER |
+| --- | --- | --- |
+| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) | 4.88 | 12.08 |
+| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization | 5.46 | 8.57 |
+| [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization + removal of samples containing numbers | 3.36 | 6.05 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 3.52 | 10.55 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 4.19 | 7.04 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization + removal of samples containing numbers | 3.56 | 6.15 |
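To illustrate why text normalization changes the scores in the tables above, here is a minimal sketch of WER and CER computed with a plain Levenshtein distance. It is not the evaluation script used for this model: the `normalize` function is a simplified, hypothetical stand-in (lowercasing, ASCII punctuation removal, whitespace collapsing), and reporting the scores as percentages is an assumption based on the values shown.

```python
import string


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text):
    """Hypothetical, simplified normalization; the card's actual rules are not stated."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def wer(reference, hypothesis):
    """Word error rate in percent (reference must contain at least one word)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (reference must be non-empty)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

With the reference `"Olá, tudo bem?"` and the hypothesis `"olá tudo bem"`, the raw WER is about 66.7 because casing and punctuation count as word errors, while the WER after applying `normalize` to both strings is 0.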

eval.txt DELETED

@@ -1,5 +0,0 @@
-preprocessing data...
-Transcribing...
-Evaluating...
-WER: 4.816664144852979
-CER: 1.6052355927195898