jonatasgrosman committed on
Commit
471b750
1 Parent(s): ddf9040

update README

Files changed (2)
  1. README.md +47 -8
  2. eval.txt +0 -5
README.md CHANGED
@@ -19,7 +19,7 @@ model-index:
  name: mozilla-foundation/common_voice_11_0 pt
  type: mozilla-foundation/common_voice_11_0
  config: pt
- split: validation[:1000]
  args: pt
  metrics:
  - name: WER
@@ -30,12 +30,51 @@ model-index:
  value: 1.6052355927195898
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # Whisper Large Portuguese
 
- This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the mozilla-foundation/common_voice_11_0 pt dataset.
- It achieves the following results on the evaluation set:
- - WER: 4.816664144852979
- - CER: 1.6052355927195898
  name: mozilla-foundation/common_voice_11_0 pt
  type: mozilla-foundation/common_voice_11_0
  config: pt
+ split: test
  args: pt
  metrics:
  - name: WER
  value: 1.6052355927195898
  ---
 
  # Whisper Large Portuguese
 
+ This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on Portuguese using the train and validation splits of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0). Not all of the validation split was used for training: I held out 1k samples from it for evaluation during fine-tuning. When using this model, make sure that your speech input is sampled at 16 kHz.
+
+
+ ## Usage
+
+ ```python
+ from transformers import pipeline
+
+ transcriber = pipeline(
+     "automatic-speech-recognition",
+     model="jonatasgrosman/whisper-large-pt-cv11",
+ )
+
+ # Force Portuguese transcription rather than language detection or translation.
+ transcriber.model.config.forced_decoder_ids = (
+     transcriber.tokenizer.get_decoder_prompt_ids(
+         language="pt",
+         task="transcribe",
+     )
+ )
+
+ transcription = transcriber("path/to/my_audio.wav")
+ ```
+
+ ## Evaluation
+
+ ### Common Voice 11
+
+ | Model | CER | WER |
+ | --- | --- | --- |
+ | [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) | 2.52 | 9.56 |
+ | [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization | 1.60 | 4.82 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 4.32 | 13.92 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 2.84 | 7.02 |
+
+ ### Fleurs
+
+ | Model | CER | WER |
+ | --- | --- | --- |
+ | [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) | 4.88 | 12.08 |
+ | [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization | 5.46 | 8.57 |
+ | [jonatasgrosman/whisper-large-pt-cv11](https://huggingface.co/jonatasgrosman/whisper-large-pt-cv11) + text normalization + removal of samples containing numbers | 3.36 | 6.05 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 3.52 | 10.55 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 4.19 | 7.04 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization + removal of samples containing numbers | 3.56 | 6.15 |
eval.txt DELETED
@@ -1,5 +0,0 @@
- preprocessing data...
- Transcribing...
- Evaluating...
- WER: 4.816664144852979
- CER: 1.6052355927195898
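
For readers unfamiliar with the WER and CER figures reported above, the following is a minimal, dependency-free sketch of how such scores can be computed, with and without text normalization. It is not the evaluation script behind this commit: the `normalize` function is a simplified stand-in (lowercasing and punctuation stripping) rather than the normalizer actually used, and aggregating errors over the whole corpus is an assumption. All function names are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Simplified normalization (an assumption): lowercase, drop punctuation."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(references, hypotheses, unit="word") -> float:
    """Corpus-level error rate in percent: total edits / total reference length."""
    split = str.split if unit == "word" else list
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units, hyp_units = split(ref), split(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * errors / total


refs = ["Olá, mundo!"]
hyps = ["olá mundo"]
wer_raw = error_rate(refs, hyps, unit="word")
wer_norm = error_rate([normalize(r) for r in refs],
                      [normalize(h) for h in hyps], unit="word")
cer_raw = error_rate(refs, hyps, unit="char")
```

In this toy example the raw WER is 100% because casing and punctuation make both words mismatch, while the normalized WER is 0%, which illustrates why the "+ text normalization" rows in the tables score lower on WER.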