metadata

language:
  - pt
license: apache-2.0
tags:
  - whisper-event
  - generated_from_trainer
datasets:
  - mozilla-foundation/common_voice_11_0
metrics:
  - wer
model-index:
  - name: Whisper Large Portuguese
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: mozilla-foundation/common_voice_11_0 pt
          type: mozilla-foundation/common_voice_11_0
          config: pt
          split: test
          args: pt
        metrics:
          - name: WER
            type: wer
            value: 4.816664144852979
          - name: CER
            type: cer
            value: 1.6052355927195898

Whisper Large Portuguese

This model is a fine-tuned version of openai/whisper-large-v2 for Portuguese, trained on the train and validation splits of Common Voice 11. Not all of the validation split was used for training: I set aside 1k samples from it for evaluation during fine-tuning. When using this model, make sure that your speech input is sampled at 16 kHz.
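Since the model expects 16 kHz input, it can help to check a file's sample rate before transcribing it. The following is a minimal sketch using only the Python standard library; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this model or of transformers, and the `wave` module only reads uncompressed PCM WAV files.

```python
import wave

# Sample rate the model expects, in Hz (taken from the note above)
TARGET_SAMPLE_RATE = 16_000


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target_rate: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True when the file's sample rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

If `needs_resampling("path/to/my_audio.wav")` returns `True`, the audio should be resampled first, for example by loading it with `librosa.load(path, sr=16000)`, which resamples on load.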

Usage


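from transformers import pipeline

# Load the fine-tuned checkpoint into a speech recognition pipeline
transcriber = pipeline(
  "automatic-speech-recognition",
  model="jonatasgrosman/whisper-large-pt-cv11"
)

# Force the decoder to transcribe in Portuguese instead of
# auto-detecting the language or translating
transcriber.model.config.forced_decoder_ids = (
  transcriber.tokenizer.get_decoder_prompt_ids(
    language="pt",
    task="transcribe"
  )
)

# The pipeline returns a dict; the transcribed text is under the "text" key
transcription = transcriber("path/to/my_audio.wav")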

Evaluation

Common Voice 11

Model	CER (%)	WER (%)
jonatasgrosman/whisper-large-pt-cv11	2.52	9.56
jonatasgrosman/whisper-large-pt-cv11 + text normalization	1.60	4.82
openai/whisper-large-v2	4.32	13.92
openai/whisper-large-v2 + text normalization	2.84	7.02

Fleurs

Model	CER (%)	WER (%)
jonatasgrosman/whisper-large-pt-cv11	4.88	12.08
jonatasgrosman/whisper-large-pt-cv11 + text normalization	5.46	8.57
jonatasgrosman/whisper-large-pt-cv11 + text normalization + removal of samples with numbers	3.36	6.05
openai/whisper-large-v2	3.52	10.55
openai/whisper-large-v2 + text normalization	4.19	7.04
openai/whisper-large-v2 + text normalization + removal of samples with numbers	3.56	6.15
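To illustrate how the "text normalization" and "removal of samples with numbers" rows can differ from the raw ones, here is a minimal pure-Python sketch of corpus-level WER and CER. It is not the evaluation script used for the tables above: the `normalize` function is a simplified stand-in (this card does not specify the normalizer that was used), and all function names and example sentences are hypothetical.

```python
import string


def normalize(text: str) -> str:
    """Illustrative normalizer: lowercase, drop ASCII punctuation, collapse spaces."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def has_digits(text: str) -> bool:
    """Return True when the text contains a digit (used to filter out samples)."""
    return any(ch.isdigit() for ch in text)


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(references, hypotheses, unit: str = "word") -> float:
    """Error rate in percent: total edits divided by total reference units."""
    split = str.split if unit == "word" else list
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units, hyp_units = split(ref), split(hyp)
        edits += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * edits / total


# Hypothetical reference/prediction pairs that differ only in case and punctuation
refs = ["Olá, mundo!", "Bom dia."]
hyps = ["olá mundo", "bom dia"]

print(error_rate(refs, hyps))  # 100.0: every word counts as an error
print(error_rate([normalize(r) for r in refs],
                 [normalize(h) for h in hyps]))  # 0.0 after normalization

# Keep only pairs whose reference has no digits before scoring
pairs = [(r, h) for r, h in zip(refs, hyps) if not has_digits(r)]
```

Casing and punctuation mismatches alone can inflate WER considerably, which is consistent with the gap between the raw and normalized rows. Filtering out references that contain digits avoids penalizing a model for writing "dez" where the reference has "10", or the reverse.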