File size: 3,394 Bytes
1a9df55 673eaaf 1a9df55 673eaaf 1a9df55 673eaaf 1a9df55 673eaaf 1a9df55 673eaaf 1a9df55 469502d 1a9df55 469502d 1a9df55 2111b26 1a9df55 2111b26 1a9df55 469502d 1a9df55 2111b26 1a9df55 469502d 2111b26 1a9df55 469502d 1a9df55 2111b26 1a9df55 469502d |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 |
---
language:
- pt
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Medium Portuguese
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: mozilla-foundation/common_voice_11_0 pt
type: mozilla-foundation/common_voice_11_0
config: pt
split: test
args: pt
metrics:
- name: Wer
type: wer
value: 6.5785713084850626
---
# Whisper Medium Portuguese 🇧🇷🇵🇹
Bem-vindo ao whisper medium para transcrição em português 👋🏻
If you are looking to **quickly**, and **reliably**, transcribe Portuguese audio to text, you are in the right place!
With a state-of-the-art [Word Error Rate](https://huggingface.co/spaces/evaluate-metric/wer) (WER) of just **6.579** in Common Voice 11, this model offers an **x2** precision increase compared to prior state-of-the-art [wav2vec2](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) models. Compared to the original [whisper-medium](https://huggingface.co/openai/whisper-medium) model it delivers an **x1.2** improvement 🚀.
This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the [mozilla-foundation/common_voice_11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) dataset.
The following table displays a **comparison** between the results of our model and those achieved by the most downloaded models in the hub for [Portuguese Automatic Speech Recognition](https://huggingface.co/models?language=pt&pipeline_tag=automatic-speech-recognition&sort=downloads) 🗣:
| Model | WER | Parameters |
|--------------------------------------------------|:--------:|:------------:|
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 8.100 | 769M |
| [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt) | **6.579** 🤗 | 769M |
| [jonatasgrosman/wav2vec2-large-xlsr-53-portuguese](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-portuguese) | 11.310 | 317M |
| [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
### Training hyperparameters
We used the following hyperparameters for training:
- `learning_rate`: 1e-05
- `train_batch_size`: 32
- `eval_batch_size`: 16
- `seed`: 42
- `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
- `lr_scheduler_type`: linear
- `lr_scheduler_warmup_steps`: 500
- `training_steps`: 5000
- `mixed_precision_training`: Native AMP
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.0698 | 1.09 | 1000 | 0.1876 | 7.189 |
| 0.0218 | 3.07 | 2000 | 0.2254 | 7.110 |
| 0.0053 | 5.06 | 3000 | 0.2711 | 6.969 |
| 0.0017 | 7.04 | 4000 | 0.3030 | 6.686 |
| 0.0005 | 9.02 | 5000 | 0.3205 | **6.579** 🤗 |
### Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2 |