---
language:
- pt
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Medium Portuguese
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 pt
      type: mozilla-foundation/common_voice_11_0
      config: pt
      split: test
      args: pt
    metrics:
    - name: Wer
      type: wer
      value: 6.5785713084850626
---

# Whisper Medium Portuguese 🇧🇷🇵🇹

Welcome to Whisper Medium for Portuguese transcription 👋🏻

If you are looking to transcribe Portuguese audio to text quickly and reliably, you are in the right place!

With a state-of-the-art [Word Error Rate](https://huggingface.co/spaces/evaluate-metric/wer) (WER) of just 6.579% on the Portuguese test split of Common Voice 11, this model's WER is roughly 1.7x to 3x lower than that of prior state-of-the-art [wav2vec2](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) models (11.310% and 20.080%). Compared with the original [whisper-medium](https://huggingface.co/openai/whisper-medium) model (8.100%), it delivers a 1.2x improvement 🚀.
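WER is the word-level edit distance between a transcript and its reference (substitutions S, deletions D and insertions I) divided by the number of reference words N, i.e. WER = (S + D + I) / N; this card reports it as a percentage. Below is a minimal pure-Python sketch of that definition for illustration only. It assumes plain whitespace tokenization and no text normalization, so it is not guaranteed to reproduce the exact evaluation behind the 6.579 figure; the linked `evaluate` metric is the reference implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: (S + D + I) / N, via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j].
    # Before any reference word is processed, turning "" into hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions turn ref[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("gato" -> "rato") in a four-word reference gives WER = 1/4.
print(100 * word_error_rate("o gato está dormindo", "o rato está dormindo"))  # 25.0
```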

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the [mozilla-foundation/common_voice_11_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) dataset.
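A minimal sketch for transcribing a file, assuming the checkpoint loads through the standard 🤗 Transformers `pipeline("automatic-speech-recognition", ...)` API like other Whisper fine-tunes on the Hub. This is not an official snippet from the model author: it assumes `transformers` and PyTorch are installed and that `ffmpeg` is available to decode audio files, and the helper names `load_asr`/`transcribe` and the example file name are hypothetical. `chunk_length_s=30` lets the pipeline handle recordings longer than Whisper's 30-second input window.

```python
from functools import lru_cache

MODEL_ID = "jlondonobo/whisper-medium-pt"


@lru_cache(maxsize=1)
def load_asr(model_id: str = MODEL_ID):
    """Build the speech-recognition pipeline once and reuse it across calls."""
    # Imported lazily so that defining these helpers does not require transformers.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split recordings longer than 30 s into chunks
    )


def transcribe(audio_path: str) -> str:
    """Return the transcription of an audio file (hypothetical helper)."""
    return load_asr()(audio_path)["text"]


# Usage (hypothetical file name):
# print(transcribe("entrevista.mp3"))
```

Caching the pipeline with `functools.lru_cache` avoids reloading the 769M-parameter model on every call.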

The following table compares our model with the most downloaded models on the Hugging Face Hub for [Portuguese Automatic Speech Recognition](https://huggingface.co/models?language=pt&pipeline_tag=automatic-speech-recognition&sort=downloads) 🗣:

| Model | WER (%) | Parameters |
|--------------------------------------------------|:--------:|:------------:|
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 8.100 | 769M |
| [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt) | 6.579 🤗 | 769M |
| [jonatasgrosman/wav2vec2-large-xlsr-53-portuguese](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-portuguese) | 11.310 | 317M |
| [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
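The relative improvements over the other models are simply ratios of the WER values in this table. The sketch below reproduces that arithmetic using only the numbers reported here; the names `WER_PERCENT` and `improvement_factor` are illustrative.

```python
# WER values (%) copied from the comparison table above.
WER_PERCENT = {
    "openai/whisper-medium": 8.100,
    "jlondonobo/whisper-medium-pt": 6.579,
    "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese": 11.310,
    "Edresson/wav2vec2-large-xlsr-coraa-portuguese": 20.080,
}
OURS = "jlondonobo/whisper-medium-pt"


def improvement_factor(baseline: str) -> float:
    """How many times higher the baseline's WER is than this model's WER."""
    return WER_PERCENT[baseline] / WER_PERCENT[OURS]


for name in WER_PERCENT:
    if name != OURS:
        print(f"{name}: {improvement_factor(name):.2f}x")
# Prints:
# openai/whisper-medium: 1.23x
# jonatasgrosman/wav2vec2-large-xlsr-53-portuguese: 1.72x
# Edresson/wav2vec2-large-xlsr-coraa-portuguese: 3.05x
```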


### Training hyperparameters

We used the following hyperparameters for training:
- `learning_rate`: 1e-05
- `train_batch_size`: 32
- `eval_batch_size`: 16
- `seed`: 42
- `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
- `lr_scheduler_type`: linear
- `lr_scheduler_warmup_steps`: 500
- `training_steps`: 5000
- `mixed_precision_training`: Native AMP
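With `lr_scheduler_type: linear` and 500 warmup steps, the learning rate ramps up linearly from 0 to 1e-05 over the first 500 steps and then decays linearly to 0 at step 5000. The function below sketches that schedule in plain Python. It mirrors the formula behind Transformers' `get_linear_schedule_with_warmup`, but it is an illustration written for this card, not code taken from the training run.

```python
PEAK_LR = 1e-05
WARMUP_STEPS = 500
TOTAL_STEPS = 5000


def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: grow linearly from 0 to PEAK_LR.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay: shrink linearly from PEAK_LR to 0 at TOTAL_STEPS (clamped at 0 afterwards).
    remaining = max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))
    return PEAK_LR * remaining


for s in (0, 250, 500, 2750, 5000):
    print(s, learning_rate(s))
```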

### Training results

| Training Loss | Epoch | Step | Validation Loss | WER (%) |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0698 | 1.09 | 1000 | 0.1876 | 7.189 |
| 0.0218 | 3.07 | 2000 | 0.2254 | 7.110 |
| 0.0053 | 5.06 | 3000 | 0.2711 | 6.969 |
| 0.0017 | 7.04 | 4000 | 0.3030 | 6.686 |
| 0.0005 | 9.02 | 5000 | 0.3205 | 6.579 🤗 |
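In the training results, validation loss rises from 0.1876 to 0.3205 after step 1000 while WER keeps falling, so which checkpoint you keep depends on the selection metric. The snippet below uses only the logged values above to show that selecting by lowest WER picks the final checkpoint (step 5000), whereas selecting by lowest validation loss would pick step 1000. In Transformers this choice corresponds to the `metric_for_best_model`, `greater_is_better` and `load_best_model_at_end` training arguments; whether those were set for this run is not stated in the card, and the `Checkpoint`/`LOG` names are illustrative.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    step: int
    epoch: float
    training_loss: float
    validation_loss: float
    wer: float  # WER in percent


# Values copied from the "Training results" table above.
LOG = [
    Checkpoint(1000, 1.09, 0.0698, 0.1876, 7.189),
    Checkpoint(2000, 3.07, 0.0218, 0.2254, 7.110),
    Checkpoint(3000, 5.06, 0.0053, 0.2711, 6.969),
    Checkpoint(4000, 7.04, 0.0017, 0.3030, 6.686),
    Checkpoint(5000, 9.02, 0.0005, 0.3205, 6.579),
]

best_by_wer = min(LOG, key=lambda c: c.wer)
best_by_loss = min(LOG, key=lambda c: c.validation_loss)
print("lowest WER at step:", best_by_wer.step)               # 5000
print("lowest validation loss at step:", best_by_loss.step)  # 1000
```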


### Framework versions

- Transformers 4.26.0.dev0
- PyTorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2