---
license: mit
language:
- pt
base_model:
- distil-whisper/distil-large-v3
pipeline_tag: automatic-speech-recognition
tags:
- asr
- pt
- ptbr
- stt
- speech-to-text
- automatic-speech-recognition
---

# Distil-Whisper-Large-v3 for Brazilian Portuguese

This model is a fine-tuned version of distil-whisper/distil-large-v3 for automatic speech recognition (ASR) in Brazilian Portuguese. It was trained on the Common Voice 16 dataset together with a private dataset whose transcriptions were generated automatically with Whisper Large v3.

## Model Description

The model performs automatic speech transcription in Brazilian Portuguese. By combining data from Common Voice 16 with the automatically transcribed private dataset, it achieves a Word Error Rate (WER) of 8.93% on the Common Voice 16 validation set.
- Model type: Speech recognition model based on distil-whisper-large-v3
- Language(s): Brazilian Portuguese (pt-BR)
- License: MIT
- Finetuned from model: distil-whisper/distil-large-v3
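
For reference, the sketch below shows how a WER figure like the one reported above can be computed with the `evaluate` library. The sentences are placeholder examples, and this is not necessarily the exact evaluation script or text normalization used to obtain the 8.93% result.

```python
# Minimal WER computation sketch using the `evaluate` library.
# The sentences below are illustrative placeholders, not data from this model's evaluation.
import evaluate

wer_metric = evaluate.load("wer")

references = ["o gato subiu no telhado", "bom dia a todos"]   # ground-truth transcripts
predictions = ["o gato subiu no telhado", "bom dia todos"]    # model outputs

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer * 100:.2f}%")  # lower is better
```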

## How to Get Started with the Model

You can use the model with the Transformers library:

```python
from datasets import load_dataset, Audio
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the validation split of the Common Voice dataset for Portuguese
common_voice = load_dataset("mozilla-foundation/common_voice_11_0", "pt", split="validation")

# Whisper expects 16 kHz audio, so resample the 48 kHz Common Voice recordings
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))

# Load the fine-tuned model and processor
processor = WhisperProcessor.from_pretrained("freds0/distil-whisper-large-v3-ptbr")
model = WhisperForConditionalGeneration.from_pretrained("freds0/distil-whisper-large-v3-ptbr")

# Select a sample from the dataset
sample = common_voice[0]  # Change the index to pick a different sample

# Get the audio array and sampling rate
audio_input = sample["audio"]["array"]
sampling_rate = sample["audio"]["sampling_rate"]

# Preprocess the audio into log-Mel input features
input_features = processor(audio_input, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate the transcription
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print("Transcription:", transcription[0])
```