
# Whisper Base Galician

## Description

This is a fine-tuned version of the openai/whisper-base pre-trained model for automatic speech recognition (ASR) in Galician.

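For a quick test, the model can also be loaded through the Transformers `pipeline` API (a minimal sketch, not part of the original card; `demo.wav` is a placeholder audio file):

```python
from transformers import pipeline

# Load the fine-tuned Galician model into an ASR pipeline
asr = pipeline("automatic-speech-recognition", model="ITG/whisper-base-gl")

print(asr("demo.wav")["text"])  # "demo.wav" is a placeholder filename
```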

## Dataset

We used one of the datasets available in the OpenSLR repository: the OpenSLR Galician dataset.

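If you want to inspect the data yourself, something like the following may work with the Hugging Face `datasets` library (a sketch only: the `SLR77` config name for the Galician subset is an assumption, the loader needs `trust_remote_code=True` on recent 2.x releases, and script-based loaders are no longer supported in `datasets` >= 3.0):

```python
from datasets import load_dataset

# Assumed config: SLR77 is the OpenSLR Galician subset
galician = load_dataset("openslr", "SLR77", trust_remote_code=True)
print(galician)
```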

## Example inference script

Use the following example script to run our model in inference mode:

```python
import torch
import librosa
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this to the name of your audio file
sample_rate = 16_000   # Whisper models expect 16 kHz audio

processor = AutoProcessor.from_pretrained("ITG/whisper-base-gl")
model = AutoModelForSpeechSeq2Seq.from_pretrained("ITG/whisper-base-gl")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

with torch.no_grad():
    # librosa resamples the audio to 16 kHz on load
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    generated_ids = model.generate(inputs=input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"ASR Galician whisper-base output: {decode_output}")
```
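By default, Whisper infers the language from the audio itself. If you want to pin generation to Galician transcription explicitly, the standard Whisper processor API exposes decoder prompt ids for this (a sketch continuing from the script above; it reuses `processor`, `model`, and `input_features`):

```python
# Force Galician transcription instead of relying on automatic language detection
forced_decoder_ids = processor.get_decoder_prompt_ids(language="galician", task="transcribe")

generated_ids = model.generate(
    inputs=input_features,
    max_length=225,
    forced_decoder_ids=forced_decoder_ids,
)
```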

## Fine-tuning hyper-parameters

| Hyper-parameter             | Value |
|-----------------------------|-------|
| Training batch size         | 16    |
| Evaluation batch size       | 8     |
| Learning rate               | 3e-5  |
| Gradient checkpointing      | true  |
| Gradient accumulation steps | 1     |
| Max training epochs         | 100   |
| Max steps                   | 4000  |
| Generate max length         | 225   |
| Warmup training steps (%)   | 12.5% |
| FP16                        | true  |
| Metric for best model       | wer   |
| Greater is better           | false |
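For reference, the table above maps roughly onto the Transformers `Seq2SeqTrainingArguments` as sketched below. This is an assumption about how the values were wired up, not the exact training script: `output_dir` is a placeholder, `warmup_steps=500` is 12.5% of the 4000 max steps, and the last three arguments are additions typically needed for `metric_for_best_model` to take effect.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-gl",   # placeholder output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                   # overrides num_train_epochs when both are set
    generation_max_length=225,
    warmup_steps=500,                 # 12.5% of max_steps
    fp16=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    evaluation_strategy="steps",      # assumed; periodic evaluation is needed for model selection
    load_best_model_at_end=True,      # assumed; required for metric_for_best_model to apply
    predict_with_generate=True,       # assumed; needed to compute WER from generated text
)
```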

## Fine-tuning on a different dataset or style

If you're interested in fine-tuning your own Whisper model, we suggest starting from the openai/whisper-base model. Additionally, you may find the Transformers step-by-step guide for fine-tuning Whisper on multilingual ASR datasets to be a valuable resource. That guide served as a helpful reference during the training of this Galician whisper-base model!
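Since model selection above is driven by word error rate, a `compute_metrics` function along the lines of that guide could look like the sketch below (an assumption about the setup, not our exact code; it reuses a `processor` as in the inference script and requires the `evaluate` library):

```python
import evaluate

wer_metric = evaluate.load("wer")  # word error rate, the model-selection metric above

def compute_metrics(pred):
    # Decode model predictions and reference labels, then compute WER
    pred_str = processor.batch_decode(pred.predictions, skip_special_tokens=True)
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id  # restore padding
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
    return {"wer": 100 * wer_metric.compute(predictions=pred_str, references=label_str)}
```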
