metadata
license: cc-by-nc-nd-4.0
datasets:
  - openslr
language:
  - gl
pipeline_tag: automatic-speech-recognition
tags:
  - ITG
  - PyTorch
  - Transformers
  - whisper
  - whisper-base

Whisper Base Galician

Description

This is a fine-tuned version of the pre-trained openai/whisper-base model for automatic speech recognition (ASR) in Galician.


Dataset

We fine-tuned the model on the Galician dataset available in the OpenSLR repository.


Example inference script

The following example script runs our model in inference mode:

import librosa
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this to the path of your audio file
sample_rate = 16_000   # the audio is resampled to 16 kHz on load

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = AutoProcessor.from_pretrained('ITG/whisper-base-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-base-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

with torch.no_grad():
    # Read the audio file as a mono float array at the target sample rate
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    # Convert the waveform into log-Mel input features on the model's device
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    # Generate token ids and decode them into text
    generated_ids = model.generate(input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"ASR Galician whisper-base output: {decode_output}")
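
The transcription printed by the script can be compared against a reference transcript using word error rate (WER), the metric this card lists for model selection. The following is a minimal pure-Python sketch of WER as word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not part of the original training code, which may have used a metrics library and text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j]: edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical reference/hypothesis pair: one substituted word out of four.
example_wer = word_error_rate("bos días a todos", "bos día a todos")
print(f"WER: {example_wer:.2f}")
```

Lower is better, which is consistent with the "Greater is better: false" setting in the hyper-parameters below.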

Fine-tuning hyper-parameters

| Hyper-parameter | Value |
|---|---|
| Training batch size | 16 |
| Evaluation batch size | 8 |
| Learning rate | 3e-5 |
| Gradient checkpointing | true |
| Gradient accumulation steps | 1 |
| Max training epochs | 100 |
| Max steps | 4000 |
| Generate max length | 225 |
| Warmup training steps (%) | 12.5% |
| FP16 | true |
| Metric for best model | wer |
| Greater is better | false |
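
As a rough sketch, the values listed above could be expressed as keyword arguments for `Seq2SeqTrainingArguments` from Transformers. Two mappings here are assumptions rather than facts from this card: that the batch sizes are per device, and that the 12.5% warmup is relative to the maximum step count (which gives 500 warmup steps). The helper `build_training_arguments` and the `training_kwargs` name are hypothetical, and the original training script may have set further options not listed in the table.

```python
# Values copied from the hyper-parameter list above.
MAX_STEPS = 4000
WARMUP_FRACTION = 0.125  # "12.5%" in the list

# Assumption: the warmup percentage is taken relative to the max step count.
warmup_steps = int(MAX_STEPS * WARMUP_FRACTION)

training_kwargs = {
    "per_device_train_batch_size": 16,  # assumption: listed value is per device
    "per_device_eval_batch_size": 8,    # assumption: listed value is per device
    "learning_rate": 3e-5,
    "gradient_checkpointing": True,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 100,
    "max_steps": MAX_STEPS,             # when positive, the Trainer stops here
    "generation_max_length": 225,
    "warmup_steps": warmup_steps,
    "fp16": True,                       # mixed precision; needs a suitable GPU
    "metric_for_best_model": "wer",
    "greater_is_better": False,         # lower WER is better
}


def build_training_arguments(output_dir: str):
    """Hypothetical helper: requires the transformers package to be installed."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **training_kwargs)


print(f"Warmup steps: {warmup_steps}")
```

Because `max_steps` is positive, it takes precedence over the epoch count in the Trainer, so the 100-epoch value acts only as an upper bound in this sketch.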

Fine-tuning on a different dataset or style

If you're interested in fine-tuning your own Whisper model, we suggest starting from the openai/whisper-base model. The Transformers step-by-step guide to fine-tuning Whisper on multilingual ASR datasets is also a valuable resource; it served as a reference while training this Galician whisper-base model.