
🤗 HF Repo •🐱 Github Repo

Usage

import torch
import librosa
from transformers import WhisperProcessor, WhisperTokenizer, WhisperForConditionalGeneration

# Setup
model_name_or_path = "Oblivion208/whisper-tiny-cantonese"
task = "transcribe"
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path).to(device)
tokenizer = WhisperTokenizer.from_pretrained(model_name_or_path, task=task)
processor = WhisperProcessor.from_pretrained(model_name_or_path, task=task)
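feature_extractor = processor.feature_extractor
# Do not force language/task prompt tokens or suppress any tokens during generation
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []

# Load the audio as 16 kHz mono, the sampling rate Whisper expects
filepath = 'test.wav'
audio, sr = librosa.load(filepath, sr=16000, mono=True)
# Note: the feature extractor's keyword argument is `sampling_rate`
inputs = processor(audio, sampling_rate=sr, return_tensors="pt").to(device)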

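# Generate token ids without tracking gradients, then decode them to text
with torch.inference_mode():
    generated_tokens = model.generate(
        input_features=inputs.input_features,
        return_dict_in_generate=True,
        max_new_tokens=255,
    )
    # batch_decode returns a list with one transcription per input
    transcription = tokenizer.batch_decode(
        generated_tokens.sequences, skip_special_tokens=True)
    print(transcription)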

Approximate Performance Evaluation

The following models are all trained and evaluated on a single RTX 3090 GPU.

Cantonese Test Results Comparison

MDCC

| Model name | Parameters | Fine-tuning steps | Time spent | Training loss | Validation loss | CER % | Fine-tuned model |
|---|---|---|---|---|---|---|---|
| whisper-tiny-cantonese | 39 M | 3200 | 4h 34m | 0.0485 | 0.771 | 11.10 | Link |
| whisper-base-cantonese | 74 M | 7200 | 13h 32m | 0.0186 | 0.477 | 7.66 | Link |
| whisper-small-cantonese | 244 M | 3600 | 6h 38m | 0.0266 | 0.137 | 6.16 | Link |
| whisper-small-lora-cantonese | 3.5 M | 8000 | 21h 27m | 0.0687 | 0.382 | 7.40 | Link |
| whisper-large-v2-lora-cantonese | 15 M | 10000 | 33h 40m | 0.0046 | 0.277 | 3.77 | Link |
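
The parameter counts for the two LoRA rows are far smaller than the base models, which suggests they count only the trainable adapter weights. The sketch below shows the arithmetic under assumptions that this card does not state: rank-32 adapters attached to the `q_proj` and `v_proj` attention projections in every encoder and decoder layer. The function name `lora_trainable_params` is hypothetical; the layer counts and hidden sizes are those of the public whisper-small and whisper-large-v2 architectures.

```python
def lora_trainable_params(d_model, encoder_layers, decoder_layers, rank=32):
    """Count LoRA weights, ASSUMING adapters on q_proj and v_proj only.

    The rank and target modules are assumptions, not values from the model card.
    """
    # Each adapted d_model x d_model projection gains two low-rank matrices:
    # A (rank x d_model) and B (d_model x rank).
    params_per_module = 2 * d_model * rank
    # An encoder layer has one attention block (self-attention).
    encoder_modules = encoder_layers * 2
    # A decoder layer has two attention blocks (self- and cross-attention).
    decoder_modules = decoder_layers * 2 * 2
    return (encoder_modules + decoder_modules) * params_per_module


# whisper-small: hidden size 768, 12 encoder and 12 decoder layers
print(lora_trainable_params(768, 12, 12))
# whisper-large-v2: hidden size 1280, 32 encoder and 32 decoder layers
print(lora_trainable_params(1280, 32, 32))
```

Under these assumptions the counts come to roughly 3.5 M and 15.7 M, in line with the table, but other adapter configurations could give similar totals.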

Common Voice Corpus 11.0

| Model name | Original CER % | w/o Finetune CER % | Jointly Finetune CER % |
|---|---|---|---|
| whisper-tiny-cantonese | 124.03 | 66.85 | 35.87 |
| whisper-base-cantonese | 78.24 | 61.42 | 16.73 |
| whisper-small-cantonese | 52.83 | 31.23 | / |
| whisper-small-lora-cantonese | 37.53 | 19.38 | 14.73 |
| whisper-large-v2-lora-cantonese | 37.53 | 19.38 | 9.63 |
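
For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate (CER) is commonly computed: the Levenshtein edit distance between the reference and the hypothesis, divided by the number of reference characters. The card does not say which tool or text normalization produced the numbers above, so the helper names `edit_distance` and `cer` are illustrative and this may differ from the evaluation actually used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER in percent: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


# One substituted character out of three
print(cer(["你好嗎"], ["你好嘛"]))
# Extra (inserted) characters push the rate above 100 %
print(cer(["好"], ["好好好"]))
```

Because insertions count as errors while the denominator is the reference length, a model that emits long or repetitive output can score above 100 %, as with the 124.03 figure for the original tiny model.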