Model Card for sandy1990418/whisper-large-v3-turbo-zh-tw

This model card describes a fine-tuned version of openai/whisper-large-v3-turbo, optimized for Mandarin automatic speech recognition (ASR). It achieves the following word error rates (WER) on the evaluation sets:

  • Common Voice 13.0 dataset (test):
    WER before fine-tuning: 77.08
    WER after fine-tuning: 45.47
  • Common Voice 16.1 dataset (test):
    WER before fine-tuning: 77.57
    WER after fine-tuning: 45.9
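
To make the figures above easier to interpret, here is a minimal sketch of how an error rate of this kind can be computed and how the before/after numbers compare. The card does not say which scoring tool or tokenization was used, so the helper names (`edit_distance`, `error_rate`, `relative_reduction`) and the character-level tokenization in the example are assumptions for illustration only, not the author's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance divided by reference length, as a percentage."""
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def relative_reduction(before, after):
    """Relative improvement of `after` over `before`, as a percentage."""
    return 100.0 * (before - after) / before


# Assumption: tokens are single characters, since Mandarin text has no spaces.
# One substituted character out of six reference characters.
example = error_rate(list("今天天氣很好"), list("今天天气很好"))

# Scores reported in this card.
print(f"{relative_reduction(77.08, 45.47):.1f}")  # Common Voice 13.0 -> 41.0
print(f"{relative_reduction(77.57, 45.9):.1f}")   # Common Voice 16.1 -> 40.8
```

Under these assumptions, fine-tuning removes roughly 41% of the baseline errors on both test sets.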

Uses

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "sandy1990418/whisper-large-v3-turbo-zh-tw"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

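# Smoke test on a public sample. Note that this LibriSpeech sample is English
# audio; substitute Mandarin audio to exercise the fine-tuned model.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])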
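
Because the model targets Mandarin, it can help to tell Whisper the language explicitly instead of relying on automatic language detection. The sketch below wraps an existing ASR pipeline for that purpose. The helper name `transcribe_mandarin` and the file name in the usage comment are hypothetical; the `generate_kwargs` language and task options are the ones Whisper models accept in `transformers`, but check them against your installed version.

```python
def transcribe_mandarin(asr_pipe, audio, return_timestamps=False):
    """Run an ASR pipeline with the decoding language pinned to Mandarin.

    `asr_pipe` is expected to behave like the `pipe` object built above.
    `audio` can be anything the pipeline accepts, e.g. a file path or a
    dict with "array" and "sampling_rate" keys.
    """
    # Assumption: "zh" is the Whisper language code for Chinese.
    generate_kwargs = {"language": "zh", "task": "transcribe"}
    result = asr_pipe(
        audio,
        generate_kwargs=generate_kwargs,
        return_timestamps=return_timestamps,
    )
    return result["text"]


# Usage (hypothetical file name; requires the `pipe` object from above):
# text = transcribe_mandarin(pipe, "mandarin_sample.wav")
# print(text)
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded, so the same function works on CPU or GPU.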

Model size: 809M params (Safetensors, tensor type FP16)