whisper-base · FP32 ONNX — Uzbek Speech Recognition

ONNX export of Saidakbar01/whisper-base-uz-finetuned, a Whisper-base model fine-tuned for the Uzbek language.

Model Details

Property                      Value
Base architecture             openai/whisper-base
Fine-tuned model              Saidakbar01/whisper-base-uz-finetuned
Format                        ONNX (FP32)
Size                          666 MB
WER (200-sample validation)   48.08%
Language                      Uzbek (uz)
Task                          Automatic Speech Recognition

Files

  • encoder_model.onnx
  • decoder_model.onnx
  • decoder_with_past_model.onnx

Usage

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import WhisperProcessor
import torch

# Load processor and ONNX model
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-base",
    language="uz",
    task="transcribe",
)
model = ORTModelForSpeechSeq2Seq.from_pretrained(
    "Saidakbar01/whisper-base-uz-onnx-fp32",
    encoder_file_name="encoder_model.onnx",
    decoder_file_name="decoder_model.onnx",
    decoder_with_past_file_name="decoder_with_past_model.onnx",
)

# Transcribe audio: audio_array is your own mono waveform, a NumPy float32 array sampled at 16 kHz
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features,
        language="uz",
        task="transcribe",
    )
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
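
The snippet above assumes audio_array already exists. As a minimal sketch of one way to obtain it, the helper below reads a WAV file using only the Python standard library. It assumes the file is already mono, 16-bit PCM, and sampled at 16 kHz, and it does no resampling. The name read_wav_mono_16k and the path "clip.wav" are hypothetical, not part of this repository. In practice an audio library that can resample may be more convenient.

```python
import sys
import wave
from array import array


def read_wav_mono_16k(path: str) -> list[float]:
    """Read a mono 16-bit PCM WAV file sampled at 16 kHz as floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        # Whisper's feature extractor expects 16 kHz input; reject anything else.
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16 kHz audio, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())

    samples = array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Scale integer samples to the [-1.0, 1.0) range.
    return [s / 32768.0 for s in samples]


# Hypothetical usage with the snippet above (requires NumPy):
# audio_array = numpy.asarray(read_wav_mono_16k("clip.wav"), dtype=numpy.float32)
```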

Training & Evaluation

The parent model was fine-tuned on the DavronSherbaev/uzbekvoice-filtered dataset. WER was measured on 200 samples from the validation split using jiwer.

Model                                                WER
openai/whisper-base (baseline)                       200.69%
Saidakbar01/whisper-base-uz-finetuned (fine-tuned)   53.13%
whisper-base-uz-onnx-fp32 (this model)               48.08%
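
A WER above 100%, as reported for the baseline, is possible because WER divides the number of word-level substitutions, deletions, and insertions by the number of words in the reference. A hypothesis with many extra words can therefore accumulate more errors than the reference has words. The sketch below illustrates the metric with a plain-Python word-level edit distance. It is a stand-in for illustration only, not jiwer's implementation, and it applies no text normalization. The function name word_error_rate and the example strings are made up for this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up examples, not taken from the evaluation data.
print(word_error_rate("salom dunyo", "salom dunyo"))  # 0.0
print(word_error_rate("salom dunyo", "salom"))        # 0.5 (one deletion over two words)
print(word_error_rate("salom dunyo", "a b c d e f"))  # 3.0 (six errors over two words)
```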