whisper-base · FP32 ONNX — Uzbek Speech Recognition
ONNX export of Saidakbar01/whisper-base-uz-finetuned, a Whisper-base model fine-tuned for the Uzbek language.
Model Details
| Property | Value |
|---|---|
| Base architecture | openai/whisper-base |
| Fine-tuned model | Saidakbar01/whisper-base-uz-finetuned |
| Format | ONNX (FP32) |
| Size | 666 MB |
| WER (200-sample validation) | 48.08% |
| Language | Uzbek (uz) |
| Task | Automatic Speech Recognition |
Files
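- encoder_model.onnx: the audio encoder
- decoder_model.onnx: the decoder used for the first generation step (no key/value cache)
- decoder_with_past_model.onnx: the decoder used for later steps, reusing cached keys/values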
Usage
```python
import librosa  # assumption: used here only to load and resample the audio file
import torch
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import WhisperProcessor

# Load the processor (feature extractor + tokenizer) and the ONNX model
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-base",
    language="uz",
    task="transcribe",
)
model = ORTModelForSpeechSeq2Seq.from_pretrained(
    "Saidakbar01/whisper-base-uz-onnx-fp32",
    encoder_file_name="encoder_model.onnx",
    decoder_file_name="decoder_model.onnx",
    decoder_with_past_file_name="decoder_with_past_model.onnx",
)

# Load audio as a mono float32 NumPy array at 16 kHz.
# "audio.wav" is a placeholder path; any loader that yields this format works.
audio_array, _ = librosa.load("audio.wav", sr=16000, mono=True)

# Convert audio to log-Mel features (inputs longer than 30 s are truncated)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features,
        language="uz",
        task="transcribe",
    )
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
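The usage example expects `audio_array` to be a mono float32 array sampled at 16 kHz. A dependency-light way to produce it is sketched below with Python's built-in wave module and NumPy, assuming the input is a 16-bit PCM WAV file already recorded at 16 kHz (the sketch does not resample; convert other rates first with an audio tool of your choice). It also splits long recordings into 30-second windows, because Whisper's feature extractor pads or truncates every input to 30 seconds. Both function names are hypothetical helpers, not part of transformers or optimum.

```python
import wave

import numpy as np

TARGET_SR = 16_000   # sampling rate Whisper expects
CHUNK_SECONDS = 30   # Whisper's feature extractor pads/truncates to 30 s


def read_wav_mono_float32(path: str) -> np.ndarray:
    """Read a 16-bit PCM WAV at 16 kHz; return mono float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wf.getframerate() != TARGET_SR:
            raise ValueError(
                f"expected {TARGET_SR} Hz audio, got {wf.getframerate()} Hz; resample first"
            )
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # Interleaved little-endian int16 samples -> array of shape (frames, channels)
    samples = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    # Downmix by averaging channels, then scale the int16 range to [-1, 1)
    return (samples.astype(np.float32).mean(axis=1) / 32768.0).astype(np.float32)


def split_into_chunks(
    audio: np.ndarray, sr: int = TARGET_SR, seconds: int = CHUNK_SECONDS
) -> list[np.ndarray]:
    """Split audio into consecutive, non-overlapping windows of at most `seconds`."""
    step = sr * seconds
    return [audio[i : i + step] for i in range(0, len(audio), step)]
```

Each chunk can then go through `processor(...)` and `model.generate(...)` as in the usage example, with the decoded texts joined afterwards. Cutting at fixed boundaries may split a word in half; overlapping windows reduce that at the cost of having to de-duplicate text in the overlap.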
Training & Evaluation
The parent model was fine-tuned on the DavronSherbaev/uzbekvoice-filtered dataset.
WER was measured on 200 samples from the validation split using jiwer.
| Model | WER (lower is better) |
|---|---|
| openai/whisper-base (baseline) | 200.69% |
| Saidakbar01/whisper-base-uz-finetuned (fine-tuned) | 53.13% |
| whisper-base-uz-onnx-fp32 (this model) | 48.08% |
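Word error rate counts substitutions (S), deletions (D) and insertions (I) against the number of reference words N: WER = (S + D + I) / N. Since S + D can never exceed N, a value above 100%, like the baseline's 200.69%, can only arise from insertions, for example when a model emits long, wrong transcripts. The minimal sketch below computes WER with a word-level Levenshtein distance in pure Python. It is not jiwer's code and applies no text normalization (casing, punctuation), so its numbers may differ slightly from jiwer's; `corpus_wer` sums errors and reference words over all samples rather than averaging per-sample rates. All function names are hypothetical.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] = distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single reference/hypothesis pair, split on whitespace."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total edits divided by total reference words across all samples."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = sum(
        word_edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses)
    )
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references must contain at least one word in total")
    return errors / words


print(word_error_rate("salom dunyo", "salom dunyo"))  # 0.0
print(word_error_rate("salom", "bir ikki uch"))       # 3.0, i.e. 300%: 1 substitution + 2 insertions
```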