whisper-large-v3-te

Telugu ASR model fine-tuned from openai/whisper-large-v3 by Liodon AI.

Note: This is the epoch 1 checkpoint; the initial training run is complete. Further training with additional data is in progress, and this model will be updated with improved checkpoints as that training continues.

Training Data

~119K Telugu audio samples from three datasets:

Dataset                    Split    Size
ai4bharat/Kathbath         train    ~70K
ai4bharat/indicvoices_r    train    ~47K
google/fleurs (te_in)      train    ~2K

Training Details

  • Base model: openai/whisper-large-v3
  • Hardware: NVIDIA GB10 (Grace Blackwell), 128GB unified memory
  • Batch size: 16
  • Learning rate: 1e-5
  • Precision: bf16
  • Epochs: 1
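
As a rough sanity check on these settings, the number of optimizer steps in one epoch can be estimated from the dataset size and the batch size. The sketch below assumes that the batch size of 16 is the effective batch size (single device, no gradient accumulation), which this card does not state. It uses the approximate sample counts from the Training Data section, and `steps_per_epoch` is an illustrative helper, not part of the training code.

```python
import math

# Approximate sample counts from the Training Data section.
DATASET_SIZES = {
    "ai4bharat/Kathbath": 70_000,
    "ai4bharat/indicvoices_r": 47_000,
    "google/fleurs (te_in)": 2_000,
}


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


total = sum(DATASET_SIZES.values())
# Assumption: 16 is the effective batch size (no gradient accumulation).
steps = steps_per_epoch(total, batch_size=16)
print(f"{total} samples -> about {steps} steps per epoch")
```

Under that assumption, each 0.10-epoch evaluation interval in the WER table corresponds to roughly 740 steps.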

WER Progress (Kathbath valid)

Epoch    WER
0.10     48.79%
0.20     43.35%
0.30     40.78%
0.40     39.61%
0.50     38.96%
0.60     38.51%
0.70     38.31%
0.80     38.17%
0.90     38.16%
1.00     38.15%
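
For reference, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below implements that standard definition in plain Python. This card does not say which tool or text normalization produced the numbers above, so `word_error_rate` and `relative_reduction` are illustrative helpers and may not reproduce the reported values exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(start_wer: float, end_wer: float) -> float:
    """Fraction by which WER dropped relative to the starting value."""
    return (start_wer - end_wer) / start_wer


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))
# Relative WER reduction between the 0.10 and 1.00 epoch checkpoints.
print(f"{relative_reduction(48.79, 38.15):.1%}")
```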

Usage

from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch

model = WhisperForConditionalGeneration.from_pretrained(
    "liodon-ai/whisper-large-v3-te",
    torch_dtype=torch.float16,
)
processor = WhisperProcessor.from_pretrained(
    "liodon-ai/whisper-large-v3-te",
    language="Telugu",
    task="transcribe",
)

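# audio_array: 1-D float array of 16 kHz mono samples that you load yourself
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
# Cast the features to the model's dtype (float16 here) to avoid a dtype mismatch
input_features = inputs["input_features"].to(model.dtype)
with torch.no_grad():
    predicted_ids = model.generate(input_features)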
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
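
Whisper works on 30-second windows, and with its default settings the processor truncates longer audio to that length, so a recording longer than 30 seconds needs to be split before it is transcribed this way. The sketch below shows one simple option: fixed-size, non-overlapping chunks. `split_into_chunks` is a hypothetical helper and is not part of transformers; this naive split can cut a word at a chunk boundary, so treat it as a starting point rather than a recommended pipeline.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000   # the model expects 16 kHz mono audio
CHUNK_SECONDS = 30     # Whisper's input window length


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: int = CHUNK_SECONDS,
) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks of at most chunk_seconds each."""
    chunk_len = sample_rate * chunk_seconds
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


# Stand-in for a real recording: 70 seconds of silence.
audio = [0.0] * (70 * SAMPLE_RATE)
chunks = list(split_into_chunks(audio))
print([len(chunk) / SAMPLE_RATE for chunk in chunks])
```

Each chunk can then be passed to the processor in place of `audio_array`, and the per-chunk transcriptions joined in order. The helper only relies on `len` and slicing, so it also accepts NumPy arrays.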

License

Apache 2.0
