whisper-large-v3-te

Telugu ASR model fine-tuned from openai/whisper-large-v3 by Liodon AI.

Note: This is the epoch 1 checkpoint, taken after one complete training epoch. Further training with additional data is in progress, and the model will be updated with improved checkpoints as they become available.

Training Data

~119K Telugu audio samples from three datasets:

Dataset	Split	Samples
ai4bharat/Kathbath	train	~70K
ai4bharat/indicvoices_r	train	~47K
google/fleurs (te_in)	train	~2K
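To make the balance of the training mix concrete, the approximate counts in the table can be turned into shares. This is a minimal sketch that uses only the rounded figures listed above; the variable names are illustrative and not part of any released training script.

```python
# Approximate sample counts from the table above, in thousands.
approx_samples_k = {
    "ai4bharat/Kathbath": 70,
    "ai4bharat/indicvoices_r": 47,
    "google/fleurs (te_in)": 2,
}

# Total size of the training mix (in thousands of samples).
total_k = sum(approx_samples_k.values())

# Fraction of the mix contributed by each dataset.
shares = {name: count / total_k for name, count in approx_samples_k.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these rounded counts, Kathbath contributes roughly 59% of the samples, indicvoices_r roughly 39%, and FLEURS under 2%, so the reported Kathbath validation WER is measured on the domain that dominates training.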

Training Details

Base model: openai/whisper-large-v3
Hardware: NVIDIA GB10 (Grace Blackwell), 128GB unified memory
Batch size: 16
Learning rate: 1e-5
Precision: bf16
Epochs: 1
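The figures above imply a rough number of optimizer steps for the single epoch. The sketch below assumes that 16 is the effective batch size (no gradient accumulation, single device) and that evaluation ran every tenth of an epoch. Neither assumption is stated in this card, so treat the result as an estimate.

```python
import math

num_samples = 119_000  # approximate, from the Training Data section
batch_size = 16        # ASSUMPTION: effective (global) batch size

# Optimizer steps needed to see every sample once.
steps_per_epoch = math.ceil(num_samples / batch_size)

# ASSUMPTION: one evaluation every 0.1 epoch.
eval_interval = steps_per_epoch // 10

print(f"steps per epoch: {steps_per_epoch}")
print(f"steps between evaluations: {eval_interval}")
```

If gradient accumulation or multiple devices were used, the effective batch size would be larger and the step counts correspondingly smaller.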

WER Progress (Kathbath validation split)

Epoch	WER
0.10	48.79%
0.20	43.35%
0.30	40.78%
0.40	39.61%
0.50	38.96%
0.60	38.51%
0.70	38.31%
0.80	38.17%
0.90	38.16%
1.00	38.15%
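For readers who want to reproduce a number like the ones above, word error rate is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function below is a minimal sketch of that definition. It splits on whitespace only and applies no text normalization; the normalization used for the reported scores is not documented here, so results from this sketch may differ from the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Example: one substituted word out of three reference words.
score = word_error_rate("నేను ఇంటికి వెళ్తున్నాను", "నేను ఇంటికి వెళ్ళాను")
print(f"WER: {score:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.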

Usage

from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch

model = WhisperForConditionalGeneration.from_pretrained(
    "liodon-ai/whisper-large-v3-te",
    torch_dtype=torch.float16,
)
processor = WhisperProcessor.from_pretrained(
    "liodon-ai/whisper-large-v3-te",
    language="Telugu",
    task="transcribe",
)

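# audio_array: a 1-D float waveform you have already loaded (must be 16kHz mono)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
# The processor returns float32 features; cast them to the model's dtype (float16 here)
input_features = inputs["input_features"].to(model.dtype)
with torch.no_grad():
    predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])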
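The snippet above expects `audio_array` to already be a 16kHz mono waveform. As one possible way to obtain it without extra dependencies, the sketch below reads a 16-bit PCM WAV file with the standard library and rejects files that do not match the required format. The helper name `load_wav_16k_mono` is hypothetical, and it does not resample; files at other sample rates need to be converted first with an audio tool of your choice.

```python
import array
import sys
import wave


def load_wav_16k_mono(path: str) -> list[float]:
    """Read a 16-bit PCM WAV file and return samples scaled to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Scale integer samples to floats in [-1.0, 1.0).
    return [sample / 32768.0 for sample in samples]
```

The returned list of floats can be used as `audio_array`, for example `audio_array = load_wav_16k_mono("sample.wav")`, where `sample.wav` is a placeholder path.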

License

Apache 2.0
