Indic CrisperWhisper v1 (Hindi)

Verbatim Hindi ASR with word-level timestamps. Fine-tuned from vasista22/whisper-hindi-large-v2 on IndicVoices-R Hindi using the CrisperWhisper dual-loss approach.

Results (IndicVoices-R Hindi test set, 376 samples)

Model	WER vs Normalised	WER vs Verbatim
Base (vasista22/whisper-hindi-large-v2)	23.52%	25.91%
Indic CrisperWhisper v1	18.86%	18.60%
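
To put the table in relative terms, the short sketch below computes the relative WER reduction of the fine-tuned model over the base model. The numbers are copied from the table; the dictionary layout and the helper name `relative_reduction` are illustrative only and not part of the released code.

```python
# WER figures (percent) copied from the results table above.
wer = {
    "base": {"normalised": 23.52, "verbatim": 25.91},
    "v1": {"normalised": 18.86, "verbatim": 18.60},
}


def relative_reduction(before: float, after: float) -> float:
    """Return the relative WER reduction in percent."""
    return 100.0 * (before - after) / before


# One value per reference type, rounded to one decimal place.
reductions = {
    ref: round(relative_reduction(wer["base"][ref], wer["v1"][ref]), 1)
    for ref in ("normalised", "verbatim")
}
print(reductions)
```

By this measure the gain is larger against the verbatim reference (roughly 28% relative) than against the normalised one (roughly 20% relative).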

Usage

pip install torch torchaudio "transformers>=4.37" huggingface_hub numpy
python transcribe_hindi.py audio.wav --model_dir user71/indic-crisperwhisper-hindi-v1

from transcribe_hindi import load_model, transcribe

pipe   = load_model("user71/indic-crisperwhisper-hindi-v1")
result = transcribe("audio.wav", pipe=pipe)

print(result["text"])
for chunk in result["chunks"]:
    word  = chunk["text"]
    start = chunk["timestamp"][0]
    end   = chunk["timestamp"][1]
    print(f"{word:<20}  {start:.3f}s – {end:.3f}s")

Output includes Hindi filler tokens: [UM], [UH], [FILLER], [PAUSE].
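
If a downstream task needs a clean transcript, the filler tokens can be filtered out after transcription. The sketch below assumes each filler arrives as its own chunk whose text is exactly one of the four tokens listed above; that layout, the helper name `split_fillers`, and the toy `example` result are assumptions for illustration, not guaranteed behaviour of `transcribe`.

```python
FILLER_TOKENS = {"[UM]", "[UH]", "[FILLER]", "[PAUSE]"}


def split_fillers(chunks):
    """Separate filler chunks from spoken-word chunks, preserving order."""
    words, fillers = [], []
    for chunk in chunks:
        token = chunk["text"].strip()
        (fillers if token in FILLER_TOKENS else words).append(chunk)
    return words, fillers


# Toy result in the same shape as the usage example (values are made up).
example = {
    "chunks": [
        {"text": "मैं", "timestamp": (0.00, 0.22)},
        {"text": "[UM]", "timestamp": (0.22, 0.61)},
        {"text": "कल", "timestamp": (0.61, 0.90)},
        {"text": "[PAUSE]", "timestamp": (0.90, 1.40)},
        {"text": "आऊँगा", "timestamp": (1.40, 1.95)},
    ]
}

words, fillers = split_fillers(example["chunks"])

# Transcript without fillers, and total time spent on fillers and pauses.
clean_text = " ".join(c["text"].strip() for c in words)
filler_time = sum(c["timestamp"][1] - c["timestamp"][0] for c in fillers)

print(clean_text)
print(f"{len(fillers)} filler chunks, {filler_time:.2f}s in total")
```

Keeping the filler chunks in a separate list rather than discarding them leaves the timing information available, for example for disfluency statistics.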

Files

File	Description
`model-*.safetensors`	Model weights (Whisper large-v2 based, 1.5B params)
`retok_config.json`	Retokenisation config (filler token IDs, vocab size=50368)
`alignment_heads.json`	15 selected cross-attention heads used for DTW timestamps
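
For readers unfamiliar with DTW-based timestamps, the sketch below shows the general idea on a toy token-by-frame score matrix: dynamic time warping finds a monotonic path through the matrix, and the first frame on the path for each token gives its start time. This is a generic illustration, not this model's implementation; the function names, the toy scores, and the 20 ms frame duration (the value used by stock Whisper) are assumptions.

```python
FRAME_SECONDS = 0.02  # assumption: 20 ms per encoder frame, as in stock Whisper


def dtw_path(cost):
    """Return the minimum-cost monotonic path through a token x frame cost matrix."""
    n_tokens, n_frames = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cost of any path ending at cell (i - 1, j - 1).
    acc = [[inf] * (n_frames + 1) for _ in range(n_tokens + 1)]
    acc[0][0] = 0.0
    for i in range(1, n_tokens + 1):
        for j in range(1, n_frames + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the last cell to the first, always taking the cheapest predecessor.
    i, j = n_tokens, n_frames
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def token_start_times(path, n_tokens):
    """Map each token to the time of the first frame it occupies on the path."""
    starts = [None] * n_tokens
    for token, frame in path:
        if starts[token] is None:
            starts[token] = frame * FRAME_SECONDS
    return starts


# Toy normalised attention scores: 3 tokens x 6 frames (values are made up).
scores = [
    [1.0, 1.0, -1.0, -1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0],
    [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0],
]
# DTW minimises cost, so high attention must become low cost.
cost = [[-s for s in row] for row in scores]

path = dtw_path(cost)
starts = token_start_times(path, len(scores))
print(path)
print(starts)
```

In practice the score matrix would come from averaging the selected cross-attention heads; restricting it to a small set of heads is what makes the resulting path usable for timing.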

Citation

Built on CrisperWhisper (Wagner et al., INTERSPEECH 2024).

Training data: IndicVoices-R (Sankar et al., 2024).

To cite this work:

Sanat Kumar Agarwal and Srikanth Raj Chetupalli, "IndicCrisperWhisper-Time-stamped transcription for IndicVoices using CrisperWhisper", May 2026.

Papers for user71/indic-crisperwhisper-hindi-v1

CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions (arXiv:2408.16589, published Aug 29, 2024)

IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages (arXiv:2403.01926, published Mar 4, 2024)