
whisper-tiny-finetune-hindi-fleurs

This model is a fine-tuned version of openai/whisper-tiny on the google/fleurs dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8315
  • Wer Ortho: 0.4313
  • Wer: 0.4262

A working Hugging Face Space can be found here

Model description

Fine-tuning openai/whisper-tiny on the Hindi subset of google/fleurs improves the WER from 102.3%, as stated in the Whisper paper, to 42.62% (0.4262 when expressed as a fraction, as in the results in this card).

Intended uses & limitations

This model is intended for low-compute edge devices such as the Raspberry Pi Pico/3/3B/4, where it offers real-time transcription of Hindi audio into the English lexicon.
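A minimal inference sketch follows, assuming the checkpoint is loaded through the Transformers `pipeline` API. The repository ID is the one shown on this model's Hub page; the audio file name is a hypothetical placeholder, and decoding from a file path assumes ffmpeg is available on the device. Real-time performance on any particular board is not verified here.

```python
# Repository ID as shown on the model's Hub page.
MODEL_ID = "Aryan-401/whisper-tiny-finetune-hindi-fleurs"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint on first use (needs network access).
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # returns a dict with a "text" field
    return result["text"]


# Example call (hypothetical file name):
#   print(transcribe("sample_hindi.wav"))
```

On a constrained device it may be worth building the pipeline once and reusing it across calls rather than reloading the weights for every file.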

Training and evaluation data

The model was trained on the hi_in subset of google/fleurs, with WER as the evaluation metric.
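For readers unfamiliar with the metric, here is a small sketch of how word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The card does not say how "Wer Ortho" and "Wer" differ; the sketch assumes the common convention that the orthographic score uses raw text and the other uses normalized text. The `normalize` helper is a simple stand-in, not the normalizer used in training, and the sample strings are made up.

```python
import string


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def normalize(text: str) -> str:
    """Stand-in normalizer (assumption): lowercase and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


reference = "Namaste, duniya!"   # made-up example
hypothesis = "namaste duniya"

wer_ortho = word_error_rate(reference, hypothesis)
wer_normalized = word_error_rate(normalize(reference), normalize(hypothesis))
```

Because insertions count as errors, WER can exceed 1.0 (100%), which is how a baseline figure above 100 is possible.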

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
  • mixed_precision_training: Native AMP
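The settings above can be sketched as keyword arguments for a Transformers `Seq2SeqTrainingArguments` object. This mapping is an assumption: the card lists values, not the training script, and the per-device batch size assumes a single device with no gradient accumulation. The helper `lr_at_step` is a hypothetical illustration of what a constant-with-warmup schedule does: a linear ramp over the warmup steps, then a flat learning rate.

```python
# Assumed mapping from the card's hyperparameters to training-argument names.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_steps": 50,
    "max_steps": 500,
    "fp16": True,  # "Native AMP" mixed precision; requires a GPU
}


def lr_at_step(step: int, base_lr: float = 1e-5, warmup_steps: int = 50) -> float:
    """Learning rate under a constant-with-warmup schedule (illustrative)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Constant for the rest of training.
    return base_lr


# Learning rate at a few points across the 500 training steps.
schedule_preview = {step: lr_at_step(step) for step in (0, 25, 50, 499)}
```

With 50 warmup steps out of 500, the model trains at the full 1e-05 rate for 90% of the run.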

Training results

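| Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer    |
|---------------|-------|------|-----------------|-----------|--------|
| 1.8112        | 1.39  | 100  | 1.7274          | 0.6323    | 0.6258 |
| 1.0387        | 2.78  | 200  | 1.1194          | 0.5130    | 0.5072 |
| 0.7671        | 4.17  | 300  | 0.9671          | 0.4665    | 0.4613 |
| 0.5283        | 5.56  | 400  | 0.8840          | 0.4494    | 0.4440 |
| 0.4458        | 6.94  | 500  | 0.8315          | 0.4313    | 0.4262 |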
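A short worked calculation on the figures above shows two things the results imply: roughly how many optimizer steps make up one epoch, and how much the WER fell between the first and last evaluation. The numbers are copied from the training results; the variable names are illustrative only.

```python
# (step, epoch, wer) copied from the training results.
RESULTS = [
    (100, 1.39, 0.6258),
    (200, 2.78, 0.5072),
    (300, 4.17, 0.4613),
    (400, 5.56, 0.4440),
    (500, 6.94, 0.4262),
]

# Steps per epoch implied by each row (the epoch column is rounded,
# so the ratio is only approximate).
steps_per_epoch = [round(step / epoch) for step, epoch, _ in RESULTS]

# Relative WER reduction between the first and last evaluation.
first_wer = RESULTS[0][2]
final_wer = RESULTS[-1][2]
relative_reduction = (first_wer - final_wer) / first_wer

print(f"steps per epoch: about {steps_per_epoch[0]}")
print(f"relative WER reduction, step 100 to 500: {relative_reduction:.1%}")
```

The WER is still decreasing at step 500, though by smaller amounts at each checkpoint, so the run does not show whether the model had converged.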

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.15.0

Citations

@inproceedings{Bhat:2014:ISS:2824864.2824872,
 author = {Bhat, Irshad Ahmad and Mujadia, Vandan and Tammewar, Aniruddha and Bhat, Riyaz Ahmad and Shrivastava, Manish},
 title = {IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search},
 booktitle = {Proceedings of the Forum for Information Retrieval Evaluation},
 series = {FIRE '14},
 year = {2015},
 isbn = {978-1-4503-3755-7},
 location = {Bangalore, India},
 pages = {48--53},
 numpages = {6},
 url = {http://doi.acm.org/10.1145/2824864.2824872},
 doi = {10.1145/2824864.2824872},
 acmid = {2824872},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Information Retrieval, Language Identification, Language Modeling, Perplexity, Transliteration},
}
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Model size: 37.8M params (Safetensors, tensor type F32)

Model repository: Aryan-401/whisper-tiny-finetune-hindi-fleurs (used by 1 Space)