metadata
license: apache-2.0
base_model: openai/whisper-tiny
tags:
  - generated_from_trainer
datasets:
  - google/fleurs
metrics:
  - wer
model-index:
  - name: whisper-tiny-finetune-hindi-fleurs
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: google/fleurs
          type: google/fleurs
          config: hi_in
          split: train+test
          args: hi_in
        metrics:
          - name: Wer
            type: wer
            value: 0.42621638924455824
language:
  - hi

whisper-tiny-finetune-hindi-fleurs

This model is a fine-tuned version of openai/whisper-tiny on the google/fleurs dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8315
  • Wer Ortho: 0.4313
  • Wer: 0.4262

A working Hugging Face Space can be found here

Model description

Fine-tuning openai/whisper-tiny on the Hindi (hi_in) subset of google/fleurs reduces the WER from the 102.3% stated in the Whisper paper to 42.6% (0.4262 as a fraction, the form used in the rest of this card).

Intended uses & limitations

This model is intended for low-compute edge devices such as the Raspberry Pi Pico/3/3B/4 and offers real-time transcription of Hindi audio into the English lexicon.
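A minimal sketch of how transcription could look with the Transformers `pipeline` API. The repository id `Aryan-401/whisper-tiny-finetune-hindi-fleurs` is an assumption pieced together from the uploader and model names on this card, and the audio path is hypothetical; adjust both as needed. Decoding an audio file by path also requires ffmpeg to be installed.

```python
# Assumed Hub repository id (not confirmed by this card); replace if it differs.
MODEL_ID = "Aryan-401/whisper-tiny-finetune-hindi-fleurs"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the fine-tuned Whisper checkpoint."""
    # Imported lazily so that defining this function does not require
    # transformers/torch to be installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper operates on 30-second windows
    )
    return asr(audio_path)["text"]


# Example call (hypothetical file name):
# print(transcribe("sample_hindi.wav"))
```

Loading the pipeline once and reusing it across calls would be preferable on a constrained device, since model loading dominates the cost of a short transcription.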

Training and evaluation data

The model was trained on the hi_in subset of google/fleurs, with word error rate (WER) as the evaluation metric.
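For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a plain-Python illustration, not the evaluation code used for this card. The `normalize` helper is a hypothetical stand-in for a text normalizer, included on the assumption that "Wer Ortho" is computed on raw text and "Wer" on normalized text.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: drop punctuation, lowercase, collapse spaces."""
    kept = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    return " ".join(kept.lower().split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "नमस्ते, दुनिया।"
hypothesis = "नमस्ते दुनिया"
raw_wer = word_error_rate(reference, hypothesis)
normalized_wer = word_error_rate(normalize(reference), normalize(hypothesis))
```

Because insertions count as errors, WER can exceed 1.0 (100%) when the hypothesis contains many more words than the reference.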

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
  • mixed_precision_training: Native AMP
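The listed hyperparameters could be expressed as `Seq2SeqTrainingArguments` keyword arguments roughly as follows. This is a sketch under assumptions, not the original training script: the mapping of the batch sizes to `per_device_*` assumes a single device without gradient accumulation, and `output_dir` is a hypothetical path. The `learning_rate_at` helper illustrates what `constant_with_warmup` does: a linear ramp over the first 50 steps, then a constant rate.

```python
# Keyword arguments mirroring the hyperparameter list above.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_steps": 50,
    "max_steps": 500,
    "fp16": True,  # "Native AMP" mixed precision; needs a CUDA device
}


def make_training_args(output_dir: str = "./whisper-tiny-hi"):
    """Build the arguments object; output_dir is a hypothetical path."""
    # Lazy import so the rest of this sketch runs without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def learning_rate_at(step: int, base_lr: float = 1e-5, warmup_steps: int = 50) -> float:
    """Learning rate under a constant-with-warmup schedule."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr
```

With these values the rate reaches 1e-05 at step 50 and stays there for the remaining 450 steps.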

Training results

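| Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer    |
|---------------|-------|------|-----------------|-----------|--------|
| 1.8112        | 1.39  | 100  | 1.7274          | 0.6323    | 0.6258 |
| 1.0387        | 2.78  | 200  | 1.1194          | 0.5130    | 0.5072 |
| 0.7671        | 4.17  | 300  | 0.9671          | 0.4665    | 0.4613 |
| 0.5283        | 5.56  | 400  | 0.8840          | 0.4494    | 0.4440 |
| 0.4458        | 6.94  | 500  | 0.8315          | 0.4313    | 0.4262 |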
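The logged epochs and steps allow a rough back-of-the-envelope check of the training set size and of how much WER improved over the run. The sketch below only rearranges numbers from the results above; the examples-per-epoch figure is an upper-bound estimate that assumes a single device, no gradient accumulation, and a batch size of 32.

```python
# (step, epoch, wer) triples copied from the training results.
LOG = [
    (100, 1.39, 0.6258),
    (200, 2.78, 0.5072),
    (300, 4.17, 0.4613),
    (400, 5.56, 0.4440),
    (500, 6.94, 0.4262),
]
BATCH_SIZE = 32

# Steps per epoch implied by each logged row, then averaged.
steps_per_epoch = sum(step / epoch for step, epoch, _ in LOG) / len(LOG)

# Upper bound on examples per epoch (the last batch may be partial).
approx_examples_per_epoch = round(steps_per_epoch) * BATCH_SIZE

# Relative WER reduction between the first and last evaluation.
first_wer, last_wer = LOG[0][2], LOG[-1][2]
relative_reduction = (first_wer - last_wer) / first_wer

print(f"steps/epoch ~ {steps_per_epoch:.1f}")
print(f"examples/epoch <= {approx_examples_per_epoch}")
print(f"relative WER reduction ~ {relative_reduction:.1%}")
```

The run therefore covers roughly seven passes over the training data, and the WER was still falling at the final evaluation.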

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.15.0

Citations

@inproceedings{Bhat:2014:ISS:2824864.2824872,
 author = {Bhat, Irshad Ahmad and Mujadia, Vandan and Tammewar, Aniruddha and Bhat, Riyaz Ahmad and Shrivastava, Manish},
 title = {IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search},
 booktitle = {Proceedings of the Forum for Information Retrieval Evaluation},
 series = {FIRE '14},
 year = {2015},
 isbn = {978-1-4503-3755-7},
 location = {Bangalore, India},
 pages = {48--53},
 numpages = {6},
 url = {http://doi.acm.org/10.1145/2824864.2824872},
 doi = {10.1145/2824864.2824872},
 acmid = {2824872},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Information Retrieval, Language Identification, Language Modeling, Perplexity, Transliteration},
}
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}