
---
datasets:
  - mozilla-foundation/common_voice_15_0
  - mozilla-foundation/common_voice_13_0
language:
  - hi
metrics:
  - cer
  - wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
model-index:
  - name: whisper-small-hi-cv
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 15
          type: mozilla-foundation/common_voice_15_0
          args: hi
        metrics:
          - name: Test WER
            type: wer
            value: 13.9913
          - name: Test CER
            type: cer
            value: 5.8844
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 13
          type: mozilla-foundation/common_voice_13_0
          args: hi
        metrics:
          - name: Test WER
            type: wer
            value: 23.3824
          - name: Test CER
            type: cer
            value: 10.5288
---

Model Details

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Common Voice Hindi dataset. It achieves the following results on the evaluation set (a minimal usage example follows the list):

  • Loss: 0.3691
  • WER: 0.3285
  • CER: 0.0875
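
As a minimal usage sketch (assuming the checkpoint is published as SakshiRathi77/wav2vec2-large-xlsr-300m-hi-kagglex, the repo named on this page, and follows the standard Wav2Vec2ForCTC layout), transcription can be run with the transformers pipeline:

```python
# Minimal inference sketch; the repo id is taken from this card's page and the
# audio path is a placeholder.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="SakshiRathi77/wav2vec2-large-xlsr-300m-hi-kagglex",
)

# The pipeline decodes the file and resamples it to the model's expected rate.
print(asr("sample_hi.wav")["text"])
```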

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 100
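
As a hedged sketch rather than the author's original script, the hyperparameters above map onto a transformers TrainingArguments object roughly as follows; the output directory name is an assumption:

```python
# Approximate reconstruction of the training configuration listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-hi",  # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 32 * 4 = 128 total train batch size
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=300,
    num_train_epochs=100,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer.
)
```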

Training results

| Training Loss | Epoch | Step | Validation Loss | WER    | CER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
| 7.314         | 19.05 | 300  | 3.4661          | 1.0    | 1.0    |
| 2.5698        | 38.1  | 600  | 0.6577          | 0.5203 | 0.1466 |
| 0.6112        | 57.14 | 900  | 0.4048          | 0.3723 | 0.1005 |
| 0.3826        | 76.19 | 1200 | 0.3778          | 0.3386 | 0.0901 |
| 0.3168        | 95.24 | 1500 | 0.3691          | 0.3285 | 0.0875 |
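
The WER and CER columns are standard word and character error rates. A minimal sketch of how such scores can be computed with the evaluate library (the strings below are placeholders, not the Common Voice test transcripts):

```python
# Minimal WER/CER computation sketch using the evaluate library (needs jiwer).
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["नमस्ते दुनिया"]  # model transcriptions (placeholder)
references = ["नमस्ते दुनिया"]   # ground-truth transcripts (placeholder)

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```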

Framework versions

  • Transformers 4.33.0
  • PyTorch 2.0.0
  • Datasets 2.1.0
  • Tokenizers 0.13.3
