
dgx2_w2v2_large_distill_noisy_teacher_mozilla_epochs_50_batch_16

This model is a fine-tuned version of rohitp1/kkkh_w2v2_large_finetune_teacher_babble_noise_mozilla_50_epochs_batch_16. It achieves the following results on the evaluation set (a brief note on computing WER follows the list):

  • Loss: 21652.1836
  • Wer: 0.2592
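
The WER reported above is the standard word error rate. As a minimal, purely illustrative sketch (the actual evaluation script used for this model is not included in the card), such a score can be computed with the Hugging Face `evaluate` library:

```python
# Illustrative WER computation with the `evaluate` library; the exact
# evaluation script used for this model is not documented in the card.
import evaluate

wer_metric = evaluate.load("wer")

# Toy transcripts (placeholders, not from the evaluation set).
predictions = ["the cat sat on the mat", "hello word"]
references = ["the cat sat on a mat", "hello world"]

# WER = (substitutions + insertions + deletions) / number of reference words
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```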

Model description

More information needed

Intended uses & limitations

More information needed
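
No usage guidance is given in the card. As a minimal sketch, assuming the checkpoint exposes the standard Wav2Vec2 CTC interface for automatic speech recognition, inference could look like the following (the repository id and audio path are assumptions, not confirmed by the card):

```python
# Hedged usage sketch: assumes the checkpoint follows the standard
# Wav2Vec2 CTC interface. The repository id and audio file below are
# placeholders / assumptions, not confirmed by the model card.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="rohitp1/dgx2_w2v2_large_distill_noisy_teacher_mozilla_epochs_50_batch_16",
)

print(asr("sample.wav")["text"])
```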

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 256
  • total_train_batch_size: 4096
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 100
  • mixed_precision_training: Native AMP
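
These values correspond to standard Transformers `TrainingArguments` fields. The sketch below is a hypothetical reconstruction of that configuration; the actual training script, distillation objective, and data pipeline are not included in this card.

```python
# Hypothetical reconstruction of the configuration above using
# transformers.TrainingArguments; the actual training script and
# distillation objective are not included in this model card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dgx2_w2v2_large_distill_noisy_teacher_mozilla_epochs_50_batch_16",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=256,  # 16 * 256 = 4096 total train batch size
    lr_scheduler_type="linear",
    warmup_ratio=0.2,
    num_train_epochs=100,
    fp16=True,                        # "Native AMP" mixed precision
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
)
```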

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 74637.3574    | 7.31  | 250  | 4331.1958       | 0.2791 |
| 75858.376     | 14.63 | 500  | 7166.9727       | 0.2759 |
| 76494.272     | 21.94 | 750  | 9417.4209       | 0.2713 |
| 76375.128     | 29.26 | 1000 | 13408.2549      | 0.2680 |
| 74149.512     | 36.57 | 1250 | 14529.0449      | 0.2657 |
| 73472.352     | 43.89 | 1500 | 14684.6582      | 0.2643 |
| 72301.832     | 51.2  | 1750 | 15828.4707      | 0.2634 |
| 71340.256     | 58.51 | 2000 | 17094.2773      | 0.2614 |
| 71890.376     | 65.83 | 2250 | 17973.5566      | 0.2604 |
| 71789.656     | 73.14 | 2500 | 19330.4316      | 0.2599 |
| 71579.512     | 80.46 | 2750 | 19927.2129      | 0.2599 |
| 71862.48      | 87.77 | 3000 | 21301.7754      | 0.2592 |
| 71131.112     | 95.09 | 3250 | 21652.1836      | 0.2592 |

Framework versions

  • Transformers 4.29.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.8.0
  • Tokenizers 0.13.2