libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-loss-att-take-2

This model was trained from scratch (the training dataset is not specified in the card metadata). It achieves the following results on the evaluation set:

  • Loss: 26.4101
  • Wer: 0.2791
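
The Wer metric above is the word error rate: the word-level edit distance between the model's transcript and the reference, divided by the number of reference words. A minimal self-contained sketch of the computation (evaluation pipelines typically use a library such as `jiwer` instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all of ref[:i]
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

A WER of 0.2791 means roughly 28 word errors per 100 reference words.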

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 40
  • mixed_precision_training: Native AMP
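
The scheduler settings above describe a linear schedule with a 20% warmup: the learning rate rises linearly from 0 to the peak (0.0002) over the first 20% of steps, then decays linearly back to 0. A minimal sketch of that shape (the function name and total step count are illustrative, not taken from the training code):

```python
def linear_lr_with_warmup(step: int, total_steps: int,
                          warmup_ratio: float = 0.2,
                          peak_lr: float = 2e-4) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from peak_lr down to 0.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

Note the effective batch size follows from the listed values: train_batch_size 16 × gradient_accumulation_steps 2 = total_train_batch_size 32.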

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 202.4293      | 0.45  | 200  | 26.7777         | 0.2779 |
| 197.6471      | 0.9   | 400  | 25.8300         | 0.2760 |
| 204.8931      | 1.35  | 600  | 25.6774         | 0.2747 |
| 193.3182      | 1.79  | 800  | 25.6049         | 0.2737 |
| 205.2241      | 2.24  | 1000 | 25.5552         | 0.2739 |
| 186.0407      | 2.69  | 1200 | 25.4364         | 0.2737 |
| 191.7055      | 3.14  | 1400 | 25.7949         | 0.2764 |
| 185.0721      | 3.59  | 1600 | 26.1202         | 0.2753 |
| 198.8579      | 4.04  | 1800 | 25.8496         | 0.2763 |
| 185.7877      | 4.48  | 2000 | 27.0753         | 0.2731 |
| 194.9394      | 4.93  | 2200 | 25.6920         | 0.2775 |
| 188.2296      | 5.38  | 2400 | 25.7362         | 0.2742 |
| 188.0202      | 5.83  | 2600 | 25.9170         | 0.2755 |
| 191.5541      | 6.28  | 2800 | 26.8590         | 0.2771 |
| 198.2817      | 6.73  | 3000 | 26.4101         | 0.2791 |

Framework versions

  • Transformers 4.24.0
  • Pytorch 1.12.1
  • Datasets 2.7.0
  • Tokenizers 0.11.0