
libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-take-3

This model is a fine-tuned version of rohitp1/libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-take-2 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 236.1198
  • WER: 0.2607

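The LibriSpeech-derived name and the WER metric indicate an automatic-speech-recognition checkpoint. Below is a minimal loading sketch, assuming the model lives under the same rohitp1 namespace as its base model (the repo id is inferred from that naming pattern, and sample.wav is a placeholder for a 16 kHz mono recording):

```python
from transformers import pipeline

# Repo id inferred from the base-model naming; adjust if it differs.
asr = pipeline(
    "automatic-speech-recognition",
    model="rohitp1/libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-take-3",
)

transcript = asr("sample.wav")  # placeholder audio path
print(transcript["text"])
```
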
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.3
  • num_epochs: 30
  • mixed_precision_training: Native AMP
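
These settings map almost one-to-one onto transformers.TrainingArguments; the Adam betas and epsilon listed above are the Trainer defaults. A minimal sketch, assuming the standard Trainer API (output_dir is a placeholder, and the attention-distillation loss implied by the model name would require a custom Trainer subclass not shown here):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-take-3",
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,  # effective train batch size: 32
    lr_scheduler_type="linear",
    warmup_ratio=0.3,
    num_train_epochs=30,
    seed=42,
    fp16=True,  # "Native AMP" mixed-precision training
)
```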

Training results

| Training Loss | Epoch | Step | Validation Loss | WER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 388.1305      | 0.45  | 200  | 228.0258        | 0.2599 |
| 376.7096      | 0.9   | 400  | 226.8922        | 0.2566 |
| 384.1615      | 1.35  | 600  | 228.5904        | 0.2571 |
| 373.8909      | 1.79  | 800  | 229.0286        | 0.2563 |
| 385.2149      | 2.24  | 1000 | 230.8802        | 0.2575 |
| 384.5473      | 2.69  | 1200 | 230.1264        | 0.2563 |
| 383.9426      | 3.14  | 1400 | 232.5964        | 0.2569 |
| 385.9253      | 3.59  | 1600 | 237.4036        | 0.2599 |
| 396.9868      | 4.04  | 1800 | 236.1198        | 0.2607 |
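
The WER column is word error rate, the standard ASR metric. As a minimal illustration of how it is computed (not the author's evaluation code, and using hypothetical transcripts), the evaluate library can be used:

```python
import evaluate

# WER = (substitutions + insertions + deletions) / words in the reference.
wer = evaluate.load("wer")

# Hypothetical transcripts, for illustration only.
references = ["the quick brown fox jumps over the lazy dog"]
predictions = ["the quick brown fox jumped over the lazy dog"]

print(wer.compute(references=references, predictions=predictions))
# one substituted word out of nine reference words -> 0.1111...
```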

Framework versions

  • Transformers 4.23.1
  • Pytorch 1.12.1
  • Datasets 2.6.1
  • Tokenizers 0.13.1