libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-att-take-4

This model is a fine-tuned version of rohitp1/libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-att-take-2. The fine-tuning dataset is not specified in this card. It achieves the following results on the evaluation set (a brief note on the WER metric follows the list):

  • Loss: 37.5364
  • WER: 0.3334
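
WER here is the word error rate: word-level substitutions, deletions, and insertions divided by the number of reference words, so 0.3334 corresponds to roughly one word error per three reference words. A minimal sketch of the computation using the jiwer library (the strings are illustrative placeholders, not outputs of this model):

```python
import jiwer

# WER = (substitutions + deletions + insertions) / reference word count.
reference = "the quick brown fox jumps"   # placeholder ground-truth transcript
hypothesis = "the quick brown box jumps"  # placeholder model output
print(jiwer.wer(reference, hypothesis))   # 0.2: one substitution in five words
```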

Model description

More information needed

Intended uses & limitations

More information needed
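
Given the model name and the WER metric, this appears to be a distilled speech-recognition checkpoint trained on LibriSpeech-style data, though the card does not confirm the architecture. Assuming a Wav2Vec2-style CTC model (an assumption, not documented here), inference might look like the sketch below:

```python
import numpy as np
import torch
from transformers import AutoModelForCTC, AutoProcessor

# Assumption: a Wav2Vec2-style CTC ASR checkpoint; the card does not document
# the architecture, so treat this as an illustrative sketch only.
model_id = "rohitp1/libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-att-take-4"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Placeholder input: one second of silence at 16 kHz; replace with real audio.
speech = np.zeros(16_000, dtype=np.float32)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids))
```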

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.002
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 40
  • mixed_precision_training: Native AMP
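
These values map directly onto Hugging Face TrainingArguments; a hedged reconstruction is sketched below. The use of the Trainer API is an assumption (the card does not say), the output directory is a placeholder, and only the numeric values come from the list above:

```python
from transformers import TrainingArguments

# Reconstruction of the hyperparameters listed above. Using Trainer /
# TrainingArguments is an assumption; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="distil-take-4",       # placeholder
    learning_rate=2e-3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,   # 2 x 16 = total train batch size of 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.2,
    num_train_epochs=40,
    fp16=True,                        # "Native AMP" mixed-precision training
)
```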

Training results

Training Loss   Epoch   Step   Validation Loss   WER
43.7806         0.9      400   41.3073           0.2570
48.6549         1.8      800   41.8945           0.2740
57.4209         2.7     1200   39.9947           0.2872
68.8449         3.59    1600   39.4528           0.3059
79.4299         4.49    2000   38.9575           0.3179
93.0514         5.39    2400   37.5364           0.3334

Framework versions

  • Transformers 4.24.0
  • PyTorch 1.12.1
  • Datasets 2.7.1
  • Tokenizers 0.11.0