
libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-att

This model is a fine-tuned version of an unspecified base model (the base checkpoint is not recorded in the card metadata) on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 43.3741
  • WER: 0.4535
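
WER (word error rate) counts the substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the number of reference words. The evaluation script is not included in this card; the sketch below shows how a WER figure like the one above could be computed with the `evaluate` library. The transcripts in the example are hypothetical.

```python
# Minimal sketch of a WER computation with the `evaluate` library.
# The transcripts below are hypothetical; the card's 0.4535 WER comes
# from the model's own evaluation set.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["the cat sat on the mat"]  # hypothetical model output
references = ["the cat sat on a mat"]     # hypothetical ground truth

# WER = (substitutions + insertions + deletions) / reference words
print(wer_metric.compute(predictions=predictions, references=references))
# -> 0.1667 (1 substitution over 6 reference words)
```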

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.002
  • train_batch_size: 8
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 40
  • mixed_precision_training: Native AMP
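
The original training script is not included in this card. As a hedged illustration only, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows; the output directory is hypothetical, and the listed Adam betas and epsilon are the optimizer defaults.

```python
# Hedged sketch: the hyperparameters above expressed as
# transformers.TrainingArguments. output_dir is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./libri-alpha-distil",  # hypothetical path
    learning_rate=2e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=2,      # 8 x 2 = effective batch size 16
    lr_scheduler_type="linear",
    warmup_ratio=0.2,
    num_train_epochs=40,
    fp16=True,                          # "Native AMP" mixed precision
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the defaults.
)
```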

Training results

| Training Loss | Epoch | Step | Validation Loss | WER    |
|--------------:|------:|-----:|----------------:|-------:|
| 1062.9452     | 0.45  | 400  | 45.3735         | 0.5247 |
| 938.9115      | 0.9   | 800  | 42.2431         | 0.4830 |
| 838.9108      | 1.35  | 1200 | 40.3582         | 0.4517 |
| 815.9835      | 1.79  | 1600 | 39.1145         | 0.4403 |
| 815.1952      | 2.24  | 2000 | 40.4637         | 0.4417 |
| 783.388       | 2.69  | 2400 | 39.3749         | 0.4312 |
| 786.6658      | 3.14  | 2800 | 41.7742         | 0.4450 |
| 785.0494      | 3.59  | 3200 | 42.3615         | 0.4562 |
| 808.8199      | 4.04  | 3600 | 43.4402         | 0.4527 |
| 765.5683      | 4.48  | 4000 | 43.2136         | 0.4505 |
| 803.3544      | 4.93  | 4400 | 43.3741         | 0.4535 |

Framework versions

  • Transformers 4.24.0
  • Pytorch 1.12.1
  • Datasets 2.7.1
  • Tokenizers 0.11.0
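
With the pinned versions above (or newer), the checkpoint can be loaded like any other Hub model. Below is a minimal usage sketch, assuming this is a CTC-style speech-recognition checkpoint; the repository namespace is a placeholder for the actual Hub path.

```python
# Minimal usage sketch, assuming a speech-recognition checkpoint.
# Replace <user> with the actual Hub namespace of this model.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<user>/libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-att",
)
print(asr("path/to/audio.wav")["text"])
```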