finetune_teacher_noisy_mozilla_50_epochs_take-1

This model is a fine-tuned version of an unspecified base model on an unspecified dataset (recorded as None). It achieves the following results on the evaluation set:

  • Loss: inf
  • Wer: 0.3771
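
Wer here is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that metric, assuming whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and the card does not say which implementation produced the reported value.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the").
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a Wer of 0.3771 corresponds to roughly 38 word errors per 100 reference words.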

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 256
  • total_train_batch_size: 1024
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 50
  • mixed_precision_training: Native AMP
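
The listed values are related: the total train batch size is the per-device batch size multiplied by the gradient accumulation steps, which gives 1024 if a single device is assumed. The sketch below checks that arithmetic and illustrates a cosine schedule with a 20% linear warmup. The device count, the function name `cosine_lr_with_warmup`, and the `total_steps` argument are assumptions for illustration; the card states neither the number of devices nor the total number of optimizer steps.

```python
import math

train_batch_size = 4
gradient_accumulation_steps = 256
num_devices = 1  # assumption: not stated in the card

# Effective number of examples per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-4, warmup_ratio=0.2):
    """Linear warmup from 0 to base_lr, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical value, for illustration only
    for step in (0, 100, 200, 600, 1000):
        print(step, cosine_lr_with_warmup(step, total_steps))
```

With these settings the learning rate rises linearly to 0.0005 over the first fifth of training and then decays along a half cosine toward zero.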

Training results

Training Loss   Epoch   Step   Validation Loss   Wer
156.5412        10.31   1000   inf               0.4495
116.825         20.61   2000   inf               0.4151
88.1673         30.92   3000   inf               0.3941
65.9659         41.24   4000   inf               0.3771
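
Two quantities can be read off these rows: the approximate number of optimizer steps per epoch (step divided by epoch) and the relative Wer reduction between the first and last logged evaluations. The sketch below derives both from the logged values only; the variable names are hypothetical, and the results are estimates because the epoch column is rounded to two decimals.

```python
# (training loss, epoch, step, validation loss, Wer) as logged above.
rows = [
    (156.5412, 10.31, 1000, float("inf"), 0.4495),
    (116.825, 20.61, 2000, float("inf"), 0.4151),
    (88.1673, 30.92, 3000, float("inf"), 0.3941),
    (65.9659, 41.24, 4000, float("inf"), 0.3771),
]

# Optimizer steps per epoch, estimated separately from each row.
steps_per_epoch = [step / epoch for _, epoch, step, _, _ in rows]
mean_steps_per_epoch = sum(steps_per_epoch) / len(steps_per_epoch)

# Relative Wer reduction from the first to the last logged evaluation.
first_wer = rows[0][4]
last_wer = rows[-1][4]
relative_wer_reduction = (first_wer - last_wer) / first_wer

print(f"steps per epoch: {mean_steps_per_epoch:.1f}")
print(f"relative Wer reduction: {relative_wer_reduction:.1%}")
```

The estimate comes out at about 97 optimizer steps per epoch, and Wer falls by roughly 16% relative between step 1000 and step 4000.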

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.12.1
  • Datasets 2.8.0
  • Tokenizers 0.13.2