bert_12_layer_model_v2_complete_training_new_48_KD_wt_init

This model is a fine-tuned version of an unspecified base model; the card names neither the base model nor the training dataset. It achieves the following results on the evaluation set:

  • Loss: 231.6032
  • Accuracy: 0.4123

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 36
  • eval_batch_size: 36
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10000
  • num_epochs: 5
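The linear scheduler with 10,000 warmup steps listed above can be sketched in plain Python as follows. This is a minimal illustration of the schedule's shape, not the training code used for this model: the function name `linear_lr` is hypothetical, and `TOTAL_STEPS` is an assumed value, since the card does not state the total number of optimizer steps.

```python
BASE_LR = 1e-05        # learning_rate from the card
WARMUP_STEPS = 10_000  # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 800_000  # ASSUMPTION: not stated in the card, chosen for illustration


def linear_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        # Warmup phase: the rate grows proportionally to the step count.
        return base_lr * step / warmup
    # Decay phase: the rate falls linearly and is clamped at zero.
    return base_lr * max(0.0, (total - step) / (total - warmup))


# Print the rate at a few points along the (assumed) training run.
for step in (0, 5_000, 10_000, 405_000, 800_000):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at the configured 1e-05 exactly when warmup ends (step 10,000) and reaches zero at the assumed final step.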

Training results

Training Loss   Epoch   Step     Validation Loss   Accuracy
606.9724        0.06    10000    581.9434          0.1439
499.7046        0.12    20000    509.0019          0.1448
483.3571        0.18    30000    487.3823          0.1451
374.2657        0.25    40000    364.4415          0.2699
330.4914        0.31    50000    324.6827          0.3265
307.9071        0.37    60000    297.7675          0.3516
293.5813        0.43    70000    284.2744          0.3672
282.876         0.49    80000    271.3589          0.3775
274.4863        0.55    90000    261.3694          0.3869
267.0072        0.61    100000   252.8204          0.3939
259.3755        0.68    110000   247.2895          0.3992
255.5614        0.74    120000   241.0253          0.4043
250.2624        0.8     130000   238.9221          0.4085
245.3816        0.86    140000   231.6032          0.4123
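The Epoch and Step columns allow a rough estimate of how many optimizer steps make up one epoch. The sketch below uses only the last logged row (epoch 0.86, step 140000) and assumes the epoch value was rounded to the nearest hundredth; the variable names are illustrative and not part of the original training setup.

```python
# Last logged row of the results table: (epoch, step).
last_epoch, last_step = 0.86, 140_000

# Point estimate of steps per epoch from the rounded epoch value.
steps_per_epoch = last_step / last_epoch

# ASSUMPTION: the epoch was rounded to two decimals, so the true value
# lies in [0.855, 0.865). That gives a range for the estimate.
low = last_step / 0.865
high = last_step / 0.855

print(f"estimated steps per epoch: {steps_per_epoch:,.0f}")
print(f"range implied by rounding: {low:,.0f} to {high:,.0f}")
```

The logged results end at epoch 0.86, while the hyperparameters list num_epochs: 5, so the table covers less than one full pass over the training data.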

Framework versions

  • Transformers 4.30.1
  • PyTorch 1.14.0a0+410ce96
  • Datasets 2.12.0
  • Tokenizers 0.13.3