bert_12_layer_model_v2_complete_training_new_48_KD

This model is a fine-tuned version of an unspecified base model on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 418.2312
  • Accuracy: 0.1802

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 36
  • eval_batch_size: 36
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10000
  • num_epochs: 5
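
The settings above can be sketched as keyword arguments in the naming used by `transformers.TrainingArguments`, followed by a pure-Python version of a linear warmup-then-decay learning-rate schedule. This is an illustration under stated assumptions, not the original training script: the reported batch size is assumed to be per device, and `output_dir` and `TOTAL_STEPS` are hypothetical values, since the card does not state an output path or the total number of optimizer steps.

```python
# Hyperparameters from the list above, keyed by transformers.TrainingArguments
# field names. Values marked "hypothetical" or "assumption" are not in the card.
training_kwargs = {
    "output_dir": "./outputs",          # hypothetical path
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 36,  # assumption: reported size is per device
    "per_device_eval_batch_size": 36,   # assumption: reported size is per device
    "seed": 10,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 10000,
    "num_train_epochs": 5,
}

# Hypothetical total number of optimizer steps; the card does not report it.
TOTAL_STEPS = 800_000


def linear_lr(step, peak_lr=1e-5, warmup_steps=10000, total_steps=TOTAL_STEPS):
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 5000, 10000, 140000, TOTAL_STEPS):
        print(s, linear_lr(s))
```

With `transformers` installed, the dictionary could be passed as `TrainingArguments(**training_kwargs)`. The schedule function only illustrates the shape of a linear schedule with 10000 warmup steps. The learning rates it prints depend on the assumed `TOTAL_STEPS`.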

Training results

| Training Loss | Epoch | Step   | Validation Loss | Accuracy |
|---------------|-------|--------|-----------------|----------|
| 846.7844      | 0.06  | 10000  | 799.2012        | 0.1433   |
| 603.1405      | 0.12  | 20000  | 597.2043        | 0.1455   |
| 552.8343      | 0.18  | 30000  | 549.4058        | 0.1455   |
| 525.8206      | 0.25  | 40000  | 523.2474        | 0.1455   |
| 508.5397      | 0.31  | 50000  | 508.2666        | 0.1467   |
| 495.479       | 0.37  | 60000  | 494.1740        | 0.1454   |
| 485.269       | 0.43  | 70000  | 483.4185        | 0.1459   |
| 474.9876      | 0.49  | 80000  | 475.5062        | 0.1475   |
| 464.3079      | 0.55  | 90000  | 460.0214        | 0.1507   |
| 455.1477      | 0.61  | 100000 | 451.2754        | 0.1553   |
| 444.9362      | 0.68  | 110000 | 441.2908        | 0.1596   |
| 438.575       | 0.74  | 120000 | 432.5171        | 0.1660   |
| 429.8774      | 0.8   | 130000 | 425.1851        | 0.1693   |
| 421.0561      | 0.86  | 140000 | 418.2312        | 0.1802   |
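
The last logged row is at epoch 0.86, although num_epochs is set to 5. The Epoch and Step columns also allow a rough estimate of how many optimizer steps make up one epoch. The sketch below assumes that each Epoch value is the true fraction rounded to two decimals and that steps per epoch is constant. The function name `steps_per_epoch_bounds` is made up for this example.

```python
# (step, epoch) pairs copied from the training results table.
LOGGED = [
    (10000, 0.06), (20000, 0.12), (30000, 0.18), (40000, 0.25),
    (50000, 0.31), (60000, 0.37), (70000, 0.43), (80000, 0.49),
    (90000, 0.55), (100000, 0.61), (110000, 0.68), (120000, 0.74),
    (130000, 0.80), (140000, 0.86),
]


def steps_per_epoch_bounds(rows, decimals=2):
    """Intersect the per-row intervals implied by rounded epoch values.

    Assumes each logged epoch lies within half a rounding unit of the
    true value, so steps_per_epoch is between step / (epoch + half)
    and step / (epoch - half) for every row.
    """
    half = 0.5 * 10 ** (-decimals)
    low = max(step / (epoch + half) for step, epoch in rows)
    high = min(step / (epoch - half) for step, epoch in rows)
    return low, high


low, high = steps_per_epoch_bounds(LOGGED)

if __name__ == "__main__":
    print(f"steps per epoch: between {low:.0f} and {high:.0f}")
    print(f"steps for 5 epochs: between {5 * low:.0f} and {5 * high:.0f}")
```

Under these assumptions, one epoch comes to roughly 163,000 steps, so the 140000 logged steps would cover well under one of the five configured epochs.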

Framework versions

  • Transformers 4.30.1
  • Pytorch 1.14.0a0+410ce96
  • Datasets 2.12.0
  • Tokenizers 0.13.3