20230823054903

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final (60th) training epoch, it achieves the following results on the evaluation set:

  • Loss: 0.0697
  • Accuracy: 0.5271

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
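
As a rough illustration of how the hyperparameters above might be passed to the Transformers `Trainer`, the sketch below maps the card's names onto `TrainingArguments` keyword names. It assumes a single device (so the card's batch sizes equal the per-device batch sizes) and uses a hypothetical `output_dir`; the original training script is not part of this card, so this is not necessarily how the run was configured.

```python
# Hyperparameters exactly as listed in the card.
CARD_HYPERPARAMETERS = {
    "learning_rate": 0.004,
    "train_batch_size": 8,
    "eval_batch_size": 8,
    "seed": 11,
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_epochs": 60.0,
}


def to_training_arguments_kwargs(hp, output_dir="./outputs"):
    """Translate the card's names into TrainingArguments keyword names.

    Assumptions: one device, so total batch size == per-device batch size;
    `output_dir` is a hypothetical path, not taken from the card.
    """
    beta1, beta2 = hp["adam_betas"]
    return {
        "output_dir": output_dir,
        "learning_rate": hp["learning_rate"],
        "per_device_train_batch_size": hp["train_batch_size"],
        "per_device_eval_batch_size": hp["eval_batch_size"],
        "seed": hp["seed"],
        "adam_beta1": beta1,
        "adam_beta2": beta2,
        "adam_epsilon": hp["adam_epsilon"],
        "lr_scheduler_type": hp["lr_scheduler_type"],
        "num_train_epochs": hp["num_epochs"],
    }


def build_training_arguments(hp=CARD_HYPERPARAMETERS):
    """Build a TrainingArguments object (requires `transformers` to be installed)."""
    # Imported lazily so the mapping above can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**to_training_arguments_kwargs(hp))
```

Calling `build_training_arguments()` returns an object that could be handed to `Trainer`; the model, tokenizer, and dataset preprocessing are not specified in the card and would have to be supplied separately.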

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|---------------|-------|-------|-----------------|----------|
| No log        | 1.0   | 312   | 0.0788          | 0.5271   |
| 0.1715        | 2.0   | 624   | 0.2059          | 0.4729   |
| 0.1715        | 3.0   | 936   | 0.1134          | 0.4729   |
| 0.1222        | 4.0   | 1248  | 0.0831          | 0.5343   |
| 0.1187        | 5.0   | 1560  | 0.0732          | 0.5704   |
| 0.1187        | 6.0   | 1872  | 0.0796          | 0.4729   |
| 0.0991        | 7.0   | 2184  | 0.0745          | 0.4729   |
| 0.0991        | 8.0   | 2496  | 0.0716          | 0.4982   |
| 0.0819        | 9.0   | 2808  | 0.0705          | 0.4910   |
| 0.0807        | 10.0  | 3120  | 0.0702          | 0.4765   |
| 0.0807        | 11.0  | 3432  | 0.0713          | 0.4729   |
| 0.0802        | 12.0  | 3744  | 0.0703          | 0.4729   |
| 0.0795        | 13.0  | 4056  | 0.0784          | 0.4729   |
| 0.0795        | 14.0  | 4368  | 0.0706          | 0.5307   |
| 0.0794        | 15.0  | 4680  | 0.0730          | 0.4729   |
| 0.0794        | 16.0  | 4992  | 0.0706          | 0.4801   |
| 0.0806        | 17.0  | 5304  | 0.0711          | 0.5596   |
| 0.0811        | 18.0  | 5616  | 0.0704          | 0.4693   |
| 0.0811        | 19.0  | 5928  | 0.0701          | 0.4874   |
| 0.0798        | 20.0  | 6240  | 0.0719          | 0.6101   |
| 0.0793        | 21.0  | 6552  | 0.0705          | 0.4693   |
| 0.0793        | 22.0  | 6864  | 0.0707          | 0.5884   |
| 0.0795        | 23.0  | 7176  | 0.0712          | 0.4729   |
| 0.0795        | 24.0  | 7488  | 0.0705          | 0.4729   |
| 0.0796        | 25.0  | 7800  | 0.0789          | 0.5271   |
| 0.0796        | 26.0  | 8112  | 0.0705          | 0.4801   |
| 0.0796        | 27.0  | 8424  | 0.0703          | 0.4765   |
| 0.0787        | 28.0  | 8736  | 0.0703          | 0.4838   |
| 0.079         | 29.0  | 9048  | 0.0716          | 0.4729   |
| 0.079         | 30.0  | 9360  | 0.0739          | 0.5704   |
| 0.0788        | 31.0  | 9672  | 0.0749          | 0.5632   |
| 0.0788        | 32.0  | 9984  | 0.0711          | 0.4729   |
| 0.0789        | 33.0  | 10296 | 0.0705          | 0.4838   |
| 0.0786        | 34.0  | 10608 | 0.0700          | 0.5199   |
| 0.0786        | 35.0  | 10920 | 0.0699          | 0.4838   |
| 0.0785        | 36.0  | 11232 | 0.0715          | 0.4729   |
| 0.0784        | 37.0  | 11544 | 0.0716          | 0.6354   |
| 0.0784        | 38.0  | 11856 | 0.0719          | 0.4729   |
| 0.0781        | 39.0  | 12168 | 0.0700          | 0.5487   |
| 0.0781        | 40.0  | 12480 | 0.0700          | 0.5848   |
| 0.0778        | 41.0  | 12792 | 0.0704          | 0.6173   |
| 0.0778        | 42.0  | 13104 | 0.0705          | 0.5848   |
| 0.0778        | 43.0  | 13416 | 0.0705          | 0.6209   |
| 0.078         | 44.0  | 13728 | 0.0701          | 0.5199   |
| 0.0776        | 45.0  | 14040 | 0.0704          | 0.5957   |
| 0.0776        | 46.0  | 14352 | 0.0702          | 0.5848   |
| 0.0772        | 47.0  | 14664 | 0.0703          | 0.4765   |
| 0.0772        | 48.0  | 14976 | 0.0697          | 0.5379   |
| 0.0773        | 49.0  | 15288 | 0.0696          | 0.5596   |
| 0.0772        | 50.0  | 15600 | 0.0702          | 0.4765   |
| 0.0772        | 51.0  | 15912 | 0.0701          | 0.4801   |
| 0.0776        | 52.0  | 16224 | 0.0706          | 0.4729   |
| 0.0772        | 53.0  | 16536 | 0.0698          | 0.5054   |
| 0.0772        | 54.0  | 16848 | 0.0706          | 0.6318   |
| 0.0766        | 55.0  | 17160 | 0.0708          | 0.4765   |
| 0.0766        | 56.0  | 17472 | 0.0700          | 0.6209   |
| 0.0766        | 57.0  | 17784 | 0.0697          | 0.5307   |
| 0.0767        | 58.0  | 18096 | 0.0700          | 0.4801   |
| 0.0767        | 59.0  | 18408 | 0.0697          | 0.5235   |
| 0.0767        | 60.0  | 18720 | 0.0697          | 0.5271   |
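
The step counts and accuracies in the table above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it copies the Accuracy column verbatim, and it assumes a single device with no gradient accumulation (neither is stated in the card) when relating steps to training-set size. The helper name `steps_at_epoch` is hypothetical.

```python
STEPS_PER_EPOCH = 312   # step count logged at epoch 1.0
TRAIN_BATCH_SIZE = 8    # from the hyperparameter list
NUM_EPOCHS = 60

# Accuracy column of the training results, epochs 1..60, copied verbatim.
ACCURACY = [
    0.5271, 0.4729, 0.4729, 0.5343, 0.5704, 0.4729, 0.4729, 0.4982, 0.4910, 0.4765,
    0.4729, 0.4729, 0.4729, 0.5307, 0.4729, 0.4801, 0.5596, 0.4693, 0.4874, 0.6101,
    0.4693, 0.5884, 0.4729, 0.4729, 0.5271, 0.4801, 0.4765, 0.4838, 0.4729, 0.5704,
    0.5632, 0.4729, 0.4838, 0.5199, 0.4838, 0.4729, 0.6354, 0.4729, 0.5487, 0.5848,
    0.6173, 0.5848, 0.6209, 0.5199, 0.5957, 0.5848, 0.4765, 0.5379, 0.5596, 0.4765,
    0.4801, 0.4729, 0.5054, 0.6318, 0.4765, 0.6209, 0.5307, 0.4801, 0.5235, 0.5271,
]


def steps_at_epoch(epoch):
    """Cumulative optimizer steps logged at the end of a given epoch."""
    return epoch * STEPS_PER_EPOCH


total_steps = steps_at_epoch(NUM_EPOCHS)

# Assumption: one device, no gradient accumulation, last batch possibly partial.
# Under that assumption the training split holds between these many examples.
min_train_examples = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
max_train_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

# Epoch (1-indexed) with the highest evaluation accuracy.
best_epoch = max(range(1, NUM_EPOCHS + 1), key=lambda e: ACCURACY[e - 1])
best_accuracy = ACCURACY[best_epoch - 1]
final_accuracy = ACCURACY[-1]

print(f"total steps: {total_steps}")
print(f"training examples (assumed): {min_train_examples}..{max_train_examples}")
print(f"best accuracy {best_accuracy} at epoch {best_epoch}; final {final_accuracy}")
```

Two things stand out. The highest accuracy in the table occurs well before the final epoch, so the headline figure reports the last checkpoint rather than the best one. Also, the two most frequently repeated accuracies, 0.4729 and 0.5271, sum to 1.0, which would be consistent with the model predicting a single class at those epochs if the task is binary; the card does not say which super_glue task was used, so this remains an inference.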

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
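
To reproduce results it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch using only the standard library; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumptions about how these packages are installed, and the helper name `version_mismatches` is hypothetical.

```python
from importlib import metadata

# Versions as listed in the card; keys are the assumed pip distribution names.
CARD_VERSIONS = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def version_mismatches(expected):
    """Return {name: (wanted, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(CARD_VERSIONS).items():
        print(f"{name}: card lists {wanted}, found {installed}")
```

An exact string comparison is deliberately strict; a different CUDA build of PyTorch, for example, would be reported as a mismatch even if it behaves the same for this model.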

Dataset used to train dkqjrm/20230823054903