
20230823074620

This model is a fine-tuned version of bert-large-cased on the super_glue dataset (the specific SuperGLUE task configuration is not recorded on this card). It achieves the following results on the evaluation set, which correspond to the final epoch (60) in the training table below:

  • Loss: 0.0693
  • Accuracy: 0.5812
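
As a quick sanity check, the checkpoint can be loaded with the standard transformers sequence-classification API. This is a minimal sketch, not the card author's documented usage: the sequence-classification head and the sentence-pair input are assumptions based on the model being a BERT fine-tune evaluated with accuracy on SuperGLUE, and the example premise/hypothesis pair is invented.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230823074620"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# SuperGLUE tasks are typically sentence-pair classification, so the
# tokenizer is given a pair of texts here (an illustrative example only).
inputs = tokenizer(
    "The cat sat on the mat.",
    "There is a cat on the mat.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```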

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
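
The card does not state which SuperGLUE task or splits were used. Purely as an illustration of how a dataset of this shape is loaded with the datasets library pinned below, here is a sketch; the "rte" configuration is a placeholder assumption, not information from this card.

```python
from datasets import load_dataset

# "rte" is a placeholder: the card does not record the SuperGLUE task,
# so substitute the configuration that was actually used.
dataset = load_dataset("super_glue", "rte")
print(dataset["train"][0])           # fields such as premise / hypothesis / label
print(dataset["validation"].num_rows)
```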

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch is shown after this list):

  • learning_rate: 0.005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
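
The listed values map directly onto transformers.TrainingArguments. The sketch below mirrors them; output_dir and anything not listed above (weight decay, warmup, gradient accumulation, and so on) are assumptions, not settings recorded on this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230823074620",   # assumed; not recorded on the card
    learning_rate=5e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
)
```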

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.1529          | 0.4729   |
| 0.2201        | 2.0   | 624   | 0.1022          | 0.5271   |
| 0.2201        | 3.0   | 936   | 0.2619          | 0.4729   |
| 0.1563        | 4.0   | 1248  | 0.0738          | 0.5199   |
| 0.0889        | 5.0   | 1560  | 0.0709          | 0.4982   |
| 0.0889        | 6.0   | 1872  | 0.0758          | 0.4729   |
| 0.0808        | 7.0   | 2184  | 0.0732          | 0.4729   |
| 0.0808        | 8.0   | 2496  | 0.0716          | 0.5596   |
| 0.0802        | 9.0   | 2808  | 0.0707          | 0.5307   |
| 0.0819        | 10.0  | 3120  | 0.0712          | 0.4729   |
| 0.0819        | 11.0  | 3432  | 0.0706          | 0.4765   |
| 0.0818        | 12.0  | 3744  | 0.0703          | 0.4621   |
| 0.08          | 13.0  | 4056  | 0.0737          | 0.4729   |
| 0.08          | 14.0  | 4368  | 0.0712          | 0.5307   |
| 0.0803        | 15.0  | 4680  | 0.0738          | 0.4729   |
| 0.0803        | 16.0  | 4992  | 0.0708          | 0.4729   |
| 0.0807        | 17.0  | 5304  | 0.0709          | 0.5487   |
| 0.082         | 18.0  | 5616  | 0.0720          | 0.5523   |
| 0.082         | 19.0  | 5928  | 0.0712          | 0.4729   |
| 0.0806        | 20.0  | 6240  | 0.0703          | 0.5090   |
| 0.0801        | 21.0  | 6552  | 0.0710          | 0.4729   |
| 0.0801        | 22.0  | 6864  | 0.0701          | 0.4874   |
| 0.0798        | 23.0  | 7176  | 0.0703          | 0.4874   |
| 0.0798        | 24.0  | 7488  | 0.0705          | 0.4765   |
| 0.0854        | 25.0  | 7800  | 0.0704          | 0.5523   |
| 0.0793        | 26.0  | 8112  | 0.0702          | 0.4910   |
| 0.0793        | 27.0  | 8424  | 0.0721          | 0.4729   |
| 0.0792        | 28.0  | 8736  | 0.0720          | 0.4729   |
| 0.0794        | 29.0  | 9048  | 0.0713          | 0.4765   |
| 0.0794        | 30.0  | 9360  | 0.0701          | 0.5632   |
| 0.0785        | 31.0  | 9672  | 0.0710          | 0.6101   |
| 0.0785        | 32.0  | 9984  | 0.0703          | 0.4801   |
| 0.0786        | 33.0  | 10296 | 0.0728          | 0.4729   |
| 0.0791        | 34.0  | 10608 | 0.0703          | 0.5054   |
| 0.0791        | 35.0  | 10920 | 0.0716          | 0.6173   |
| 0.0789        | 36.0  | 11232 | 0.0708          | 0.4765   |
| 0.0786        | 37.0  | 11544 | 0.0770          | 0.4729   |
| 0.0786        | 38.0  | 11856 | 0.0718          | 0.4729   |
| 0.0784        | 39.0  | 12168 | 0.0700          | 0.4838   |
| 0.0784        | 40.0  | 12480 | 0.0699          | 0.5235   |
| 0.0775        | 41.0  | 12792 | 0.0698          | 0.6137   |
| 0.0779        | 42.0  | 13104 | 0.0697          | 0.5199   |
| 0.0779        | 43.0  | 13416 | 0.0698          | 0.6534   |
| 0.0777        | 44.0  | 13728 | 0.0697          | 0.5848   |
| 0.0776        | 45.0  | 14040 | 0.0699          | 0.6426   |
| 0.0776        | 46.0  | 14352 | 0.0697          | 0.6029   |
| 0.0769        | 47.0  | 14664 | 0.0705          | 0.4874   |
| 0.0769        | 48.0  | 14976 | 0.0695          | 0.6209   |
| 0.077         | 49.0  | 15288 | 0.0695          | 0.5668   |
| 0.077         | 50.0  | 15600 | 0.0696          | 0.5018   |
| 0.077         | 51.0  | 15912 | 0.0700          | 0.4946   |
| 0.0774        | 52.0  | 16224 | 0.0701          | 0.4982   |
| 0.0767        | 53.0  | 16536 | 0.0694          | 0.5812   |
| 0.0767        | 54.0  | 16848 | 0.0701          | 0.6462   |
| 0.0761        | 55.0  | 17160 | 0.0706          | 0.4874   |
| 0.0761        | 56.0  | 17472 | 0.0695          | 0.6787   |
| 0.0762        | 57.0  | 17784 | 0.0693          | 0.6029   |
| 0.0763        | 58.0  | 18096 | 0.0696          | 0.5199   |
| 0.0763        | 59.0  | 18408 | 0.0693          | 0.5740   |
| 0.0763        | 60.0  | 18720 | 0.0693          | 0.5812   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
