
20230826052103

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.5758
  • Accuracy: 0.73
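
The card does not state the pipeline type or which SuperGLUE subtask was used, so the snippet below is only a minimal loading sketch, assuming a sequence-classification head and a sentence-pair task; the example inputs are placeholders.

```python
# Minimal loading sketch. Assumption: the checkpoint was trained with a
# sequence-classification head on a sentence-pair SuperGLUE task; the card
# does not state the pipeline type or the subtask.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230826052103")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230826052103")

# Placeholder sentence pair; replace with task-appropriate inputs.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```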

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
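
For reference, these values map onto `transformers.TrainingArguments` roughly as below. This is a hedged reconstruction, not the original training script; the `output_dir` and any argument not listed above are assumptions.

```python
# Hedged reconstruction of the configuration listed above (Transformers 4.26).
# output_dir is a placeholder; all other values come from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230826052103",  # placeholder, not stated on the card
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```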

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.6259          | 0.48     |
| No log        | 2.0   | 50   | 0.7321          | 0.62     |
| No log        | 3.0   | 75   | 0.7953          | 0.64     |
| No log        | 4.0   | 100  | 0.6993          | 0.65     |
| No log        | 5.0   | 125  | 0.5882          | 0.62     |
| No log        | 6.0   | 150  | 0.5896          | 0.63     |
| No log        | 7.0   | 175  | 0.6143          | 0.66     |
| No log        | 8.0   | 200  | 0.7070          | 0.63     |
| No log        | 9.0   | 225  | 0.6441          | 0.67     |
| No log        | 10.0  | 250  | 0.7048          | 0.68     |
| No log        | 11.0  | 275  | 0.5610          | 0.70     |
| No log        | 12.0  | 300  | 0.6845          | 0.69     |
| No log        | 13.0  | 325  | 0.7743          | 0.67     |
| No log        | 14.0  | 350  | 0.7745          | 0.68     |
| No log        | 15.0  | 375  | 0.7992          | 0.72     |
| No log        | 16.0  | 400  | 0.7166          | 0.72     |
| No log        | 17.0  | 425  | 0.7013          | 0.75     |
| No log        | 18.0  | 450  | 0.8815          | 0.72     |
| No log        | 19.0  | 475  | 0.7997          | 0.72     |
| 0.6923        | 20.0  | 500  | 0.7411          | 0.70     |
| 0.6923        | 21.0  | 525  | 0.7322          | 0.71     |
| 0.6923        | 22.0  | 550  | 0.8924          | 0.67     |
| 0.6923        | 23.0  | 575  | 0.7238          | 0.70     |
| 0.6923        | 24.0  | 600  | 0.7785          | 0.71     |
| 0.6923        | 25.0  | 625  | 0.6886          | 0.71     |
| 0.6923        | 26.0  | 650  | 0.7782          | 0.72     |
| 0.6923        | 27.0  | 675  | 0.7322          | 0.71     |
| 0.6923        | 28.0  | 700  | 0.7590          | 0.68     |
| 0.6923        | 29.0  | 725  | 0.7170          | 0.71     |
| 0.6923        | 30.0  | 750  | 0.7993          | 0.71     |
| 0.6923        | 31.0  | 775  | 0.7465          | 0.70     |
| 0.6923        | 32.0  | 800  | 0.6627          | 0.70     |
| 0.6923        | 33.0  | 825  | 0.7128          | 0.70     |
| 0.6923        | 34.0  | 850  | 0.6699          | 0.69     |
| 0.6923        | 35.0  | 875  | 0.6974          | 0.69     |
| 0.6923        | 36.0  | 900  | 0.6626          | 0.70     |
| 0.6923        | 37.0  | 925  | 0.6843          | 0.70     |
| 0.6923        | 38.0  | 950  | 0.6846          | 0.71     |
| 0.6923        | 39.0  | 975  | 0.7098          | 0.71     |
| 0.2907        | 40.0  | 1000 | 0.6845          | 0.71     |
| 0.2907        | 41.0  | 1025 | 0.6782          | 0.71     |
| 0.2907        | 42.0  | 1050 | 0.6635          | 0.70     |
| 0.2907        | 43.0  | 1075 | 0.5903          | 0.70     |
| 0.2907        | 44.0  | 1100 | 0.6072          | 0.71     |
| 0.2907        | 45.0  | 1125 | 0.5961          | 0.72     |
| 0.2907        | 46.0  | 1150 | 0.6115          | 0.72     |
| 0.2907        | 47.0  | 1175 | 0.6240          | 0.71     |
| 0.2907        | 48.0  | 1200 | 0.6327          | 0.72     |
| 0.2907        | 49.0  | 1225 | 0.6935          | 0.71     |
| 0.2907        | 50.0  | 1250 | 0.5864          | 0.73     |
| 0.2907        | 51.0  | 1275 | 0.5779          | 0.72     |
| 0.2907        | 52.0  | 1300 | 0.6013          | 0.73     |
| 0.2907        | 53.0  | 1325 | 0.5665          | 0.75     |
| 0.2907        | 54.0  | 1350 | 0.5745          | 0.76     |
| 0.2907        | 55.0  | 1375 | 0.6108          | 0.75     |
| 0.2907        | 56.0  | 1400 | 0.5844          | 0.75     |
| 0.2907        | 57.0  | 1425 | 0.5647          | 0.77     |
| 0.2907        | 58.0  | 1450 | 0.5844          | 0.76     |
| 0.2907        | 59.0  | 1475 | 0.5720          | 0.75     |
| 0.2156        | 60.0  | 1500 | 0.5815          | 0.72     |
| 0.2156        | 61.0  | 1525 | 0.5615          | 0.73     |
| 0.2156        | 62.0  | 1550 | 0.5820          | 0.75     |
| 0.2156        | 63.0  | 1575 | 0.5712          | 0.73     |
| 0.2156        | 64.0  | 1600 | 0.5682          | 0.72     |
| 0.2156        | 65.0  | 1625 | 0.6267          | 0.73     |
| 0.2156        | 66.0  | 1650 | 0.5815          | 0.74     |
| 0.2156        | 67.0  | 1675 | 0.6171          | 0.73     |
| 0.2156        | 68.0  | 1700 | 0.5554          | 0.74     |
| 0.2156        | 69.0  | 1725 | 0.6060          | 0.72     |
| 0.2156        | 70.0  | 1750 | 0.5575          | 0.73     |
| 0.2156        | 71.0  | 1775 | 0.5885          | 0.73     |
| 0.2156        | 72.0  | 1800 | 0.5571          | 0.73     |
| 0.2156        | 73.0  | 1825 | 0.5845          | 0.73     |
| 0.2156        | 74.0  | 1850 | 0.5710          | 0.73     |
| 0.2156        | 75.0  | 1875 | 0.5680          | 0.73     |
| 0.2156        | 76.0  | 1900 | 0.5799          | 0.73     |
| 0.2156        | 77.0  | 1925 | 0.5636          | 0.73     |
| 0.2156        | 78.0  | 1950 | 0.5738          | 0.73     |
| 0.2156        | 79.0  | 1975 | 0.5750          | 0.73     |
| 0.1940        | 80.0  | 2000 | 0.5758          | 0.73     |
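
The accuracy column was most likely produced by a `compute_metrics` callback passed to the `Trainer`. The card does not include that function, so the following is only a plausible sketch using the `evaluate` library.

```python
# Plausible sketch of an accuracy metric for Trainer; not the card's actual code.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```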

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
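
To check that a local environment matches the versions above, a small assertion sketch follows (the `+cu118` suffix implies a CUDA 11.8 PyTorch build).

```python
# Sanity-check the installed versions against those listed on the card.
import datasets
import tokenizers
import torch
import transformers

for module, prefix in [(transformers, "4.26"), (torch, "2.0.1"),
                       (datasets, "2.12"), (tokenizers, "0.13")]:
    assert module.__version__.startswith(prefix), (module.__name__, module.__version__)
```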
