
20230826022800

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4898
  • Accuracy: 0.75

Model description

More information needed

Intended uses & limitations

More information needed
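
No pipeline type is set for this checkpoint, so the task head is undocumented. A minimal loading sketch, assuming a sequence-classification head; the repo id dkqjrm/20230826022800 is the model's name on the Hub:

```python
# Minimal loading sketch. The task head is not documented in this card;
# AutoModelForSequenceClassification is an assumption. The sentence-pair
# inputs below are placeholders, since the SuperGLUE task is unspecified.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230826022800")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230826022800")

inputs = tokenizer("An example premise.", "An example hypothesis.", return_tensors="pt")
logits = model(**inputs).logits  # class probabilities via logits.softmax(-1)
```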

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
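
A minimal sketch of this configuration with the transformers Trainer API. Only the arguments listed above are reproduced; the SuperGLUE task, preprocessing, and model head are not documented in this card, and the commented values are assumptions:

```python
# Configuration sketch mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826022800",   # output path is an assumption (model name)
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumption; matches the per-epoch rows below
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer's
    # default optimizer setting, so no extra arguments are needed for it.
)
```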

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.5778 | 0.38 |
| No log | 2.0 | 50 | 0.5810 | 0.66 |
| No log | 3.0 | 75 | 0.6271 | 0.65 |
| No log | 4.0 | 100 | 0.5772 | 0.64 |
| No log | 5.0 | 125 | 0.5290 | 0.62 |
| No log | 6.0 | 150 | 0.5352 | 0.62 |
| No log | 7.0 | 175 | 0.5322 | 0.61 |
| No log | 8.0 | 200 | 0.5976 | 0.64 |
| No log | 9.0 | 225 | 0.5290 | 0.61 |
| No log | 10.0 | 250 | 0.5700 | 0.66 |
| No log | 11.0 | 275 | 0.5132 | 0.66 |
| No log | 12.0 | 300 | 0.5155 | 0.64 |
| No log | 13.0 | 325 | 0.5049 | 0.67 |
| No log | 14.0 | 350 | 0.5078 | 0.67 |
| No log | 15.0 | 375 | 0.4821 | 0.68 |
| No log | 16.0 | 400 | 0.5371 | 0.7 |
| No log | 17.0 | 425 | 0.5407 | 0.69 |
| No log | 18.0 | 450 | 0.6441 | 0.71 |
| No log | 19.0 | 475 | 0.5787 | 0.7 |
| 0.6402 | 20.0 | 500 | 0.5646 | 0.68 |
| 0.6402 | 21.0 | 525 | 0.5553 | 0.71 |
| 0.6402 | 22.0 | 550 | 0.6137 | 0.72 |
| 0.6402 | 23.0 | 575 | 0.4948 | 0.71 |
| 0.6402 | 24.0 | 600 | 0.5510 | 0.72 |
| 0.6402 | 25.0 | 625 | 0.5985 | 0.7 |
| 0.6402 | 26.0 | 650 | 0.5660 | 0.71 |
| 0.6402 | 27.0 | 675 | 0.5232 | 0.71 |
| 0.6402 | 28.0 | 700 | 0.5381 | 0.71 |
| 0.6402 | 29.0 | 725 | 0.5234 | 0.71 |
| 0.6402 | 30.0 | 750 | 0.6145 | 0.71 |
| 0.6402 | 31.0 | 775 | 0.5482 | 0.73 |
| 0.6402 | 32.0 | 800 | 0.5246 | 0.72 |
| 0.6402 | 33.0 | 825 | 0.5258 | 0.71 |
| 0.6402 | 34.0 | 850 | 0.5278 | 0.72 |
| 0.6402 | 35.0 | 875 | 0.5245 | 0.72 |
| 0.6402 | 36.0 | 900 | 0.5073 | 0.72 |
| 0.6402 | 37.0 | 925 | 0.4983 | 0.72 |
| 0.6402 | 38.0 | 950 | 0.5077 | 0.73 |
| 0.6402 | 39.0 | 975 | 0.5263 | 0.73 |
| 0.3719 | 40.0 | 1000 | 0.5096 | 0.73 |
| 0.3719 | 41.0 | 1025 | 0.5339 | 0.73 |
| 0.3719 | 42.0 | 1050 | 0.4964 | 0.75 |
| 0.3719 | 43.0 | 1075 | 0.4832 | 0.73 |
| 0.3719 | 44.0 | 1100 | 0.4940 | 0.73 |
| 0.3719 | 45.0 | 1125 | 0.4982 | 0.72 |
| 0.3719 | 46.0 | 1150 | 0.5449 | 0.73 |
| 0.3719 | 47.0 | 1175 | 0.5175 | 0.73 |
| 0.3719 | 48.0 | 1200 | 0.5208 | 0.74 |
| 0.3719 | 49.0 | 1225 | 0.5281 | 0.74 |
| 0.3719 | 50.0 | 1250 | 0.4940 | 0.76 |
| 0.3719 | 51.0 | 1275 | 0.5020 | 0.74 |
| 0.3719 | 52.0 | 1300 | 0.5010 | 0.74 |
| 0.3719 | 53.0 | 1325 | 0.4799 | 0.73 |
| 0.3719 | 54.0 | 1350 | 0.5206 | 0.74 |
| 0.3719 | 55.0 | 1375 | 0.5148 | 0.75 |
| 0.3719 | 56.0 | 1400 | 0.4815 | 0.74 |
| 0.3719 | 57.0 | 1425 | 0.4951 | 0.74 |
| 0.3719 | 58.0 | 1450 | 0.5077 | 0.74 |
| 0.3719 | 59.0 | 1475 | 0.5000 | 0.74 |
| 0.3121 | 60.0 | 1500 | 0.5124 | 0.75 |
| 0.3121 | 61.0 | 1525 | 0.4891 | 0.76 |
| 0.3121 | 62.0 | 1550 | 0.4994 | 0.75 |
| 0.3121 | 63.0 | 1575 | 0.4947 | 0.75 |
| 0.3121 | 64.0 | 1600 | 0.4833 | 0.74 |
| 0.3121 | 65.0 | 1625 | 0.5135 | 0.75 |
| 0.3121 | 66.0 | 1650 | 0.4803 | 0.75 |
| 0.3121 | 67.0 | 1675 | 0.5058 | 0.75 |
| 0.3121 | 68.0 | 1700 | 0.4840 | 0.75 |
| 0.3121 | 69.0 | 1725 | 0.5051 | 0.75 |
| 0.3121 | 70.0 | 1750 | 0.4883 | 0.74 |
| 0.3121 | 71.0 | 1775 | 0.4972 | 0.74 |
| 0.3121 | 72.0 | 1800 | 0.4789 | 0.74 |
| 0.3121 | 73.0 | 1825 | 0.4984 | 0.74 |
| 0.3121 | 74.0 | 1850 | 0.4913 | 0.74 |
| 0.3121 | 75.0 | 1875 | 0.4879 | 0.74 |
| 0.3121 | 76.0 | 1900 | 0.4902 | 0.74 |
| 0.3121 | 77.0 | 1925 | 0.4856 | 0.74 |
| 0.3121 | 78.0 | 1950 | 0.4893 | 0.74 |
| 0.3121 | 79.0 | 1975 | 0.4907 | 0.75 |
| 0.2906 | 80.0 | 2000 | 0.4898 | 0.75 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
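
To work in a matching environment, a quick version check against the list above (a convenience sketch, not from the card):

```python
# Check installed versions against those listed under "Framework versions".
import datasets, tokenizers, torch, transformers

expected = {
    transformers: "4.26.1",
    torch: "2.0.1+cu118",
    datasets: "2.12.0",
    tokenizers: "0.13.3",
}
for module, version in expected.items():
    assert module.__version__ == version, (module.__name__, module.__version__)
```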