20230831144955

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6197
  • Accuracy: 0.5
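
The card does not include a usage snippet, so here is a minimal inference sketch. It assumes the checkpoint loads as a sequence-classification model over a SuperGLUE-style sentence-pair task; the specific subtask, label meaning, and example inputs below are illustrative assumptions, not documented behavior.

```python
# Minimal inference sketch (assumptions noted above; inputs are illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230831144955"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical sentence pair; replace with task-appropriate inputs.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is sitting on a mat.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```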

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
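
The training script itself is not published; the block below is a hedged sketch of a transformers.TrainingArguments configuration matching the hyperparameters above. The output_dir, evaluation_strategy, and logging_steps values are assumptions (per-epoch evaluation and step-based logging are inferred from the results table, where the first-epoch training loss shows "No log").

```python
# Sketch of TrainingArguments matching the reported hyperparameters.
# Values marked as assumptions are not stated in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831144955",   # assumption: hypothetical output path
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumption: matches per-epoch rows below
    logging_steps=500,             # assumption: explains "No log" at step 340
)
```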

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|---------------|-------|-------|-----------------|----------|
| No log        | 1.0   | 340   | 0.6175          | 0.5      |
| 0.6359        | 2.0   | 680   | 0.6236          | 0.5      |
| 0.635         | 3.0   | 1020  | 0.6211          | 0.5      |
| 0.635         | 4.0   | 1360  | 0.6253          | 0.5      |
| 0.6306        | 5.0   | 1700  | 0.6421          | 0.5      |
| 0.6268        | 6.0   | 2040  | 0.6297          | 0.5      |
| 0.6268        | 7.0   | 2380  | 0.6351          | 0.5      |
| 0.6314        | 8.0   | 2720  | 0.6053          | 0.5      |
| 0.6135        | 9.0   | 3060  | 0.6185          | 0.5      |
| 0.6135        | 10.0  | 3400  | 0.6316          | 0.5      |
| 0.6245        | 11.0  | 3740  | 0.6219          | 0.5      |
| 0.6198        | 12.0  | 4080  | 0.6203          | 0.5      |
| 0.6198        | 13.0  | 4420  | 0.6516          | 0.5      |
| 0.6151        | 14.0  | 4760  | 0.6231          | 0.5      |
| 0.6223        | 15.0  | 5100  | 0.6235          | 0.5      |
| 0.6223        | 16.0  | 5440  | 0.6204          | 0.5      |
| 0.6216        | 17.0  | 5780  | 0.6225          | 0.5      |
| 0.6168        | 18.0  | 6120  | 0.6176          | 0.5      |
| 0.6168        | 19.0  | 6460  | 0.6204          | 0.5      |
| 0.6179        | 20.0  | 6800  | 0.6179          | 0.5      |
| 0.6169        | 21.0  | 7140  | 0.6193          | 0.5      |
| 0.6169        | 22.0  | 7480  | 0.6414          | 0.5      |
| 0.6206        | 23.0  | 7820  | 0.6196          | 0.5      |
| 0.6181        | 24.0  | 8160  | 0.6248          | 0.5      |
| 0.6269        | 25.0  | 8500  | 0.6173          | 0.5      |
| 0.6269        | 26.0  | 8840  | 0.6234          | 0.5      |
| 0.6201        | 27.0  | 9180  | 0.6239          | 0.5      |
| 0.6162        | 28.0  | 9520  | 0.6182          | 0.5      |
| 0.6162        | 29.0  | 9860  | 0.6260          | 0.5      |
| 0.6166        | 30.0  | 10200 | 0.6190          | 0.5      |
| 0.6159        | 31.0  | 10540 | 0.6192          | 0.5      |
| 0.6159        | 32.0  | 10880 | 0.6261          | 0.5      |
| 0.6158        | 33.0  | 11220 | 0.6295          | 0.5      |
| 0.6166        | 34.0  | 11560 | 0.6238          | 0.5      |
| 0.6166        | 35.0  | 11900 | 0.6221          | 0.5      |
| 0.6163        | 36.0  | 12240 | 0.6198          | 0.5      |
| 0.6177        | 37.0  | 12580 | 0.6177          | 0.5      |
| 0.6177        | 38.0  | 12920 | 0.6202          | 0.5      |
| 0.6158        | 39.0  | 13260 | 0.6231          | 0.5      |
| 0.6147        | 40.0  | 13600 | 0.6209          | 0.5      |
| 0.6147        | 41.0  | 13940 | 0.6191          | 0.5      |
| 0.6173        | 42.0  | 14280 | 0.6195          | 0.5      |
| 0.6129        | 43.0  | 14620 | 0.6213          | 0.5      |
| 0.6129        | 44.0  | 14960 | 0.6245          | 0.5      |
| 0.6173        | 45.0  | 15300 | 0.6235          | 0.5      |
| 0.6128        | 46.0  | 15640 | 0.6184          | 0.5      |
| 0.6128        | 47.0  | 15980 | 0.6252          | 0.5      |
| 0.6174        | 48.0  | 16320 | 0.6216          | 0.5      |
| 0.6157        | 49.0  | 16660 | 0.6248          | 0.5      |
| 0.6151        | 50.0  | 17000 | 0.6191          | 0.5      |
| 0.6151        | 51.0  | 17340 | 0.6212          | 0.5      |
| 0.6132        | 52.0  | 17680 | 0.6197          | 0.5      |
| 0.6173        | 53.0  | 18020 | 0.6233          | 0.5      |
| 0.6173        | 54.0  | 18360 | 0.6223          | 0.5      |
| 0.6132        | 55.0  | 18700 | 0.6173          | 0.5      |
| 0.6129        | 56.0  | 19040 | 0.6218          | 0.5      |
| 0.6129        | 57.0  | 19380 | 0.6178          | 0.5      |
| 0.614         | 58.0  | 19720 | 0.6239          | 0.5      |
| 0.616         | 59.0  | 20060 | 0.6258          | 0.5      |
| 0.616         | 60.0  | 20400 | 0.6181          | 0.5      |
| 0.6136        | 61.0  | 20740 | 0.6195          | 0.5      |
| 0.6132        | 62.0  | 21080 | 0.6205          | 0.5      |
| 0.6132        | 63.0  | 21420 | 0.6177          | 0.5      |
| 0.6121        | 64.0  | 21760 | 0.6221          | 0.5      |
| 0.6164        | 65.0  | 22100 | 0.6190          | 0.5      |
| 0.6164        | 66.0  | 22440 | 0.6225          | 0.5      |
| 0.6073        | 67.0  | 22780 | 0.6205          | 0.5      |
| 0.615         | 68.0  | 23120 | 0.6189          | 0.5      |
| 0.615         | 69.0  | 23460 | 0.6188          | 0.5      |
| 0.6136        | 70.0  | 23800 | 0.6200          | 0.5      |
| 0.6127        | 71.0  | 24140 | 0.6197          | 0.5      |
| 0.6127        | 72.0  | 24480 | 0.6213          | 0.5      |
| 0.6111        | 73.0  | 24820 | 0.6197          | 0.5      |
| 0.6133        | 74.0  | 25160 | 0.6215          | 0.5      |
| 0.6113        | 75.0  | 25500 | 0.6197          | 0.5      |
| 0.6113        | 76.0  | 25840 | 0.6209          | 0.5      |
| 0.6124        | 77.0  | 26180 | 0.6192          | 0.5      |
| 0.6112        | 78.0  | 26520 | 0.6200          | 0.5      |
| 0.6112        | 79.0  | 26860 | 0.6198          | 0.5      |
| 0.612         | 80.0  | 27200 | 0.6197          | 0.5      |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3