20230831143012

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6198
  • Accuracy: 0.5
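
This checkpoint can be loaded like any other Transformers classification model. The snippet below is a minimal sketch, not from the original card: the Hub repo id dkqjrm/20230831143012 is taken from this page, and the sequence-classification head is an assumption based on the accuracy metric reported above.

```python
# Minimal loading sketch (assumption: a sequence-classification head,
# inferred from the accuracy metric; repo id taken from this model page).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230831143012")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230831143012")
```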

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
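
The card names super_glue but not which subtask was used. For reference, here is a minimal sketch of loading one SuperGLUE configuration with the datasets library; the "boolq" config is an illustrative assumption, not the author's confirmed choice.

```python
# Illustrative only: the card does not state the SuperGLUE subtask,
# so the "boolq" configuration below is an assumption.
from datasets import load_dataset

dataset = load_dataset("super_glue", "boolq")
print(dataset)              # DatasetDict with train/validation/test splits
print(dataset["train"][0])  # one example: question, passage, idx, label
```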

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
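
As a reference point, the sketch below shows how these values map onto transformers.TrainingArguments. It is a reconstruction under the assumption that the standard Trainer API was used; the output_dir is hypothetical, and all other values are copied from the list above.

```python
# Sketch mapping the listed hyperparameters onto TrainingArguments.
# Assumption: the run used the standard Trainer API; output_dir is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831143012",     # hypothetical output path
    learning_rate=3e-4,              # learning_rate: 0.0003
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon=1e-08
)
```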

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.6258          | 0.5      |
| 0.6312        | 2.0   | 680   | 0.6164          | 0.5      |
| 0.6295        | 3.0   | 1020  | 0.6237          | 0.5      |
| 0.6295        | 4.0   | 1360  | 0.6170          | 0.5      |
| 0.6241        | 5.0   | 1700  | 0.6181          | 0.5      |
| 0.6236        | 6.0   | 2040  | 0.6191          | 0.5      |
| 0.6236        | 7.0   | 2380  | 0.6189          | 0.5      |
| 0.6239        | 8.0   | 2720  | 0.6261          | 0.5      |
| 0.6189        | 9.0   | 3060  | 0.6188          | 0.5      |
| 0.6189        | 10.0  | 3400  | 0.6264          | 0.5      |
| 0.623         | 11.0  | 3740  | 0.6200          | 0.5      |
| 0.6207        | 12.0  | 4080  | 0.6273          | 0.5      |
| 0.6207        | 13.0  | 4420  | 0.6450          | 0.5      |
| 0.6183        | 14.0  | 4760  | 0.6217          | 0.5      |
| 0.6235        | 15.0  | 5100  | 0.6226          | 0.5      |
| 0.6235        | 16.0  | 5440  | 0.6237          | 0.5      |
| 0.623         | 17.0  | 5780  | 0.6185          | 0.5      |
| 0.6176        | 18.0  | 6120  | 0.6202          | 0.5      |
| 0.6176        | 19.0  | 6460  | 0.6180          | 0.5      |
| 0.6204        | 20.0  | 6800  | 0.6195          | 0.5      |
| 0.6186        | 21.0  | 7140  | 0.6174          | 0.5      |
| 0.6186        | 22.0  | 7480  | 0.6283          | 0.5      |
| 0.621         | 23.0  | 7820  | 0.6254          | 0.5      |
| 0.6196        | 24.0  | 8160  | 0.6169          | 0.5      |
| 0.6218        | 25.0  | 8500  | 0.6170          | 0.5      |
| 0.6218        | 26.0  | 8840  | 0.6256          | 0.5      |
| 0.621         | 27.0  | 9180  | 0.6479          | 0.5      |
| 0.6189        | 28.0  | 9520  | 0.6170          | 0.5      |
| 0.6189        | 29.0  | 9860  | 0.6219          | 0.5      |
| 0.619         | 30.0  | 10200 | 0.6169          | 0.5      |
| 0.6175        | 31.0  | 10540 | 0.6169          | 0.5      |
| 0.6175        | 32.0  | 10880 | 0.6379          | 0.5      |
| 0.6181        | 33.0  | 11220 | 0.6193          | 0.5      |
| 0.6185        | 34.0  | 11560 | 0.6219          | 0.5      |
| 0.6185        | 35.0  | 11900 | 0.6188          | 0.5      |
| 0.6186        | 36.0  | 12240 | 0.6196          | 0.5      |
| 0.6185        | 37.0  | 12580 | 0.6170          | 0.5      |
| 0.6185        | 38.0  | 12920 | 0.6238          | 0.5      |
| 0.6167        | 39.0  | 13260 | 0.6332          | 0.5      |
| 0.6164        | 40.0  | 13600 | 0.6207          | 0.5      |
| 0.6164        | 41.0  | 13940 | 0.6176          | 0.5      |
| 0.6174        | 42.0  | 14280 | 0.6190          | 0.5      |
| 0.6137        | 43.0  | 14620 | 0.6190          | 0.5      |
| 0.6137        | 44.0  | 14960 | 0.6175          | 0.5      |
| 0.6179        | 45.0  | 15300 | 0.6263          | 0.5      |
| 0.6141        | 46.0  | 15640 | 0.6183          | 0.5      |
| 0.6141        | 47.0  | 15980 | 0.6275          | 0.5      |
| 0.6176        | 48.0  | 16320 | 0.6174          | 0.5      |
| 0.616         | 49.0  | 16660 | 0.6224          | 0.5      |
| 0.6162        | 50.0  | 17000 | 0.6173          | 0.5      |
| 0.6162        | 51.0  | 17340 | 0.6191          | 0.5      |
| 0.6135        | 52.0  | 17680 | 0.6187          | 0.5      |
| 0.6186        | 53.0  | 18020 | 0.6232          | 0.5      |
| 0.6186        | 54.0  | 18360 | 0.6191          | 0.5      |
| 0.6135        | 55.0  | 18700 | 0.6184          | 0.5      |
| 0.6138        | 56.0  | 19040 | 0.6186          | 0.5      |
| 0.6138        | 57.0  | 19380 | 0.6176          | 0.5      |
| 0.6137        | 58.0  | 19720 | 0.6236          | 0.5      |
| 0.6153        | 59.0  | 20060 | 0.6251          | 0.5      |
| 0.6153        | 60.0  | 20400 | 0.6166          | 0.5      |
| 0.6132        | 61.0  | 20740 | 0.6175          | 0.5      |
| 0.6131        | 62.0  | 21080 | 0.6199          | 0.5      |
| 0.6131        | 63.0  | 21420 | 0.6178          | 0.5      |
| 0.6121        | 64.0  | 21760 | 0.6212          | 0.5      |
| 0.6169        | 65.0  | 22100 | 0.6183          | 0.5      |
| 0.6169        | 66.0  | 22440 | 0.6252          | 0.5      |
| 0.6079        | 67.0  | 22780 | 0.6191          | 0.5      |
| 0.6151        | 68.0  | 23120 | 0.6170          | 0.5      |
| 0.6151        | 69.0  | 23460 | 0.6182          | 0.5      |
| 0.6128        | 70.0  | 23800 | 0.6191          | 0.5      |
| 0.6118        | 71.0  | 24140 | 0.6194          | 0.5      |
| 0.6118        | 72.0  | 24480 | 0.6224          | 0.5      |
| 0.6112        | 73.0  | 24820 | 0.6199          | 0.5      |
| 0.6129        | 74.0  | 25160 | 0.6210          | 0.5      |
| 0.6109        | 75.0  | 25500 | 0.6193          | 0.5      |
| 0.6109        | 76.0  | 25840 | 0.6210          | 0.5      |
| 0.612         | 77.0  | 26180 | 0.6187          | 0.5      |
| 0.6109        | 78.0  | 26520 | 0.6203          | 0.5      |
| 0.6109        | 79.0  | 26860 | 0.6197          | 0.5      |
| 0.6115        | 80.0  | 27200 | 0.6198          | 0.5      |

Note that validation accuracy stayed at 0.5 for all 80 epochs; assuming a balanced binary SuperGLUE task, this is chance-level performance, so this run does not appear to have learned the task beyond a trivial baseline.

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3