
dkqjrm/20230824104100

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0729
  • Accuracy: 0.7473
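
For reference, the sketch below shows one way to load this checkpoint for inference with the Transformers library. The repository id is taken from this card; the specific SuperGLUE task (and therefore the correct input format and label mapping) is not stated, so the sentence-pair input is an assumption.

```python
# Minimal inference sketch, assuming the checkpoint was saved as a
# sequence-classification model. The sentence-pair input format is an
# assumption; the card does not say which SuperGLUE task was used.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230824104100"  # repository id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("A premise sentence.", "A hypothesis sentence.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # per-class probabilities
```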

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
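
The card leaves this section blank; the header only names the super_glue dataset. The training table below shows 312 optimization steps per epoch at batch size 8, i.e. roughly 2,500 training examples, which is consistent with the RTE configuration (2,490 training examples) — but that is an inference, not something the card states. Treating "rte" as an unconfirmed placeholder, the data could be loaded as follows:

```python
# Sketch of loading a SuperGLUE configuration with the Datasets library.
# "rte" is an unconfirmed guess inferred from the step counts in the
# training results table, not something this card states.
from datasets import load_dataset

dataset = load_dataset("super_glue", "rte")
print(dataset)                   # train/validation/test splits
print(dataset["validation"][0])  # one example: premise, hypothesis, label
```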

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
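
Expressed with the Transformers Trainer API, these settings would look roughly like the sketch below. The actual training script is not part of the card, so this is an illustration rather than the authors' code; output_dir is a placeholder.

```python
# Configuration sketch mirroring the hyperparameters listed above.
# Trainer in Transformers 4.26 uses an AdamW-style optimizer by default,
# which matches the reported betas and epsilon.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230824104100",   # placeholder
    learning_rate=3e-3,            # 0.003, as listed above
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
)
```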

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.2294          | 0.5307   |
| 0.3686        | 2.0   | 624   | 0.5346          | 0.4729   |
| 0.3686        | 3.0   | 936   | 0.2223          | 0.5235   |
| 0.2907        | 4.0   | 1248  | 0.1895          | 0.4729   |
| 0.2686        | 5.0   | 1560  | 0.1783          | 0.5018   |
| 0.2686        | 6.0   | 1872  | 0.1995          | 0.5884   |
| 0.2686        | 7.0   | 2184  | 0.3037          | 0.5740   |
| 0.2686        | 8.0   | 2496  | 0.1386          | 0.6715   |
| 0.266         | 9.0   | 2808  | 0.1311          | 0.7076   |
| 0.2363        | 10.0  | 3120  | 0.1403          | 0.6968   |
| 0.2363        | 11.0  | 3432  | 0.2988          | 0.5957   |
| 0.215         | 12.0  | 3744  | 0.1119          | 0.6968   |
| 0.198         | 13.0  | 4056  | 0.1238          | 0.6859   |
| 0.198         | 14.0  | 4368  | 0.1107          | 0.7040   |
| 0.1845        | 15.0  | 4680  | 0.1604          | 0.6570   |
| 0.1845        | 16.0  | 4992  | 0.1143          | 0.7004   |
| 0.1664        | 17.0  | 5304  | 0.1197          | 0.7148   |
| 0.159         | 18.0  | 5616  | 0.1122          | 0.7329   |
| 0.159         | 19.0  | 5928  | 0.1038          | 0.7184   |
| 0.145         | 20.0  | 6240  | 0.0973          | 0.7040   |
| 0.1304        | 21.0  | 6552  | 0.0996          | 0.7292   |
| 0.1304        | 22.0  | 6864  | 0.0938          | 0.7473   |
| 0.1264        | 23.0  | 7176  | 0.1212          | 0.7437   |
| 0.1264        | 24.0  | 7488  | 0.0953          | 0.7256   |
| 0.1212        | 25.0  | 7800  | 0.0899          | 0.7329   |
| 0.1172        | 26.0  | 8112  | 0.1037          | 0.7365   |
| 0.1172        | 27.0  | 8424  | 0.0844          | 0.7292   |
| 0.1122        | 28.0  | 8736  | 0.0850          | 0.7365   |
| 0.1131        | 29.0  | 9048  | 0.0875          | 0.7220   |
| 0.1131        | 30.0  | 9360  | 0.0904          | 0.7437   |
| 0.1082        | 31.0  | 9672  | 0.0883          | 0.7184   |
| 0.1082        | 32.0  | 9984  | 0.0800          | 0.7509   |
| 0.1086        | 33.0  | 10296 | 0.0897          | 0.7509   |
| 0.1015        | 34.0  | 10608 | 0.0837          | 0.7473   |
| 0.1015        | 35.0  | 10920 | 0.0820          | 0.7329   |
| 0.099         | 36.0  | 11232 | 0.0819          | 0.7365   |
| 0.0942        | 37.0  | 11544 | 0.0858          | 0.7509   |
| 0.0942        | 38.0  | 11856 | 0.0793          | 0.7437   |
| 0.0956        | 39.0  | 12168 | 0.0823          | 0.7581   |
| 0.0956        | 40.0  | 12480 | 0.0860          | 0.7256   |
| 0.0921        | 41.0  | 12792 | 0.0753          | 0.7545   |
| 0.0911        | 42.0  | 13104 | 0.0838          | 0.7473   |
| 0.0911        | 43.0  | 13416 | 0.0763          | 0.7545   |
| 0.0894        | 44.0  | 13728 | 0.0761          | 0.7473   |
| 0.0886        | 45.0  | 14040 | 0.0752          | 0.7581   |
| 0.0886        | 46.0  | 14352 | 0.0743          | 0.7437   |
| 0.0855        | 47.0  | 14664 | 0.0759          | 0.7581   |
| 0.0855        | 48.0  | 14976 | 0.0801          | 0.7437   |
| 0.0837        | 49.0  | 15288 | 0.0797          | 0.7473   |
| 0.083         | 50.0  | 15600 | 0.0734          | 0.7509   |
| 0.083         | 51.0  | 15912 | 0.0756          | 0.7545   |
| 0.0845        | 52.0  | 16224 | 0.0744          | 0.7401   |
| 0.084         | 53.0  | 16536 | 0.0731          | 0.7545   |
| 0.084         | 54.0  | 16848 | 0.0736          | 0.7473   |
| 0.0797        | 55.0  | 17160 | 0.0734          | 0.7653   |
| 0.0797        | 56.0  | 17472 | 0.0735          | 0.7545   |
| 0.0803        | 57.0  | 17784 | 0.0737          | 0.7545   |
| 0.0792        | 58.0  | 18096 | 0.0735          | 0.7581   |
| 0.0792        | 59.0  | 18408 | 0.0732          | 0.7581   |
| 0.0815        | 60.0  | 18720 | 0.0729          | 0.7473   |

Note that the best validation accuracy (0.7653) was reached at epoch 55; the headline figures at the top of this card correspond to the final epoch-60 checkpoint.

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
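
To confirm that a local environment matches these versions, a quick check:

```python
# Print installed versions to compare against the list above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected: 4.26.1
print(torch.__version__)         # expected: 2.0.1+cu118
print(datasets.__version__)      # expected: 2.12.0
print(tokenizers.__version__)    # expected: 0.13.3
```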