
20230831092835

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4933
  • Accuracy: 0.5

Note that an accuracy of 0.5 is chance level for a balanced binary-choice task, so the fine-tuned model does not appear to outperform random guessing on this evaluation set.
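
The snippet below is a minimal usage sketch, not part of the original card: it assumes the checkpoint loads as a standard sequence-classification head on top of bert-large-cased. The specific SuperGLUE subtask and the meaning of the labels are not documented here.

```python
# Hypothetical usage sketch; the SuperGLUE subtask and label meanings
# are not documented in this card.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230831092835"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Pair-input SuperGLUE tasks (e.g. RTE, BoolQ) take two texts per example.
inputs = tokenizer("An example premise.", "An example hypothesis.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```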

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of the corresponding TrainingArguments follows the list:

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
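
As a sketch only, these hyperparameters map onto the Transformers 4.26-era TrainingArguments roughly as follows; the task-specific data loading and the Trainer call are assumed and not part of the original card.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments
# (Transformers 4.26 API). Dataset preprocessing and the Trainer setup
# are assumed; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831092835",   # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,             # epsilon=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumed: the card logs eval per epoch
)
```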

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.5114          | 0.5      |
| 0.5104        | 2.0   | 680   | 0.5011          | 0.5      |
| 0.5162        | 3.0   | 1020  | 0.5183          | 0.5      |
| 0.5162        | 4.0   | 1360  | 0.4985          | 0.5      |
| 0.5087        | 5.0   | 1700  | 0.5279          | 0.5      |
| 0.5026        | 6.0   | 2040  | 0.4974          | 0.5      |
| 0.5026        | 7.0   | 2380  | 0.4970          | 0.5      |
| 0.5035        | 8.0   | 2720  | 0.5153          | 0.5      |
| 0.4963        | 9.0   | 3060  | 0.4956          | 0.5      |
| 0.4963        | 10.0  | 3400  | 0.5024          | 0.5      |
| 0.4986        | 11.0  | 3740  | 0.4932          | 0.5      |
| 0.4975        | 12.0  | 4080  | 0.4948          | 0.5      |
| 0.4975        | 13.0  | 4420  | 0.5179          | 0.5      |
| 0.4951        | 14.0  | 4760  | 0.4950          | 0.5      |
| 0.4987        | 15.0  | 5100  | 0.4946          | 0.5      |
| 0.4987        | 16.0  | 5440  | 0.4961          | 0.5      |
| 0.4983        | 17.0  | 5780  | 0.4991          | 0.5      |
| 0.4947        | 18.0  | 6120  | 0.4941          | 0.5      |
| 0.4947        | 19.0  | 6460  | 0.4925          | 0.5      |
| 0.4957        | 20.0  | 6800  | 0.4976          | 0.5      |
| 0.4949        | 21.0  | 7140  | 0.4938          | 0.5      |
| 0.4949        | 22.0  | 7480  | 0.5070          | 0.5      |
| 0.497         | 23.0  | 7820  | 0.4950          | 0.5      |
| 0.4958        | 24.0  | 8160  | 0.4959          | 0.5      |
| 0.4962        | 25.0  | 8500  | 0.4925          | 0.5      |
| 0.4962        | 26.0  | 8840  | 0.5414          | 0.5      |
| 0.5006        | 27.0  | 9180  | 0.4947          | 0.5      |
| 0.4998        | 28.0  | 9520  | 0.4976          | 0.5      |
| 0.4998        | 29.0  | 9860  | 0.5053          | 0.5      |
| 0.4973        | 30.0  | 10200 | 0.4925          | 0.5      |
| 0.4972        | 31.0  | 10540 | 0.4929          | 0.5      |
| 0.4972        | 32.0  | 10880 | 0.5097          | 0.5      |
| 0.4974        | 33.0  | 11220 | 0.4925          | 0.5      |
| 0.4968        | 34.0  | 11560 | 0.4985          | 0.5      |
| 0.4968        | 35.0  | 11900 | 0.4975          | 0.5      |
| 0.4975        | 36.0  | 12240 | 0.4971          | 0.5      |
| 0.4966        | 37.0  | 12580 | 0.4925          | 0.5      |
| 0.4966        | 38.0  | 12920 | 0.4933          | 0.5      |
| 0.4961        | 39.0  | 13260 | 0.5030          | 0.5      |
| 0.4944        | 40.0  | 13600 | 0.4939          | 0.5      |
| 0.4944        | 41.0  | 13940 | 0.4926          | 0.5      |
| 0.4957        | 42.0  | 14280 | 0.4955          | 0.5      |
| 0.4933        | 43.0  | 14620 | 0.4937          | 0.5      |
| 0.4933        | 44.0  | 14960 | 0.4942          | 0.5      |
| 0.496         | 45.0  | 15300 | 0.5004          | 0.5      |
| 0.493         | 46.0  | 15640 | 0.4936          | 0.5      |
| 0.493         | 47.0  | 15980 | 0.4977          | 0.5      |
| 0.4953        | 48.0  | 16320 | 0.4927          | 0.5      |
| 0.4948        | 49.0  | 16660 | 0.4993          | 0.5      |
| 0.4939        | 50.0  | 17000 | 0.4928          | 0.5      |
| 0.4939        | 51.0  | 17340 | 0.4925          | 0.5      |
| 0.4927        | 52.0  | 17680 | 0.4934          | 0.5      |
| 0.4962        | 53.0  | 18020 | 0.4943          | 0.5      |
| 0.4962        | 54.0  | 18360 | 0.4928          | 0.5      |
| 0.493         | 55.0  | 18700 | 0.4926          | 0.5      |
| 0.4925        | 56.0  | 19040 | 0.4929          | 0.5      |
| 0.4925        | 57.0  | 19380 | 0.4926          | 0.5      |
| 0.493         | 58.0  | 19720 | 0.4931          | 0.5      |
| 0.4938        | 59.0  | 20060 | 0.5001          | 0.5      |
| 0.4938        | 60.0  | 20400 | 0.4925          | 0.5      |
| 0.4923        | 61.0  | 20740 | 0.4928          | 0.5      |
| 0.4924        | 62.0  | 21080 | 0.4927          | 0.5      |
| 0.4924        | 63.0  | 21420 | 0.4931          | 0.5      |
| 0.492         | 64.0  | 21760 | 0.4944          | 0.5      |
| 0.4945        | 65.0  | 22100 | 0.4928          | 0.5      |
| 0.4945        | 66.0  | 22440 | 0.4954          | 0.5      |
| 0.4892        | 67.0  | 22780 | 0.4925          | 0.5      |
| 0.4932        | 68.0  | 23120 | 0.4934          | 0.5      |
| 0.4932        | 69.0  | 23460 | 0.4932          | 0.5      |
| 0.4919        | 70.0  | 23800 | 0.4925          | 0.5      |
| 0.4916        | 71.0  | 24140 | 0.4930          | 0.5      |
| 0.4916        | 72.0  | 24480 | 0.4952          | 0.5      |
| 0.4904        | 73.0  | 24820 | 0.4936          | 0.5      |
| 0.4924        | 74.0  | 25160 | 0.4951          | 0.5      |
| 0.4913        | 75.0  | 25500 | 0.4934          | 0.5      |
| 0.4913        | 76.0  | 25840 | 0.4937          | 0.5      |
| 0.4921        | 77.0  | 26180 | 0.4927          | 0.5      |
| 0.4913        | 78.0  | 26520 | 0.4933          | 0.5      |
| 0.4913        | 79.0  | 26860 | 0.4933          | 0.5      |
| 0.4917        | 80.0  | 27200 | 0.4933          | 0.5      |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3