20230826035826

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the results):

  • Loss: 0.2806
  • Accuracy: 0.72
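
Since the card does not state the task or pipeline type, the following is a minimal usage sketch, assuming the checkpoint carries a sequence-classification head; the input text and label meaning are placeholders, not confirmed by the card:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Model id as published on the Hub; the classification head is an assumption.
model_id = "dkqjrm/20230826035826"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder input; the actual SuperGLUE task (and thus input format) is unstated.
inputs = tokenizer("A sample sentence to classify.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # raw label index; meaning depends on the task
```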

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
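
As a reproduction aid, these settings map onto the Hugging Face `TrainingArguments` API roughly as follows. This is a sketch only, since the card does not specify the SuperGLUE task or data pipeline; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above. The reported "Adam" optimizer
# corresponds to the Trainer's default AdamW with these beta/epsilon values.
training_args = TrainingArguments(
    output_dir="./20230826035826",   # placeholder
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",     # assumption: the results table shows one eval per epoch
)
```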

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.3229 | 0.4 |
| No log | 2.0 | 50 | 0.3507 | 0.63 |
| No log | 3.0 | 75 | 0.3165 | 0.39 |
| No log | 4.0 | 100 | 0.3159 | 0.59 |
| No log | 5.0 | 125 | 0.3276 | 0.35 |
| No log | 6.0 | 150 | 0.3255 | 0.37 |
| No log | 7.0 | 175 | 0.2893 | 0.63 |
| No log | 8.0 | 200 | 0.3066 | 0.63 |
| No log | 9.0 | 225 | 0.3015 | 0.64 |
| No log | 10.0 | 250 | 0.2933 | 0.62 |
| No log | 11.0 | 275 | 0.2953 | 0.45 |
| No log | 12.0 | 300 | 0.2943 | 0.62 |
| No log | 13.0 | 325 | 0.2867 | 0.62 |
| No log | 14.0 | 350 | 0.2882 | 0.59 |
| No log | 15.0 | 375 | 0.2922 | 0.63 |
| No log | 16.0 | 400 | 0.2895 | 0.59 |
| No log | 17.0 | 425 | 0.2901 | 0.65 |
| No log | 18.0 | 450 | 0.2877 | 0.64 |
| No log | 19.0 | 475 | 0.2909 | 0.6 |
| 0.5537 | 20.0 | 500 | 0.2871 | 0.62 |
| 0.5537 | 21.0 | 525 | 0.2855 | 0.61 |
| 0.5537 | 22.0 | 550 | 0.2863 | 0.64 |
| 0.5537 | 23.0 | 575 | 0.2859 | 0.61 |
| 0.5537 | 24.0 | 600 | 0.2854 | 0.6 |
| 0.5537 | 25.0 | 625 | 0.2839 | 0.59 |
| 0.5537 | 26.0 | 650 | 0.2859 | 0.56 |
| 0.5537 | 27.0 | 675 | 0.2821 | 0.58 |
| 0.5537 | 28.0 | 700 | 0.2831 | 0.64 |
| 0.5537 | 29.0 | 725 | 0.2813 | 0.66 |
| 0.5537 | 30.0 | 750 | 0.2812 | 0.67 |
| 0.5537 | 31.0 | 775 | 0.2790 | 0.64 |
| 0.5537 | 32.0 | 800 | 0.2801 | 0.64 |
| 0.5537 | 33.0 | 825 | 0.2805 | 0.65 |
| 0.5537 | 34.0 | 850 | 0.2850 | 0.64 |
| 0.5537 | 35.0 | 875 | 0.2781 | 0.66 |
| 0.5537 | 36.0 | 900 | 0.2800 | 0.65 |
| 0.5537 | 37.0 | 925 | 0.2864 | 0.64 |
| 0.5537 | 38.0 | 950 | 0.2816 | 0.65 |
| 0.5537 | 39.0 | 975 | 0.2886 | 0.67 |
| 0.5047 | 40.0 | 1000 | 0.3101 | 0.67 |
| 0.5047 | 41.0 | 1025 | 0.2826 | 0.66 |
| 0.5047 | 42.0 | 1050 | 0.2801 | 0.62 |
| 0.5047 | 43.0 | 1075 | 0.2907 | 0.68 |
| 0.5047 | 44.0 | 1100 | 0.2894 | 0.64 |
| 0.5047 | 45.0 | 1125 | 0.2855 | 0.68 |
| 0.5047 | 46.0 | 1150 | 0.2811 | 0.67 |
| 0.5047 | 47.0 | 1175 | 0.2947 | 0.7 |
| 0.5047 | 48.0 | 1200 | 0.2952 | 0.69 |
| 0.5047 | 49.0 | 1225 | 0.2832 | 0.69 |
| 0.5047 | 50.0 | 1250 | 0.2954 | 0.68 |
| 0.5047 | 51.0 | 1275 | 0.2840 | 0.68 |
| 0.5047 | 52.0 | 1300 | 0.3079 | 0.67 |
| 0.5047 | 53.0 | 1325 | 0.2796 | 0.66 |
| 0.5047 | 54.0 | 1350 | 0.2862 | 0.67 |
| 0.5047 | 55.0 | 1375 | 0.2853 | 0.69 |
| 0.5047 | 56.0 | 1400 | 0.2969 | 0.69 |
| 0.5047 | 57.0 | 1425 | 0.2866 | 0.69 |
| 0.5047 | 58.0 | 1450 | 0.2895 | 0.69 |
| 0.5047 | 59.0 | 1475 | 0.3058 | 0.69 |
| 0.4502 | 60.0 | 1500 | 0.2998 | 0.68 |
| 0.4502 | 61.0 | 1525 | 0.2974 | 0.69 |
| 0.4502 | 62.0 | 1550 | 0.2788 | 0.69 |
| 0.4502 | 63.0 | 1575 | 0.2882 | 0.69 |
| 0.4502 | 64.0 | 1600 | 0.2893 | 0.7 |
| 0.4502 | 65.0 | 1625 | 0.2834 | 0.7 |
| 0.4502 | 66.0 | 1650 | 0.2889 | 0.72 |
| 0.4502 | 67.0 | 1675 | 0.2851 | 0.73 |
| 0.4502 | 68.0 | 1700 | 0.2773 | 0.7 |
| 0.4502 | 69.0 | 1725 | 0.2855 | 0.72 |
| 0.4502 | 70.0 | 1750 | 0.2903 | 0.69 |
| 0.4502 | 71.0 | 1775 | 0.2851 | 0.7 |
| 0.4502 | 72.0 | 1800 | 0.2892 | 0.69 |
| 0.4502 | 73.0 | 1825 | 0.2811 | 0.71 |
| 0.4502 | 74.0 | 1850 | 0.2881 | 0.71 |
| 0.4502 | 75.0 | 1875 | 0.2892 | 0.71 |
| 0.4502 | 76.0 | 1900 | 0.2835 | 0.71 |
| 0.4502 | 77.0 | 1925 | 0.2800 | 0.72 |
| 0.4502 | 78.0 | 1950 | 0.2809 | 0.72 |
| 0.4502 | 79.0 | 1975 | 0.2801 | 0.71 |
| 0.4329 | 80.0 | 2000 | 0.2806 | 0.72 |

"No log" means the training loss had not yet been reported: it is emitted only at logging steps, which in this run fall every 500 steps.

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
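
A quick way to check that a local environment matches these versions (a convenience snippet, not from the card):

```python
import transformers, torch, datasets, tokenizers

# Expected values per the card; newer versions will usually load the checkpoint too.
print(transformers.__version__)  # 4.26.1
print(torch.__version__)         # 2.0.1+cu118
print(datasets.__version__)      # 2.12.0
print(tokenizers.__version__)    # 0.13.3
```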