20230826022810

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2767
  • Accuracy: 0.73
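
The card does not state which SuperGLUE task or pipeline type this checkpoint targets, so the snippet below is only a minimal sketch that loads it as a generic sequence-classification model; the sentence-pair input and label interpretation are assumptions, not documented behavior.

```python
# A minimal sketch, assuming this checkpoint carries a sequence-classification
# head on bert-large-cased (the pipeline type is not declared in the card).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826022810"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example sentence pair; many SuperGLUE tasks are pair classification,
# but the exact task for this model is not stated.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is on a mat.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(pred)  # class index; label names depend on the (unstated) task
```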

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
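
For reference, these values correspond roughly to the Hugging Face `TrainingArguments` sketch below; `output_dir` and `evaluation_strategy` are illustrative assumptions (the per-epoch evaluation table suggests epoch-level evaluation), not values taken from the card.

```python
# A minimal sketch of TrainingArguments matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230826022810",   # assumed; not stated in the card
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",     # assumed from the per-epoch results table
)
```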

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.4473 | 0.5 |
| No log | 2.0 | 50 | 0.3750 | 0.6 |
| No log | 3.0 | 75 | 0.3427 | 0.63 |
| No log | 4.0 | 100 | 0.2967 | 0.63 |
| No log | 5.0 | 125 | 0.2981 | 0.57 |
| No log | 6.0 | 150 | 0.3264 | 0.56 |
| No log | 7.0 | 175 | 0.2918 | 0.58 |
| No log | 8.0 | 200 | 0.3062 | 0.66 |
| No log | 9.0 | 225 | 0.2885 | 0.58 |
| No log | 10.0 | 250 | 0.2884 | 0.6 |
| No log | 11.0 | 275 | 0.2963 | 0.55 |
| No log | 12.0 | 300 | 0.2895 | 0.6 |
| No log | 13.0 | 325 | 0.2873 | 0.6 |
| No log | 14.0 | 350 | 0.2884 | 0.58 |
| No log | 15.0 | 375 | 0.2871 | 0.59 |
| No log | 16.0 | 400 | 0.2859 | 0.6 |
| No log | 17.0 | 425 | 0.2912 | 0.53 |
| No log | 18.0 | 450 | 0.2841 | 0.61 |
| No log | 19.0 | 475 | 0.2834 | 0.61 |
| 0.5493 | 20.0 | 500 | 0.2825 | 0.64 |
| 0.5493 | 21.0 | 525 | 0.2847 | 0.62 |
| 0.5493 | 22.0 | 550 | 0.2782 | 0.62 |
| 0.5493 | 23.0 | 575 | 0.2759 | 0.62 |
| 0.5493 | 24.0 | 600 | 0.2750 | 0.67 |
| 0.5493 | 25.0 | 625 | 0.2745 | 0.69 |
| 0.5493 | 26.0 | 650 | 0.2721 | 0.66 |
| 0.5493 | 27.0 | 675 | 0.2728 | 0.65 |
| 0.5493 | 28.0 | 700 | 0.2848 | 0.69 |
| 0.5493 | 29.0 | 725 | 0.2727 | 0.65 |
| 0.5493 | 30.0 | 750 | 0.2739 | 0.66 |
| 0.5493 | 31.0 | 775 | 0.2715 | 0.66 |
| 0.5493 | 32.0 | 800 | 0.2950 | 0.67 |
| 0.5493 | 33.0 | 825 | 0.2764 | 0.68 |
| 0.5493 | 34.0 | 850 | 0.2693 | 0.68 |
| 0.5493 | 35.0 | 875 | 0.2686 | 0.69 |
| 0.5493 | 36.0 | 900 | 0.2793 | 0.66 |
| 0.5493 | 37.0 | 925 | 0.2700 | 0.68 |
| 0.5493 | 38.0 | 950 | 0.2744 | 0.68 |
| 0.5493 | 39.0 | 975 | 0.2789 | 0.71 |
| 0.4987 | 40.0 | 1000 | 0.2757 | 0.7 |
| 0.4987 | 41.0 | 1025 | 0.2705 | 0.69 |
| 0.4987 | 42.0 | 1050 | 0.2836 | 0.7 |
| 0.4987 | 43.0 | 1075 | 0.2808 | 0.6 |
| 0.4987 | 44.0 | 1100 | 0.2734 | 0.71 |
| 0.4987 | 45.0 | 1125 | 0.2703 | 0.69 |
| 0.4987 | 46.0 | 1150 | 0.2787 | 0.72 |
| 0.4987 | 47.0 | 1175 | 0.2684 | 0.69 |
| 0.4987 | 48.0 | 1200 | 0.2737 | 0.7 |
| 0.4987 | 49.0 | 1225 | 0.2792 | 0.72 |
| 0.4987 | 50.0 | 1250 | 0.2737 | 0.71 |
| 0.4987 | 51.0 | 1275 | 0.2723 | 0.71 |
| 0.4987 | 52.0 | 1300 | 0.2725 | 0.73 |
| 0.4987 | 53.0 | 1325 | 0.2722 | 0.71 |
| 0.4987 | 54.0 | 1350 | 0.2800 | 0.7 |
| 0.4987 | 55.0 | 1375 | 0.2769 | 0.71 |
| 0.4987 | 56.0 | 1400 | 0.2772 | 0.76 |
| 0.4987 | 57.0 | 1425 | 0.2715 | 0.77 |
| 0.4987 | 58.0 | 1450 | 0.2794 | 0.75 |
| 0.4987 | 59.0 | 1475 | 0.2771 | 0.73 |
| 0.447 | 60.0 | 1500 | 0.2798 | 0.7 |
| 0.447 | 61.0 | 1525 | 0.2717 | 0.74 |
| 0.447 | 62.0 | 1550 | 0.2991 | 0.71 |
| 0.447 | 63.0 | 1575 | 0.2719 | 0.72 |
| 0.447 | 64.0 | 1600 | 0.2762 | 0.72 |
| 0.447 | 65.0 | 1625 | 0.2833 | 0.73 |
| 0.447 | 66.0 | 1650 | 0.2772 | 0.74 |
| 0.447 | 67.0 | 1675 | 0.2807 | 0.71 |
| 0.447 | 68.0 | 1700 | 0.2741 | 0.73 |
| 0.447 | 69.0 | 1725 | 0.2765 | 0.72 |
| 0.447 | 70.0 | 1750 | 0.2786 | 0.73 |
| 0.447 | 71.0 | 1775 | 0.2795 | 0.73 |
| 0.447 | 72.0 | 1800 | 0.2752 | 0.74 |
| 0.447 | 73.0 | 1825 | 0.2838 | 0.71 |
| 0.447 | 74.0 | 1850 | 0.2763 | 0.74 |
| 0.447 | 75.0 | 1875 | 0.2764 | 0.73 |
| 0.447 | 76.0 | 1900 | 0.2756 | 0.72 |
| 0.447 | 77.0 | 1925 | 0.2738 | 0.74 |
| 0.447 | 78.0 | 1950 | 0.2743 | 0.74 |
| 0.447 | 79.0 | 1975 | 0.2779 | 0.72 |
| 0.4199 | 80.0 | 2000 | 0.2767 | 0.73 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
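
To reproduce the reported numbers it may help to match these versions. Below is a minimal runtime check against them; requiring exact matches is an assumption, and nearby patch releases may work just as well.

```python
# A minimal environment check against the versions listed above.
# Exact matching is an assumption, not a documented requirement.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": (transformers.__version__, "4.26.1"),
    "torch": (torch.__version__, "2.0.1+cu118"),
    "datasets": (datasets.__version__, "2.12.0"),
    "tokenizers": (tokenizers.__version__, "0.13.3"),
}
for name, (found, want) in expected.items():
    if found != want:
        print(f"warning: {name} {found} differs from {want} reported in the card")
```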
