distilbert_sa_pre-training-complete

This model is a fine-tuned version of distilbert-base-uncased on the wikitext dataset (wikitext-103-raw-v1 configuration). It achieves the following results on the evaluation set:

  • Loss: 1.3367
  • Accuracy: 0.7119
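
The evaluation loss can be turned into a perplexity figure, which some readers find easier to compare across models. The sketch below assumes the reported loss is a mean cross-entropy in nats over the predicted tokens; the card does not state this explicitly, and the `perplexity` helper is a hypothetical name introduced only for illustration.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Evaluation loss reported in this card.
eval_loss = 1.3367

# Assumption: the loss is a natural-log cross-entropy, so exp() applies.
eval_perplexity = perplexity(eval_loss)
print(f"perplexity: {eval_perplexity:.2f}")
```

Under that assumption the reported loss corresponds to a perplexity of roughly 3.8.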

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 10
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 300000
  • mixed_precision_training: Native AMP
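
Two of the values above follow from the others and can be checked with a little arithmetic. The sketch below is illustrative only: it reproduces the total batch size from the per-device size and device count, and models a linear schedule with warmup as a ramp from 0 to the peak learning rate over the warmup steps followed by a linear decay to 0 at the last training step. The function names are hypothetical, and the gradient-accumulation factor of 1 is an assumption, since the card lists none.

```python
# Values taken from the hyperparameter list in this card.
PEAK_LR = 5e-5
WARMUP_STEPS = 100
TRAINING_STEPS = 300_000
PER_DEVICE_BATCH = 64
NUM_DEVICES = 2


def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Effective batch size per optimizer step.

    grad_accum=1 is an assumption: the card does not mention accumulation.
    """
    return per_device * num_devices * grad_accum


def linear_lr(step: int) -> float:
    """Learning rate at a given step for linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from the peak rate to 0 at TRAINING_STEPS.
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


print(total_batch_size(PER_DEVICE_BATCH, NUM_DEVICES))  # matches total_train_batch_size
for step in (0, 50, 100, 150_000, 300_000):
    print(step, linear_lr(step))
```

With only 100 warmup steps out of 300000, the schedule is effectively a straight-line decay for almost the whole run.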

Training results

Training Loss | Epoch | Step | Validation Loss | Accuracy
1.8819 | 1.0 | 1787 | 1.6350 | 0.6570
1.7699 | 2.0 | 3574 | 1.5847 | 0.6664
1.7308 | 3.0 | 5361 | 1.5857 | 0.6678
1.7062 | 4.0 | 7148 | 1.5679 | 0.6684
1.6858 | 5.0 | 8935 | 1.5573 | 0.6689
1.6684 | 6.0 | 10722 | 1.5591 | 0.6687
1.6545 | 7.0 | 12509 | 1.5253 | 0.6756
1.6438 | 8.0 | 14296 | 1.5304 | 0.6748
1.631 | 9.0 | 16083 | 1.4976 | 0.6805
1.6236 | 10.0 | 17870 | 1.5153 | 0.6773
1.613 | 11.0 | 19657 | 1.5010 | 0.6786
1.6046 | 12.0 | 21444 | 1.5069 | 0.6818
1.5963 | 13.0 | 23231 | 1.4796 | 0.6805
1.5906 | 14.0 | 25018 | 1.4858 | 0.6809
1.5833 | 15.0 | 26805 | 1.4962 | 0.6806
1.5771 | 16.0 | 28592 | 1.4749 | 0.6826
1.5703 | 17.0 | 30379 | 1.4724 | 0.6874
1.5663 | 18.0 | 32166 | 1.4946 | 0.6827
1.5614 | 19.0 | 33953 | 1.4934 | 0.6798
1.5558 | 20.0 | 35740 | 1.4512 | 0.6893
1.5508 | 21.0 | 37527 | 1.4417 | 0.6921
1.5466 | 22.0 | 39314 | 1.4744 | 0.6837
1.5427 | 23.0 | 41101 | 1.4530 | 0.6904
1.5382 | 24.0 | 42888 | 1.4378 | 0.6900
1.5342 | 25.0 | 44675 | 1.4732 | 0.6820
1.5293 | 26.0 | 46462 | 1.4609 | 0.6865
1.5259 | 27.0 | 48249 | 1.4400 | 0.6942
1.5223 | 28.0 | 50036 | 1.4586 | 0.6879
1.5197 | 29.0 | 51823 | 1.4371 | 0.6906
1.5161 | 30.0 | 53610 | 1.4335 | 0.6908
1.5126 | 31.0 | 55397 | 1.4159 | 0.6978
1.5087 | 32.0 | 57184 | 1.4412 | 0.6858
1.5061 | 33.0 | 58971 | 1.4181 | 0.6958
1.5035 | 34.0 | 60758 | 1.4357 | 0.6903
1.5011 | 35.0 | 62545 | 1.4147 | 0.6914
1.4967 | 36.0 | 64332 | 1.4251 | 0.6931
1.4954 | 37.0 | 66119 | 1.4366 | 0.6910
1.4925 | 38.0 | 67906 | 1.4488 | 0.6896
1.4889 | 39.0 | 69693 | 1.4336 | 0.6907
1.4866 | 40.0 | 71480 | 1.4226 | 0.6945
1.4838 | 41.0 | 73267 | 1.4210 | 0.6951
1.4821 | 42.0 | 75054 | 1.4081 | 0.6944
1.4792 | 43.0 | 76841 | 1.4496 | 0.6895
1.4778 | 44.0 | 78628 | 1.4263 | 0.6925
1.4754 | 45.0 | 80415 | 1.3995 | 0.6982
1.4736 | 46.0 | 82202 | 1.3958 | 0.6992
1.4702 | 47.0 | 83989 | 1.4073 | 0.6989
1.4683 | 48.0 | 85776 | 1.3917 | 0.6996
1.4663 | 49.0 | 87563 | 1.4109 | 0.6956
1.4648 | 50.0 | 89350 | 1.4003 | 0.6991
1.4619 | 51.0 | 91137 | 1.3847 | 0.7040
1.4599 | 52.0 | 92924 | 1.4323 | 0.6936
1.4596 | 53.0 | 94711 | 1.3935 | 0.6986
1.4567 | 54.0 | 96498 | 1.4122 | 0.6974
1.455 | 55.0 | 98285 | 1.3975 | 0.6977
1.4533 | 56.0 | 100072 | 1.3907 | 0.7004
1.4513 | 57.0 | 101859 | 1.3957 | 0.6997
1.45 | 58.0 | 103646 | 1.3907 | 0.7001
1.4466 | 59.0 | 105433 | 1.4063 | 0.6988
1.4455 | 60.0 | 107220 | 1.3600 | 0.7036
1.4439 | 61.0 | 109007 | 1.3941 | 0.7015
1.4432 | 62.0 | 110794 | 1.3854 | 0.7016
1.4404 | 63.0 | 112581 | 1.4080 | 0.6973
1.4397 | 64.0 | 114368 | 1.3924 | 0.7011
1.4366 | 65.0 | 116155 | 1.3872 | 0.7049
1.4358 | 66.0 | 117942 | 1.3924 | 0.7010
1.4339 | 67.0 | 119729 | 1.3895 | 0.6989
1.4329 | 68.0 | 121516 | 1.4007 | 0.7002
1.4319 | 69.0 | 123303 | 1.3672 | 0.7047
1.4287 | 70.0 | 125090 | 1.3892 | 0.7021
1.4284 | 71.0 | 126877 | 1.3674 | 0.7016
1.4265 | 72.0 | 128664 | 1.3983 | 0.6962
1.4254 | 73.0 | 130451 | 1.3665 | 0.7006
1.4231 | 74.0 | 132238 | 1.3852 | 0.7003
1.4228 | 75.0 | 134025 | 1.4328 | 0.6945
1.4217 | 76.0 | 135812 | 1.3831 | 0.7031
1.4198 | 77.0 | 137599 | 1.3897 | 0.7017
1.4183 | 78.0 | 139386 | 1.3790 | 0.7027
1.417 | 79.0 | 141173 | 1.3697 | 0.7063
1.4179 | 80.0 | 142960 | 1.3805 | 0.7008
1.414 | 81.0 | 144747 | 1.3579 | 0.7051
1.4132 | 82.0 | 146534 | 1.3727 | 0.7054
1.4112 | 83.0 | 148321 | 1.3763 | 0.7014
1.412 | 84.0 | 150108 | 1.3806 | 0.7051
1.4103 | 85.0 | 151895 | 1.3889 | 0.7022
1.4082 | 86.0 | 153682 | 1.3886 | 0.6993
1.4068 | 87.0 | 155469 | 1.3721 | 0.7025
1.4068 | 88.0 | 157256 | 1.3589 | 0.7050
1.4043 | 89.0 | 159043 | 1.3599 | 0.7043
1.4042 | 90.0 | 160830 | 1.3675 | 0.7055
1.4033 | 91.0 | 162617 | 1.3720 | 0.7031
1.402 | 92.0 | 164404 | 1.3506 | 0.7067
1.4004 | 93.0 | 166191 | 1.3833 | 0.7017
1.4001 | 94.0 | 167978 | 1.3734 | 0.7021
1.398 | 95.0 | 169765 | 1.3792 | 0.7037
1.3983 | 96.0 | 171552 | 1.3676 | 0.7050
1.3966 | 97.0 | 173339 | 1.3888 | 0.7027
1.3953 | 98.0 | 175126 | 1.3928 | 0.7004
1.3934 | 99.0 | 176913 | 1.3824 | 0.7011
1.3928 | 100.0 | 178700 | 1.3829 | 0.7040
1.3919 | 101.0 | 180487 | 1.3534 | 0.7046
1.3902 | 102.0 | 182274 | 1.3559 | 0.7048
1.3896 | 103.0 | 184061 | 1.3682 | 0.7045
1.3895 | 104.0 | 185848 | 1.3643 | 0.7053
1.3884 | 105.0 | 187635 | 1.3854 | 0.7018
1.3878 | 106.0 | 189422 | 1.3702 | 0.7038
1.3865 | 107.0 | 191209 | 1.3530 | 0.7056
1.3842 | 108.0 | 192996 | 1.3804 | 0.7008
1.3836 | 109.0 | 194783 | 1.3508 | 0.7073
1.3831 | 110.0 | 196570 | 1.3821 | 0.7002
1.3828 | 111.0 | 198357 | 1.3552 | 0.7063
1.382 | 112.0 | 200144 | 1.3567 | 0.7090
1.3806 | 113.0 | 201931 | 1.3915 | 0.7022
1.3802 | 114.0 | 203718 | 1.3661 | 0.7098
1.3789 | 115.0 | 205505 | 1.3586 | 0.7071
1.3789 | 116.0 | 207292 | 1.3643 | 0.7049
1.3782 | 117.0 | 209079 | 1.3716 | 0.7037
1.3763 | 118.0 | 210866 | 1.3476 | 0.7077
1.3762 | 119.0 | 212653 | 1.3519 | 0.7067
1.3741 | 120.0 | 214440 | 1.3736 | 0.7027
1.3741 | 121.0 | 216227 | 1.3395 | 0.7073
1.3732 | 122.0 | 218014 | 1.3724 | 0.7020
1.3724 | 123.0 | 219801 | 1.3724 | 0.7025
1.3715 | 124.0 | 221588 | 1.3572 | 0.7083
1.3713 | 125.0 | 223375 | 1.3430 | 0.7101
1.369 | 126.0 | 225162 | 1.3457 | 0.7103
1.3687 | 127.0 | 226949 | 1.3457 | 0.7112
1.3698 | 128.0 | 228736 | 1.3649 | 0.7024
1.3683 | 129.0 | 230523 | 1.3422 | 0.7105
1.3668 | 130.0 | 232310 | 1.3531 | 0.7058
1.3675 | 131.0 | 234097 | 1.3607 | 0.7059
1.3661 | 132.0 | 235884 | 1.3553 | 0.7063
1.3642 | 133.0 | 237671 | 1.3541 | 0.7054
1.364 | 134.0 | 239458 | 1.3564 | 0.7080
1.3635 | 135.0 | 241245 | 1.3634 | 0.7055
1.3621 | 136.0 | 243032 | 1.3689 | 0.7043
1.3623 | 137.0 | 244819 | 1.3668 | 0.7087
1.3614 | 138.0 | 246606 | 1.3535 | 0.7061
1.3614 | 139.0 | 248393 | 1.3588 | 0.7086
1.3588 | 140.0 | 250180 | 1.3472 | 0.7091
1.3598 | 141.0 | 251967 | 1.3425 | 0.7085
1.3601 | 142.0 | 253754 | 1.3533 | 0.7099
1.3585 | 143.0 | 255541 | 1.3298 | 0.7112
1.3586 | 144.0 | 257328 | 1.3583 | 0.7036
1.3583 | 145.0 | 259115 | 1.3455 | 0.7081
1.3567 | 146.0 | 260902 | 1.3674 | 0.7045
1.3561 | 147.0 | 262689 | 1.3705 | 0.7056
1.3552 | 148.0 | 264476 | 1.3506 | 0.7080
1.3547 | 149.0 | 266263 | 1.3457 | 0.7120
1.3538 | 150.0 | 268050 | 1.3245 | 0.7100
1.3543 | 151.0 | 269837 | 1.3278 | 0.7110
1.3549 | 152.0 | 271624 | 1.3345 | 0.7103
1.3528 | 153.0 | 273411 | 1.3336 | 0.7089
1.3528 | 154.0 | 275198 | 1.3620 | 0.7075
1.3531 | 155.0 | 276985 | 1.3383 | 0.7120
1.3525 | 156.0 | 278772 | 1.3377 | 0.7091
1.3509 | 157.0 | 280559 | 1.3639 | 0.7054
1.3502 | 158.0 | 282346 | 1.3456 | 0.7091
1.351 | 159.0 | 284133 | 1.3359 | 0.7110
1.3493 | 160.0 | 285920 | 1.3361 | 0.7129
1.3497 | 161.0 | 287707 | 1.3350 | 0.7091
1.3484 | 162.0 | 289494 | 1.3582 | 0.7075
1.35 | 163.0 | 291281 | 1.3441 | 0.7100
1.3485 | 164.0 | 293068 | 1.3227 | 0.7093
1.3485 | 165.0 | 294855 | 1.3386 | 0.7118
1.3483 | 166.0 | 296642 | 1.3470 | 0.7087
1.3481 | 167.0 | 298429 | 1.3405 | 0.7087
1.3465 | 167.88 | 300000 | 1.3382 | 0.7062
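
The fractional epoch in the last row follows from the step counts in the table: evaluation happens every 1787 optimizer steps, so training_steps = 300000 ends partway through epoch 168. The sketch below recomputes that value; the helper names are hypothetical, and the sequences-per-epoch figure is only an upper-bound estimate that assumes every step used a full batch of 128.

```python
# Values read from the table and hyperparameter list in this card.
STEPS_PER_EPOCH = 1787          # step count logged at epoch 1.0
TRAINING_STEPS = 300_000        # training_steps
TOTAL_TRAIN_BATCH_SIZE = 128    # total_train_batch_size


def epoch_at(step: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    return step / steps_per_epoch


def max_sequences_per_epoch() -> int:
    """Upper bound on sequences seen per epoch.

    Assumption: every step uses a full batch; the final batch of an
    epoch may in practice be smaller.
    """
    return STEPS_PER_EPOCH * TOTAL_TRAIN_BATCH_SIZE


print(round(epoch_at(TRAINING_STEPS), 2))   # epoch logged in the last table row
print(TRAINING_STEPS // STEPS_PER_EPOCH)    # number of fully completed epochs
print(max_sequences_per_epoch())
```

This also explains why the final row sits 1571 steps after the epoch-167 row rather than a full 1787.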

Framework versions

  • Transformers 4.26.0
  • PyTorch 1.14.0a0+410ce96
  • Datasets 2.9.0
  • Tokenizers 0.13.2