distilbert_sa_pre-training-complete
This model is a fine-tuned version of distilbert-base-uncased on the wikitext dataset (wikitext-103-raw-v1 configuration). It achieves the following results on the evaluation set:
- Loss: 1.3367
- Accuracy: 0.7119
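To make the reported loss easier to interpret, it can be converted to a perplexity. The sketch below assumes the loss is a mean cross-entropy measured in nats, which this card does not state explicitly; if that assumption does not hold, the conversion is not meaningful.

```python
import math

eval_loss = 1.3367  # evaluation loss reported in this card

# Assumption: the loss is a mean cross-entropy in nats,
# so exp(loss) can be read as a perplexity.
perplexity = math.exp(eval_loss)

print(f"perplexity: {perplexity:.2f}")
```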
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 10
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- training_steps: 300000
- mixed_precision_training: Native AMP
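As a rough illustration of how these settings fit together, the sketch below derives the total batch size from the per-device values and evaluates a warmup-then-linear-decay learning rate at a few steps. It assumes that `lr_scheduler_type: linear` means the usual schedule that ramps up linearly over the warmup steps and then decays linearly to zero at the last training step; the function name `linear_lr` is hypothetical and not taken from the training script.

```python
def linear_lr(step, base_lr=5e-05, warmup_steps=100, training_steps=300_000):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    Hypothetical helper; the defaults are the values listed in this card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, training_steps - step)
    return base_lr * remaining / max(1, training_steps - warmup_steps)


# total_train_batch_size is the per-device batch size times the device count.
per_device_batch = 64
num_devices = 2
total_train_batch_size = per_device_batch * num_devices

# Learning rate at the start, mid-warmup, the peak, mid-decay, and the end.
for step in (0, 50, 100, 150_050, 300_000):
    print(step, linear_lr(step))
```

With 100 warmup steps out of 300000, the warmup phase is a negligible fraction of the run, so the schedule is effectively a linear decay from the peak rate.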
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
1.8819 | 1.0 | 1787 | 1.6350 | 0.6570 |
1.7699 | 2.0 | 3574 | 1.5847 | 0.6664 |
1.7308 | 3.0 | 5361 | 1.5857 | 0.6678 |
1.7062 | 4.0 | 7148 | 1.5679 | 0.6684 |
1.6858 | 5.0 | 8935 | 1.5573 | 0.6689 |
1.6684 | 6.0 | 10722 | 1.5591 | 0.6687 |
1.6545 | 7.0 | 12509 | 1.5253 | 0.6756 |
1.6438 | 8.0 | 14296 | 1.5304 | 0.6748 |
1.631 | 9.0 | 16083 | 1.4976 | 0.6805 |
1.6236 | 10.0 | 17870 | 1.5153 | 0.6773 |
1.613 | 11.0 | 19657 | 1.5010 | 0.6786 |
1.6046 | 12.0 | 21444 | 1.5069 | 0.6818 |
1.5963 | 13.0 | 23231 | 1.4796 | 0.6805 |
1.5906 | 14.0 | 25018 | 1.4858 | 0.6809 |
1.5833 | 15.0 | 26805 | 1.4962 | 0.6806 |
1.5771 | 16.0 | 28592 | 1.4749 | 0.6826 |
1.5703 | 17.0 | 30379 | 1.4724 | 0.6874 |
1.5663 | 18.0 | 32166 | 1.4946 | 0.6827 |
1.5614 | 19.0 | 33953 | 1.4934 | 0.6798 |
1.5558 | 20.0 | 35740 | 1.4512 | 0.6893 |
1.5508 | 21.0 | 37527 | 1.4417 | 0.6921 |
1.5466 | 22.0 | 39314 | 1.4744 | 0.6837 |
1.5427 | 23.0 | 41101 | 1.4530 | 0.6904 |
1.5382 | 24.0 | 42888 | 1.4378 | 0.6900 |
1.5342 | 25.0 | 44675 | 1.4732 | 0.6820 |
1.5293 | 26.0 | 46462 | 1.4609 | 0.6865 |
1.5259 | 27.0 | 48249 | 1.4400 | 0.6942 |
1.5223 | 28.0 | 50036 | 1.4586 | 0.6879 |
1.5197 | 29.0 | 51823 | 1.4371 | 0.6906 |
1.5161 | 30.0 | 53610 | 1.4335 | 0.6908 |
1.5126 | 31.0 | 55397 | 1.4159 | 0.6978 |
1.5087 | 32.0 | 57184 | 1.4412 | 0.6858 |
1.5061 | 33.0 | 58971 | 1.4181 | 0.6958 |
1.5035 | 34.0 | 60758 | 1.4357 | 0.6903 |
1.5011 | 35.0 | 62545 | 1.4147 | 0.6914 |
1.4967 | 36.0 | 64332 | 1.4251 | 0.6931 |
1.4954 | 37.0 | 66119 | 1.4366 | 0.6910 |
1.4925 | 38.0 | 67906 | 1.4488 | 0.6896 |
1.4889 | 39.0 | 69693 | 1.4336 | 0.6907 |
1.4866 | 40.0 | 71480 | 1.4226 | 0.6945 |
1.4838 | 41.0 | 73267 | 1.4210 | 0.6951 |
1.4821 | 42.0 | 75054 | 1.4081 | 0.6944 |
1.4792 | 43.0 | 76841 | 1.4496 | 0.6895 |
1.4778 | 44.0 | 78628 | 1.4263 | 0.6925 |
1.4754 | 45.0 | 80415 | 1.3995 | 0.6982 |
1.4736 | 46.0 | 82202 | 1.3958 | 0.6992 |
1.4702 | 47.0 | 83989 | 1.4073 | 0.6989 |
1.4683 | 48.0 | 85776 | 1.3917 | 0.6996 |
1.4663 | 49.0 | 87563 | 1.4109 | 0.6956 |
1.4648 | 50.0 | 89350 | 1.4003 | 0.6991 |
1.4619 | 51.0 | 91137 | 1.3847 | 0.7040 |
1.4599 | 52.0 | 92924 | 1.4323 | 0.6936 |
1.4596 | 53.0 | 94711 | 1.3935 | 0.6986 |
1.4567 | 54.0 | 96498 | 1.4122 | 0.6974 |
1.455 | 55.0 | 98285 | 1.3975 | 0.6977 |
1.4533 | 56.0 | 100072 | 1.3907 | 0.7004 |
1.4513 | 57.0 | 101859 | 1.3957 | 0.6997 |
1.45 | 58.0 | 103646 | 1.3907 | 0.7001 |
1.4466 | 59.0 | 105433 | 1.4063 | 0.6988 |
1.4455 | 60.0 | 107220 | 1.3600 | 0.7036 |
1.4439 | 61.0 | 109007 | 1.3941 | 0.7015 |
1.4432 | 62.0 | 110794 | 1.3854 | 0.7016 |
1.4404 | 63.0 | 112581 | 1.4080 | 0.6973 |
1.4397 | 64.0 | 114368 | 1.3924 | 0.7011 |
1.4366 | 65.0 | 116155 | 1.3872 | 0.7049 |
1.4358 | 66.0 | 117942 | 1.3924 | 0.7010 |
1.4339 | 67.0 | 119729 | 1.3895 | 0.6989 |
1.4329 | 68.0 | 121516 | 1.4007 | 0.7002 |
1.4319 | 69.0 | 123303 | 1.3672 | 0.7047 |
1.4287 | 70.0 | 125090 | 1.3892 | 0.7021 |
1.4284 | 71.0 | 126877 | 1.3674 | 0.7016 |
1.4265 | 72.0 | 128664 | 1.3983 | 0.6962 |
1.4254 | 73.0 | 130451 | 1.3665 | 0.7006 |
1.4231 | 74.0 | 132238 | 1.3852 | 0.7003 |
1.4228 | 75.0 | 134025 | 1.4328 | 0.6945 |
1.4217 | 76.0 | 135812 | 1.3831 | 0.7031 |
1.4198 | 77.0 | 137599 | 1.3897 | 0.7017 |
1.4183 | 78.0 | 139386 | 1.3790 | 0.7027 |
1.417 | 79.0 | 141173 | 1.3697 | 0.7063 |
1.4179 | 80.0 | 142960 | 1.3805 | 0.7008 |
1.414 | 81.0 | 144747 | 1.3579 | 0.7051 |
1.4132 | 82.0 | 146534 | 1.3727 | 0.7054 |
1.4112 | 83.0 | 148321 | 1.3763 | 0.7014 |
1.412 | 84.0 | 150108 | 1.3806 | 0.7051 |
1.4103 | 85.0 | 151895 | 1.3889 | 0.7022 |
1.4082 | 86.0 | 153682 | 1.3886 | 0.6993 |
1.4068 | 87.0 | 155469 | 1.3721 | 0.7025 |
1.4068 | 88.0 | 157256 | 1.3589 | 0.7050 |
1.4043 | 89.0 | 159043 | 1.3599 | 0.7043 |
1.4042 | 90.0 | 160830 | 1.3675 | 0.7055 |
1.4033 | 91.0 | 162617 | 1.3720 | 0.7031 |
1.402 | 92.0 | 164404 | 1.3506 | 0.7067 |
1.4004 | 93.0 | 166191 | 1.3833 | 0.7017 |
1.4001 | 94.0 | 167978 | 1.3734 | 0.7021 |
1.398 | 95.0 | 169765 | 1.3792 | 0.7037 |
1.3983 | 96.0 | 171552 | 1.3676 | 0.7050 |
1.3966 | 97.0 | 173339 | 1.3888 | 0.7027 |
1.3953 | 98.0 | 175126 | 1.3928 | 0.7004 |
1.3934 | 99.0 | 176913 | 1.3824 | 0.7011 |
1.3928 | 100.0 | 178700 | 1.3829 | 0.7040 |
1.3919 | 101.0 | 180487 | 1.3534 | 0.7046 |
1.3902 | 102.0 | 182274 | 1.3559 | 0.7048 |
1.3896 | 103.0 | 184061 | 1.3682 | 0.7045 |
1.3895 | 104.0 | 185848 | 1.3643 | 0.7053 |
1.3884 | 105.0 | 187635 | 1.3854 | 0.7018 |
1.3878 | 106.0 | 189422 | 1.3702 | 0.7038 |
1.3865 | 107.0 | 191209 | 1.3530 | 0.7056 |
1.3842 | 108.0 | 192996 | 1.3804 | 0.7008 |
1.3836 | 109.0 | 194783 | 1.3508 | 0.7073 |
1.3831 | 110.0 | 196570 | 1.3821 | 0.7002 |
1.3828 | 111.0 | 198357 | 1.3552 | 0.7063 |
1.382 | 112.0 | 200144 | 1.3567 | 0.7090 |
1.3806 | 113.0 | 201931 | 1.3915 | 0.7022 |
1.3802 | 114.0 | 203718 | 1.3661 | 0.7098 |
1.3789 | 115.0 | 205505 | 1.3586 | 0.7071 |
1.3789 | 116.0 | 207292 | 1.3643 | 0.7049 |
1.3782 | 117.0 | 209079 | 1.3716 | 0.7037 |
1.3763 | 118.0 | 210866 | 1.3476 | 0.7077 |
1.3762 | 119.0 | 212653 | 1.3519 | 0.7067 |
1.3741 | 120.0 | 214440 | 1.3736 | 0.7027 |
1.3741 | 121.0 | 216227 | 1.3395 | 0.7073 |
1.3732 | 122.0 | 218014 | 1.3724 | 0.7020 |
1.3724 | 123.0 | 219801 | 1.3724 | 0.7025 |
1.3715 | 124.0 | 221588 | 1.3572 | 0.7083 |
1.3713 | 125.0 | 223375 | 1.3430 | 0.7101 |
1.369 | 126.0 | 225162 | 1.3457 | 0.7103 |
1.3687 | 127.0 | 226949 | 1.3457 | 0.7112 |
1.3698 | 128.0 | 228736 | 1.3649 | 0.7024 |
1.3683 | 129.0 | 230523 | 1.3422 | 0.7105 |
1.3668 | 130.0 | 232310 | 1.3531 | 0.7058 |
1.3675 | 131.0 | 234097 | 1.3607 | 0.7059 |
1.3661 | 132.0 | 235884 | 1.3553 | 0.7063 |
1.3642 | 133.0 | 237671 | 1.3541 | 0.7054 |
1.364 | 134.0 | 239458 | 1.3564 | 0.7080 |
1.3635 | 135.0 | 241245 | 1.3634 | 0.7055 |
1.3621 | 136.0 | 243032 | 1.3689 | 0.7043 |
1.3623 | 137.0 | 244819 | 1.3668 | 0.7087 |
1.3614 | 138.0 | 246606 | 1.3535 | 0.7061 |
1.3614 | 139.0 | 248393 | 1.3588 | 0.7086 |
1.3588 | 140.0 | 250180 | 1.3472 | 0.7091 |
1.3598 | 141.0 | 251967 | 1.3425 | 0.7085 |
1.3601 | 142.0 | 253754 | 1.3533 | 0.7099 |
1.3585 | 143.0 | 255541 | 1.3298 | 0.7112 |
1.3586 | 144.0 | 257328 | 1.3583 | 0.7036 |
1.3583 | 145.0 | 259115 | 1.3455 | 0.7081 |
1.3567 | 146.0 | 260902 | 1.3674 | 0.7045 |
1.3561 | 147.0 | 262689 | 1.3705 | 0.7056 |
1.3552 | 148.0 | 264476 | 1.3506 | 0.7080 |
1.3547 | 149.0 | 266263 | 1.3457 | 0.7120 |
1.3538 | 150.0 | 268050 | 1.3245 | 0.7100 |
1.3543 | 151.0 | 269837 | 1.3278 | 0.7110 |
1.3549 | 152.0 | 271624 | 1.3345 | 0.7103 |
1.3528 | 153.0 | 273411 | 1.3336 | 0.7089 |
1.3528 | 154.0 | 275198 | 1.3620 | 0.7075 |
1.3531 | 155.0 | 276985 | 1.3383 | 0.7120 |
1.3525 | 156.0 | 278772 | 1.3377 | 0.7091 |
1.3509 | 157.0 | 280559 | 1.3639 | 0.7054 |
1.3502 | 158.0 | 282346 | 1.3456 | 0.7091 |
1.351 | 159.0 | 284133 | 1.3359 | 0.7110 |
1.3493 | 160.0 | 285920 | 1.3361 | 0.7129 |
1.3497 | 161.0 | 287707 | 1.3350 | 0.7091 |
1.3484 | 162.0 | 289494 | 1.3582 | 0.7075 |
1.35 | 163.0 | 291281 | 1.3441 | 0.7100 |
1.3485 | 164.0 | 293068 | 1.3227 | 0.7093 |
1.3485 | 165.0 | 294855 | 1.3386 | 0.7118 |
1.3483 | 166.0 | 296642 | 1.3470 | 0.7087 |
1.3481 | 167.0 | 298429 | 1.3405 | 0.7087 |
1.3465 | 167.88 | 300000 | 1.3382 | 0.7062 |
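The fractional epoch in the last row follows from the step budget: training stopped at a fixed number of steps rather than at an epoch boundary. A small sketch of that arithmetic, using only numbers from this card (the per-epoch sequence count is an upper bound that assumes every step used a full batch):

```python
steps_per_epoch = 1787          # step count at epoch 1.0 in the table
training_steps = 300_000        # training_steps hyperparameter
total_train_batch_size = 128    # total_train_batch_size hyperparameter

# Epochs covered by the step budget, as a fraction and as whole epochs.
epochs_completed = training_steps / steps_per_epoch
full_epochs = training_steps // steps_per_epoch
steps_into_last_epoch = training_steps - full_epochs * steps_per_epoch

# Upper bound only: assumes no partial batch at the end of an epoch.
max_sequences_per_epoch = steps_per_epoch * total_train_batch_size

print(f"epochs completed: {epochs_completed:.2f}")
print(f"full epochs: {full_epochs}, extra steps: {steps_into_last_epoch}")
print(f"sequences per epoch (at most): {max_sequences_per_epoch}")
```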
Framework versions
- Transformers 4.26.0
- PyTorch 1.14.0a0+410ce96
- Datasets 2.9.0
- Tokenizers 0.13.2
Dataset used to train gokuls/distilbert_sa_pre-training-complete
- wikitext (wikitext-103-raw-v1)
Evaluation results
- Accuracy on wikitext wikitext-103-raw-v1 validation set (self-reported): 0.712