distilbert_add_pre-training-complete

This model is a fine-tuned version of distilbert-base-uncased on the wikitext dataset (wikitext-103-raw-v1 configuration). It achieves the following results on the evaluation set (a brief usage sketch follows the results):

  • Loss: 5.0239
  • Accuracy: 0.2307
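
The accuracy above is a masked-language-modeling metric, so the checkpoint can be used like any other DistilBERT MLM. Below is a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as gokuls/distilbert_add_pre-training-complete and keeps the standard masked-language-modeling head; adjust the model id if you use a local path.

```python
# Minimal sketch: load the checkpoint and run a fill-mask prediction.
# Assumes the Hub id below and the standard DistilBERT MLM head.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "gokuls/distilbert_add_pre-training-complete"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("The capital of France is [MASK]."))
```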

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative Trainer setup sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 300000
  • mixed_precision_training: Native AMP
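
As an illustration only, these hyperparameters roughly correspond to a 🤗 Trainer configuration like the sketch below. The actual training script, data pipeline, and multi-GPU launch command for this run are not part of this card, so every name here is an assumption rather than the author's code.

```python
# Illustrative only: approximate TrainingArguments matching the list above
# (Transformers 4.26). Multi-GPU distribution is handled by the launcher
# (e.g. torchrun/accelerate), not by these arguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert_add_pre-training-complete",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=10,
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=300_000,            # training_steps above
    fp16=True,                    # "Native AMP" mixed precision
    evaluation_strategy="epoch",
    logging_strategy="epoch",
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer's
# default optimizer settings (adam_beta1, adam_beta2, adam_epsilon).
```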

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 6.295 | 1.0 | 3573 | 6.0701 | 0.1522 |
| 6.0482 | 2.0 | 7146 | 5.9533 | 0.1565 |
| 5.9799 | 3.0 | 10719 | 5.9008 | 0.1584 |
| 5.9378 | 4.0 | 14292 | 5.8997 | 0.1545 |
| 5.9057 | 5.0 | 17865 | 5.8905 | 0.1536 |
| 5.8811 | 6.0 | 21438 | 5.8646 | 0.1550 |
| 5.8617 | 7.0 | 25011 | 5.8322 | 0.1534 |
| 5.844 | 8.0 | 28584 | 5.8563 | 0.1523 |
| 5.8297 | 9.0 | 32157 | 5.8352 | 0.1548 |
| 5.8175 | 10.0 | 35730 | 5.8136 | 0.1558 |
| 5.8056 | 11.0 | 39303 | 5.8147 | 0.1526 |
| 5.7921 | 12.0 | 42876 | 5.8020 | 0.1548 |
| 5.7777 | 13.0 | 46449 | 5.7891 | 0.1545 |
| 5.7596 | 14.0 | 50022 | 5.7370 | 0.1587 |
| 5.7414 | 15.0 | 53595 | 5.7396 | 0.1604 |
| 5.7243 | 16.0 | 57168 | 5.7490 | 0.1564 |
| 5.6997 | 17.0 | 60741 | 5.7135 | 0.1561 |
| 5.6698 | 18.0 | 64314 | 5.6858 | 0.1620 |
| 5.6398 | 19.0 | 67887 | 5.6735 | 0.1644 |
| 5.6135 | 20.0 | 71460 | 5.6174 | 0.1681 |
| 5.5899 | 21.0 | 75033 | 5.6191 | 0.1684 |
| 5.5699 | 22.0 | 78606 | 5.5977 | 0.1669 |
| 5.5487 | 23.0 | 82179 | 5.6139 | 0.1669 |
| 5.529 | 24.0 | 85752 | 5.5272 | 0.1741 |
| 5.512 | 25.0 | 89325 | 5.5271 | 0.1727 |
| 5.4939 | 26.0 | 92898 | 5.5190 | 0.1721 |
| 5.4765 | 27.0 | 96471 | 5.4824 | 0.1770 |
| 5.4604 | 28.0 | 100044 | 5.5159 | 0.1747 |
| 5.4422 | 29.0 | 103617 | 5.4577 | 0.1807 |
| 5.4243 | 30.0 | 107190 | 5.4546 | 0.1772 |
| 5.408 | 31.0 | 110763 | 5.4297 | 0.1837 |
| 5.3915 | 32.0 | 114336 | 5.4089 | 0.1866 |
| 5.3766 | 33.0 | 117909 | 5.3996 | 0.1848 |
| 5.3594 | 34.0 | 121482 | 5.3974 | 0.1841 |
| 5.3451 | 35.0 | 125055 | 5.3718 | 0.1908 |
| 5.3294 | 36.0 | 128628 | 5.3706 | 0.1878 |
| 5.3155 | 37.0 | 132201 | 5.3677 | 0.1903 |
| 5.2996 | 38.0 | 135774 | 5.2970 | 0.1994 |
| 5.287 | 39.0 | 139347 | 5.3127 | 0.1977 |
| 5.2735 | 40.0 | 142920 | 5.3145 | 0.1955 |
| 5.26 | 41.0 | 146493 | 5.2985 | 0.2017 |
| 5.2487 | 42.0 | 150066 | 5.2661 | 0.2025 |
| 5.2362 | 43.0 | 153639 | 5.2712 | 0.2031 |
| 5.2248 | 44.0 | 157212 | 5.2452 | 0.2049 |
| 5.2115 | 45.0 | 160785 | 5.2325 | 0.2054 |
| 5.1998 | 46.0 | 164358 | 5.2233 | 0.2075 |
| 5.188 | 47.0 | 167931 | 5.1994 | 0.2118 |
| 5.1779 | 48.0 | 171504 | 5.2436 | 0.2069 |
| 5.1664 | 49.0 | 175077 | 5.2203 | 0.2129 |
| 5.1546 | 50.0 | 178650 | 5.1820 | 0.2134 |
| 5.1431 | 51.0 | 182223 | 5.2029 | 0.2122 |
| 5.133 | 52.0 | 185796 | 5.1458 | 0.2140 |
| 5.1226 | 53.0 | 189369 | 5.1757 | 0.2163 |
| 5.1138 | 54.0 | 192942 | 5.1380 | 0.2193 |
| 5.1046 | 55.0 | 196515 | 5.1498 | 0.2178 |
| 5.0984 | 56.0 | 200088 | 5.1094 | 0.2194 |
| 5.0907 | 57.0 | 203661 | 5.1354 | 0.2202 |
| 5.0812 | 58.0 | 207234 | 5.0662 | 0.2256 |
| 5.0748 | 59.0 | 210807 | 5.1163 | 0.2181 |
| 5.067 | 60.0 | 214380 | 5.1193 | 0.2199 |
| 5.0609 | 61.0 | 217953 | 5.0919 | 0.2224 |
| 5.0536 | 62.0 | 221526 | 5.0899 | 0.2239 |
| 5.0491 | 63.0 | 225099 | 5.1125 | 0.2224 |
| 5.0433 | 64.0 | 228672 | 5.0892 | 0.2226 |
| 5.0373 | 65.0 | 232245 | 5.0644 | 0.2260 |
| 5.032 | 66.0 | 235818 | 5.0623 | 0.2253 |
| 5.0283 | 67.0 | 239391 | 5.1004 | 0.2213 |
| 5.0223 | 68.0 | 242964 | 5.0573 | 0.2279 |
| 5.0184 | 69.0 | 246537 | 5.0488 | 0.2271 |
| 5.014 | 70.0 | 250110 | 5.0482 | 0.2280 |
| 5.0102 | 71.0 | 253683 | 5.0600 | 0.2269 |
| 5.0079 | 72.0 | 257256 | 5.0271 | 0.2279 |
| 5.0029 | 73.0 | 260829 | 5.0629 | 0.2267 |
| 4.9994 | 74.0 | 264402 | 5.0304 | 0.2297 |
| 4.9978 | 75.0 | 267975 | 5.0485 | 0.2269 |
| 4.9945 | 76.0 | 271548 | 5.0380 | 0.2306 |
| 4.9917 | 77.0 | 275121 | 5.0590 | 0.2265 |
| 4.9913 | 78.0 | 278694 | 5.0585 | 0.2262 |
| 4.987 | 79.0 | 282267 | 5.0339 | 0.2278 |
| 4.9862 | 80.0 | 285840 | 5.0214 | 0.2305 |
| 4.9841 | 81.0 | 289413 | 5.0393 | 0.2271 |
| 4.983 | 82.0 | 292986 | 5.0200 | 0.2298 |
| 4.9816 | 83.0 | 296559 | 5.0289 | 0.2300 |
| 4.9801 | 83.96 | 300000 | 4.9972 | 0.2332 |
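
Because the reported losses are masked-language-modeling cross-entropy values, they can also be read as (pseudo-)perplexities over the masked tokens. A small sketch using the numbers from this card:

```python
# Perplexity is exp(cross-entropy loss); values below are copied from this
# card (final table row and the evaluation-set loss at the top).
import math

final_validation_loss = 4.9972   # last row of the table above
evaluation_set_loss = 5.0239     # "Loss" reported at the top of the card

print(f"validation perplexity ~ {math.exp(final_validation_loss):.1f}")  # ~ 148.0
print(f"evaluation perplexity ~ {math.exp(evaluation_set_loss):.1f}")    # ~ 152.0
```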

Framework versions

  • Transformers 4.26.0
  • Pytorch 1.14.0a0+410ce96
  • Datasets 2.9.0
  • Tokenizers 0.13.2