gpt2-wikitext2

This model is a fine-tuned version of openai-community/gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.6547
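
For a causal language model, the evaluation loss is usually easier to interpret as perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is what a GPT-2 language-modeling head normally returns; the card itself does not state this. The variable names are illustrative.

```python
import math

# Final evaluation loss as reported in this card.
eval_loss = 5.6547

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss  = {eval_loss:.4f}")
print(f"perplexity = {perplexity:.1f}")
```

Under that assumption the perplexity comes out at roughly 286.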

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60
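
As a rough guide to reproducing this setup, the hyperparameters above can be mapped onto `transformers.TrainingArguments`. This is a minimal sketch, not the original training script: it assumes a single device (so the reported batch sizes equal the per-device values), no gradient accumulation, and zero warmup steps, none of which the card confirms. `TRAINING_KWARGS`, `build_training_args`, `linear_lr`, and the `output_dir` default are hypothetical names chosen for illustration.

```python
# Hyperparameters listed in the card, expressed as TrainingArguments keywords.
# Assumption: one device, so train/eval batch size == per-device batch size.
TRAINING_KWARGS = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 60,
}


def build_training_args(output_dir="gpt2-wikitext2"):
    """Build TrainingArguments from the values above (output_dir is hypothetical)."""
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_lr(step, total_steps=3300, base_lr=1e-05):
    """Learning rate under linear decay to zero, assuming no warmup steps."""
    remaining = max(0.0, (total_steps - step) / total_steps)
    return base_lr * remaining


# 3300 total steps matches the last row of the results table (60 epochs x 55 steps).
print(linear_lr(0), linear_lr(1650), linear_lr(3300))
```

With 55 optimizer steps per epoch and a batch size of 8, the training split would hold at most 440 examples under these assumptions.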

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| No log        | 1.0   | 55   | 9.1458          |
| No log        | 2.0   | 110  | 8.3471          |
| No log        | 3.0   | 165  | 7.7884          |
| No log        | 4.0   | 220  | 7.3751          |
| No log        | 5.0   | 275  | 7.0487          |
| No log        | 6.0   | 330  | 6.7857          |
| No log        | 7.0   | 385  | 6.5840          |
| No log        | 8.0   | 440  | 6.4196          |
| No log        | 9.0   | 495  | 6.2584          |
| 7.3272        | 10.0  | 550  | 6.1628          |
| 7.3272        | 11.0  | 605  | 6.0521          |
| 7.3272        | 12.0  | 660  | 5.9861          |
| 7.3272        | 13.0  | 715  | 5.9223          |
| 7.3272        | 14.0  | 770  | 5.8760          |
| 7.3272        | 15.0  | 825  | 5.8246          |
| 7.3272        | 16.0  | 880  | 5.7813          |
| 7.3272        | 17.0  | 935  | 5.7663          |
| 7.3272        | 18.0  | 990  | 5.7275          |
| 5.2638        | 19.0  | 1045 | 5.7022          |
| 5.2638        | 20.0  | 1100 | 5.6905          |
| 5.2638        | 21.0  | 1155 | 5.6803          |
| 5.2638        | 22.0  | 1210 | 5.6740          |
| 5.2638        | 23.0  | 1265 | 5.6631          |
| 5.2638        | 24.0  | 1320 | 5.6461          |
| 5.2638        | 25.0  | 1375 | 5.6326          |
| 5.2638        | 26.0  | 1430 | 5.6280          |
| 5.2638        | 27.0  | 1485 | 5.6408          |
| 4.5099        | 28.0  | 1540 | 5.6194          |
| 4.5099        | 29.0  | 1595 | 5.6255          |
| 4.5099        | 30.0  | 1650 | 5.6218          |
| 4.5099        | 31.0  | 1705 | 5.6127          |
| 4.5099        | 32.0  | 1760 | 5.6140          |
| 4.5099        | 33.0  | 1815 | 5.6281          |
| 4.5099        | 34.0  | 1870 | 5.6305          |
| 4.5099        | 35.0  | 1925 | 5.6139          |
| 4.5099        | 36.0  | 1980 | 5.6331          |
| 4.0571        | 37.0  | 2035 | 5.6323          |
| 4.0571        | 38.0  | 2090 | 5.6137          |
| 4.0571        | 39.0  | 2145 | 5.6258          |
| 4.0571        | 40.0  | 2200 | 5.6322          |
| 4.0571        | 41.0  | 2255 | 5.6392          |
| 4.0571        | 42.0  | 2310 | 5.6308          |
| 4.0571        | 43.0  | 2365 | 5.6329          |
| 4.0571        | 44.0  | 2420 | 5.6373          |
| 4.0571        | 45.0  | 2475 | 5.6407          |
| 3.7638        | 46.0  | 2530 | 5.6489          |
| 3.7638        | 47.0  | 2585 | 5.6489          |
| 3.7638        | 48.0  | 2640 | 5.6445          |
| 3.7638        | 49.0  | 2695 | 5.6428          |
| 3.7638        | 50.0  | 2750 | 5.6425          |
| 3.7638        | 51.0  | 2805 | 5.6450          |
| 3.7638        | 52.0  | 2860 | 5.6566          |
| 3.7638        | 53.0  | 2915 | 5.6504          |
| 3.7638        | 54.0  | 2970 | 5.6494          |
| 3.5759        | 55.0  | 3025 | 5.6538          |
| 3.5759        | 56.0  | 3080 | 5.6555          |
| 3.5759        | 57.0  | 3135 | 5.6529          |
| 3.5759        | 58.0  | 3190 | 5.6567          |
| 3.5759        | 59.0  | 3245 | 5.6551          |
| 3.5759        | 60.0  | 3300 | 5.6547          |
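
The validation losses above stop improving well before epoch 60 while the training loss keeps falling, which is the usual signature of overfitting. The sketch below locates the best epoch from the per-epoch validation losses copied from the table; `val_losses` and the other names are illustrative and not part of the original training code.

```python
# Validation loss per epoch (epochs 1..60), copied from the results table.
val_losses = [
    9.1458, 8.3471, 7.7884, 7.3751, 7.0487, 6.7857, 6.5840, 6.4196, 6.2584, 6.1628,
    6.0521, 5.9861, 5.9223, 5.8760, 5.8246, 5.7813, 5.7663, 5.7275, 5.7022, 5.6905,
    5.6803, 5.6740, 5.6631, 5.6461, 5.6326, 5.6280, 5.6408, 5.6194, 5.6255, 5.6218,
    5.6127, 5.6140, 5.6281, 5.6305, 5.6139, 5.6331, 5.6323, 5.6137, 5.6258, 5.6322,
    5.6392, 5.6308, 5.6329, 5.6373, 5.6407, 5.6489, 5.6489, 5.6445, 5.6428, 5.6425,
    5.6450, 5.6566, 5.6504, 5.6494, 5.6538, 5.6555, 5.6529, 5.6567, 5.6551, 5.6547,
]

# Index of the lowest validation loss; epochs are 1-based.
best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
best_epoch = best_index + 1
best_loss = val_losses[best_index]
final_loss = val_losses[-1]

print(f"best epoch: {best_epoch}, validation loss {best_loss:.4f}")
print(f"final epoch: {len(val_losses)}, validation loss {final_loss:.4f}")
print(f"gap between final and best: {final_loss - best_loss:.4f}")
```

On these numbers the lowest validation loss is 5.6127 at epoch 31, so a checkpoint saved around that point would score slightly better on the evaluation set than the final one.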

Framework versions

  • Transformers 4.40.0
  • PyTorch 2.2.1
  • Datasets 2.19.0
  • Tokenizers 0.19.1