
GPT2WaP

This is a GPT-2 model trained from scratch on the text of War and Peace. It achieves the following results on the evaluation set:

  • Loss: 9.0987
  • Perplexity: 8943.6289
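
The reported perplexity is the exponential of the evaluation cross-entropy loss. A minimal sketch of that relationship, using the numbers above:

```python
import math

eval_loss = 9.0987              # evaluation cross-entropy loss (nats per token)
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))     # ~8943.6, matching the reported value
```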

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 512
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 40
  • mixed_precision_training: Native AMP
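
These values map directly onto Hugging Face TrainingArguments. The original training script is not part of this card, but a minimal sketch of how such a run could be configured (argument names follow the transformers 4.40 API; the output path is hypothetical) would look like:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="gpt2-war-and-peace",   # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=64,    # 2 GPUs x 64 x 4 accumulation = 512 total
    per_device_eval_batch_size=64,     # 2 GPUs x 64 = 128 total
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=40,
    fp16=True,                         # Native AMP mixed precision
)
```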

Training results

| Training Loss | Epoch   | Step | Validation Loss | Perplexity |
|:-------------:|:-------:|:----:|:---------------:|:----------:|
| 10.157        | 0.6897  | 10   | 9.2336          | 10235.7480 |
| 9.2581        | 1.3793  | 20   | 8.9452          | 7671.1870  |
| 8.8166        | 2.0690  | 30   | 9.4917          | 13248.7207 |
| 8.5094        | 2.7586  | 40   | 9.5417          | 13928.9434 |
| 8.0914        | 3.4483  | 50   | 9.5507          | 14054.4785 |
| 7.663         | 4.1379  | 60   | 9.4760          | 13043.2441 |
| 7.3275        | 4.8276  | 70   | 9.3510          | 11510.8203 |
| 6.9788        | 5.5172  | 80   | 9.0822          | 8797.7188  |
| 6.6639        | 6.2069  | 90   | 8.9803          | 7945.4014  |
| 6.3749        | 6.8966  | 100  | 8.6494          | 5706.8130  |
| 6.0702        | 7.5862  | 110  | 8.5696          | 5268.9268  |
| 5.9107        | 8.2759  | 120  | 8.3612          | 4277.6265  |
| 5.6724        | 8.9655  | 130  | 8.4294          | 4579.6484  |
| 5.5949        | 9.6552  | 140  | 8.4934          | 4882.4316  |
| 5.4904        | 10.3448 | 150  | 8.4683          | 4761.3862  |
| 5.3792        | 11.0345 | 160  | 8.4647          | 4744.5381  |
| 5.3091        | 11.7241 | 170  | 8.5767          | 5306.3535  |
| 5.233         | 12.4138 | 180  | 8.5257          | 5042.5068  |
| 5.2252        | 13.1034 | 190  | 8.5328          | 5078.8433  |
| 5.1445        | 13.7931 | 200  | 8.5871          | 5361.9390  |
| 5.0824        | 14.4828 | 210  | 8.5784          | 5315.4043  |
| 5.0272        | 15.1724 | 220  | 8.6434          | 5672.6934  |
| 4.979         | 15.8621 | 230  | 8.6836          | 5905.4277  |
| 4.924         | 16.5517 | 240  | 8.7112          | 6070.2261  |
| 4.9394        | 17.2414 | 250  | 8.7233          | 6144.3931  |
| 4.8663        | 17.9310 | 260  | 8.7411          | 6254.5234  |
| 4.8599        | 18.6207 | 270  | 8.7824          | 6518.7896  |
| 4.8572        | 19.3103 | 280  | 8.8338          | 6862.5586  |
| 4.8064        | 20.0    | 290  | 8.7774          | 6485.7441  |
| 4.746         | 20.6897 | 300  | 8.8458          | 6944.8892  |
| 4.7569        | 21.3793 | 310  | 8.8436          | 6930.1416  |
| 4.6954        | 22.0690 | 320  | 8.8618          | 7057.1084  |
| 4.7277        | 22.7586 | 330  | 8.8706          | 7119.4478  |
| 4.6432        | 23.4483 | 340  | 8.9084          | 7393.6138  |
| 4.6032        | 24.1379 | 350  | 8.9111          | 7413.5176  |
| 4.6198        | 24.8276 | 360  | 8.9526          | 7728.0210  |
| 4.5874        | 25.5172 | 370  | 8.9740          | 7895.1641  |
| 4.5455        | 26.2069 | 380  | 8.9365          | 7604.7129  |
| 4.5313        | 26.8966 | 390  | 8.9738          | 7893.2969  |
| 4.5297        | 27.5862 | 400  | 8.9659          | 7831.8110  |
| 4.5279        | 28.2759 | 410  | 8.9914          | 8034.0391  |
| 4.4974        | 28.9655 | 420  | 9.0293          | 8344.2529  |
| 4.4554        | 29.6552 | 430  | 9.0191          | 8259.1533  |
| 4.4651        | 30.3448 | 440  | 9.0236          | 8296.4531  |
| 4.4647        | 31.0345 | 450  | 9.0349          | 8391.1279  |
| 4.4668        | 31.7241 | 460  | 9.0530          | 8543.8340  |
| 4.4264        | 32.4138 | 470  | 9.0722          | 8709.4141  |
| 4.4008        | 33.1034 | 480  | 9.0876          | 8844.6104  |
| 4.3982        | 33.7931 | 490  | 9.0711          | 8700.4893  |
| 4.3846        | 34.4828 | 500  | 9.0894          | 8860.7441  |
| 4.3971        | 35.1724 | 510  | 9.0879          | 8847.6973  |
| 4.379         | 35.8621 | 520  | 9.0949          | 8909.6025  |
| 4.3696        | 36.5517 | 530  | 9.1097          | 9042.2295  |
| 4.3447        | 37.2414 | 540  | 9.1007          | 8961.6953  |
| 4.3796        | 37.9310 | 550  | 9.0869          | 8839.0781  |
| 4.364         | 38.6207 | 560  | 9.0987          | 8943.6289  |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1