
pretraining6

This model is a fine-tuned version of gpt2 (124M parameters) trained on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9244
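
The checkpoint can be loaded like any other GPT-2 causal language model through the Transformers API. A minimal usage sketch, assuming the model is published on the Hugging Face Hub (the repository identifier below is a placeholder for the actual path):

```python
# Minimal usage sketch; "your-namespace/pretraining6" is a placeholder
# for the actual Hub repository id of this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/pretraining6"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```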

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they could map onto the Transformers Trainer API):

  • learning_rate: 0.0006
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 320
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_steps: 250
  • training_steps: 2500
  • mixed_precision_training: Native AMP
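
The configuration above corresponds roughly to the following Trainer setup. This is a reconstruction for illustration only, not the original training script; the dataset, data collator, and output directory are placeholders, since the card does not document them:

```python
# Hypothetical reconstruction of the training configuration above using the
# Hugging Face Trainer API; output_dir and the datasets are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

args = TrainingArguments(
    output_dir="pretraining6",
    learning_rate=6e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=10,   # effective batch size: 32 * 10 = 320
    max_steps=2500,
    warmup_steps=250,
    lr_scheduler_type="constant",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    fp16=True,                        # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=50,
    logging_steps=50,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset,  # not documented in this card
#                   eval_dataset=eval_dataset)
# trainer.train()
```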

Training results

Training Loss | Epoch | Step | Validation Loss
7.5942 | 0.1830 | 50 | 6.9837
6.7139 | 0.3660 | 100 | 6.4890
6.3586 | 0.5490 | 150 | 6.1468
6.0068 | 0.7321 | 200 | 5.8162
5.7156 | 0.9151 | 250 | 5.5687
5.4902 | 1.0981 | 300 | 5.3574
5.2778 | 1.2811 | 350 | 5.1752
5.078 | 1.4641 | 400 | 4.9877
4.905 | 1.6471 | 450 | 4.8093
4.7396 | 1.8302 | 500 | 4.6300
4.5488 | 2.0132 | 550 | 4.4533
4.2909 | 2.1962 | 600 | 4.2386
4.1235 | 2.3792 | 650 | 3.9890
3.9081 | 2.5622 | 700 | 3.7933
3.7373 | 2.7452 | 750 | 3.6421
3.6011 | 2.9283 | 800 | 3.5265
3.4526 | 3.1113 | 850 | 3.4465
3.3523 | 3.2943 | 900 | 3.3867
3.2917 | 3.4773 | 950 | 3.3297
3.2536 | 3.6603 | 1000 | 3.2808
3.2277 | 3.8433 | 1050 | 3.2435
3.1699 | 4.0264 | 1100 | 3.1971
3.0158 | 4.2094 | 1150 | 3.1710
3.0104 | 4.3924 | 1200 | 3.1499
2.9946 | 4.5754 | 1250 | 3.1194
2.9814 | 4.7584 | 1300 | 3.0988
2.9686 | 4.9414 | 1350 | 3.0700
2.8425 | 5.1245 | 1400 | 3.0559
2.8039 | 5.3075 | 1450 | 3.0437
2.8121 | 5.4905 | 1500 | 3.0285
2.8078 | 5.6735 | 1550 | 3.0128
2.7996 | 5.8565 | 1600 | 2.9962
2.7607 | 6.0395 | 1650 | 2.9871
2.6212 | 6.2225 | 1700 | 2.9845
2.6638 | 6.4056 | 1750 | 2.9746
2.6603 | 6.5886 | 1800 | 2.9660
2.6674 | 6.7716 | 1850 | 2.9510
2.6741 | 6.9546 | 1900 | 2.9379
2.5313 | 7.1376 | 1950 | 2.9474
2.5107 | 7.3206 | 2000 | 2.9465
2.5358 | 7.5037 | 2050 | 2.9403
2.5552 | 7.6867 | 2100 | 2.9303
2.5691 | 7.8697 | 2150 | 2.9200
2.5008 | 8.0527 | 2200 | 2.9241
2.3855 | 8.2357 | 2250 | 2.9314
2.4215 | 8.4187 | 2300 | 2.9285
2.4488 | 8.6018 | 2350 | 2.9217
2.46 | 8.7848 | 2400 | 2.9110
2.468 | 8.9678 | 2450 | 2.9044
2.3004 | 9.1508 | 2500 | 2.9244
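
Assuming the reported loss is the usual per-token cross-entropy in nats, the final validation loss of 2.9244 corresponds to a perplexity of about exp(2.9244) ≈ 18.6.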

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1