---
license: mit
base_model: gpt2
tags:
  - generated_from_trainer
model-index:
  - name: polynomial_1500_5e-4
    results: []
---

polynomial_1500_5e-4

This model is a fine-tuned version of gpt2 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0244
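
Since the card does not document usage, the snippet below is a minimal, hypothetical example of loading the model for text generation with the Transformers library. The Hub repository id is assumed from the model name on this card and may need adjusting; the prompt and sampling settings are arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id, assumed from the model name on this card.
model_id = "danielkosyra/polynomial_1500_5e-4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from an arbitrary prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```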

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 0.0005
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 320
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: polynomial
  • lr_scheduler_warmup_steps: 250
  • training_steps: 1500
  • mixed_precision_training: Native AMP
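
For reference, the sketch below shows one way to express this configuration with the Transformers Trainer. It is a hypothetical reconstruction, not the original training script: the placeholder corpus, tokenization settings, output path, and the evaluation/logging cadence (every 50 steps, inferred from the results table) are assumptions; only the listed hyperparameter values come from this card.

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder corpus: the actual training data is not documented in this card.
texts = ["Replace this with the real training corpus."] * 64
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

training_args = TrainingArguments(
    output_dir="polynomial_1500_5e-4",
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=10,   # 32 x 10 = 320 effective batch size
    lr_scheduler_type="polynomial",
    warmup_steps=250,
    max_steps=1500,
    fp16=True,                        # Native AMP mixed precision
    evaluation_strategy="steps",      # cadence inferred from the results table
    eval_steps=50,
    logging_steps=50,
)

trainer = Trainer(
    model=model,                      # Trainer's default AdamW matches the listed betas/epsilon
    args=training_args,
    train_dataset=dataset,
    eval_dataset=dataset,             # placeholder eval split
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```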

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.2753        | 0.2175 | 50   | 7.6380          |
| 6.8812        | 0.4350 | 100  | 6.3560          |
| 6.0809        | 0.6525 | 150  | 5.7615          |
| 5.5903        | 0.8699 | 200  | 5.3682          |
| 5.2166        | 1.0874 | 250  | 5.0450          |
| 4.8887        | 1.3049 | 300  | 4.7443          |
| 4.6064        | 1.5224 | 350  | 4.4729          |
| 4.3711        | 1.7399 | 400  | 4.2030          |
| 4.1277        | 1.9574 | 450  | 3.9282          |
| 3.8099        | 2.1749 | 500  | 3.7320          |
| 3.6679        | 2.3923 | 550  | 3.6049          |
| 3.5513        | 2.6098 | 600  | 3.5031          |
| 3.4651        | 2.8273 | 650  | 3.4318          |
| 3.3797        | 3.0448 | 700  | 3.3644          |
| 3.2388        | 3.2623 | 750  | 3.3170          |
| 3.2017        | 3.4798 | 800  | 3.2766          |
| 3.1730        | 3.6973 | 850  | 3.2426          |
| 3.1430        | 3.9147 | 900  | 3.2054          |
| 3.0392        | 4.1322 | 950  | 3.1778          |
| 2.9809        | 4.3497 | 1000 | 3.1538          |
| 2.9564        | 4.5672 | 1050 | 3.1319          |
| 2.9576        | 4.7847 | 1100 | 3.1091          |
| 2.9375        | 5.0022 | 1150 | 3.0897          |
| 2.8071        | 5.2197 | 1200 | 3.0776          |
| 2.8145        | 5.4371 | 1250 | 3.0631          |
| 2.8042        | 5.6546 | 1300 | 3.0522          |
| 2.7851        | 5.8721 | 1350 | 3.0401          |
| 2.7462        | 6.0896 | 1400 | 3.0345          |
| 2.6936        | 6.3071 | 1450 | 3.0285          |
| 2.6984        | 6.5246 | 1500 | 3.0244          |

Framework versions

  • Transformers 4.40.1
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1