polynomial_1450_8e-4

This model is a fine-tuned version of gpt2. The training dataset was not recorded (the auto-generated card lists it as None). It achieves the following result on the evaluation set:

  • Loss: 2.8482
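
The reported loss can be turned into a perplexity, which is often easier to compare across language models. The following is a minimal sketch, assuming the loss is the mean per-token cross-entropy in nats (the usual convention for causal language modeling with gpt2, but not stated in this card). The name `eval_loss` is illustrative.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 2.8482

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss = {eval_loss:.4f}, perplexity = {perplexity:.2f}")
```

Under that assumption the perplexity comes out to about 17.3.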

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0007
  • train_batch_size: 40
  • eval_batch_size: 40
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 400
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: polynomial
  • lr_scheduler_warmup_steps: 250
  • training_steps: 1450
  • mixed_precision_training: Native AMP
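
To make the schedule concrete, here is a minimal pure-Python sketch of a polynomial decay with linear warmup, using the values listed above. The end learning rate (`LR_END = 1e-7`) and the exponent (`POWER = 1.0`) are assumptions, since the card does not state them. `NUM_DEVICES = 1` is also an assumption, chosen because it makes the per-device batch size and accumulation steps multiply to the listed total of 400. The function name `polynomial_lr` is hypothetical and is not the library's API.

```python
# Values taken from the hyperparameter list above.
BASE_LR = 0.0007
WARMUP_STEPS = 250
TRAINING_STEPS = 1450
TRAIN_BATCH_SIZE = 40
GRAD_ACCUM_STEPS = 10

# Assumptions (not stated in the card).
LR_END = 1e-7     # learning rate reached at the final step
POWER = 1.0       # exponent 1.0 makes the decay linear
NUM_DEVICES = 1   # single-device training


def polynomial_lr(step: int) -> float:
    """Learning rate at a given optimizer step (illustrative sketch)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    if step > TRAINING_STEPS:
        return LR_END
    # Polynomial decay from BASE_LR down to LR_END.
    decay_steps = TRAINING_STEPS - WARMUP_STEPS
    pct_remaining = 1 - (step - WARMUP_STEPS) / decay_steps
    return (BASE_LR - LR_END) * pct_remaining ** POWER + LR_END


# Number of examples contributing to each optimizer step.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

for step in (0, 125, 250, 850, 1450):
    print(f"step {step:>4}: lr = {polynomial_lr(step):.6g}")
print(f"effective batch size = {effective_batch_size}")
```

With these assumptions the learning rate peaks at step 250, exactly when warmup ends, and then decays over the remaining 1200 steps.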

Training results

Training Loss  Epoch   Step  Validation Loss
9.0241         0.2573  50    7.1979
6.5847         0.5147  100   6.0647
5.7978         0.7720  150   5.4606
5.2827         1.0293  200   5.0281
4.8587         1.2867  250   4.6639
4.5089         1.5440  300   4.3212
4.1969         1.8013  350   4.0343
3.8907         2.0587  400   3.7044
3.5806         2.3160  450   3.5024
3.4326         2.5733  500   3.3742
3.312          2.8307  550   3.2756
3.2081         3.0880  600   3.2094
3.0667         3.3453  650   3.1524
3.0244         3.6027  700   3.1068
3.0115         3.8600  750   3.0626
2.896          4.1173  800   3.0297
2.8202         4.3747  850   3.0073
2.8021         4.6320  900   2.9799
2.7938         4.8893  950   2.9512
2.7011         5.1467  1000  2.9363
2.6331         5.4040  1050  2.9229
2.6313         5.6613  1100  2.9034
2.6277         5.9187  1150  2.8887
2.5224         6.1760  1200  2.8811
2.4908         6.4334  1250  2.8728
2.4928         6.6907  1300  2.8609
2.4871         6.9480  1350  2.8522
2.4013         7.2054  1400  2.8514
2.3854         7.4627  1450  2.8482
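
The Epoch and Step columns allow a rough estimate of the training set size, which the card otherwise leaves unspecified. The sketch below is back-of-the-envelope arithmetic rather than a recorded fact: it assumes every optimizer step consumes the listed total_train_batch_size of 400 examples and that the Epoch column is the fraction of the training set seen so far. The variable names are illustrative.

```python
# (step, epoch) pairs copied from the first and last rows of the results table.
checkpoints = [(50, 0.2573), (1450, 7.4627)]

# From the hyperparameter list: examples consumed per optimizer step.
TOTAL_TRAIN_BATCH_SIZE = 400

# Optimizer steps needed to complete one epoch, estimated from each row.
estimates = [step / epoch for step, epoch in checkpoints]
avg_steps_per_epoch = sum(estimates) / len(estimates)

# Assumption: each step sees TOTAL_TRAIN_BATCH_SIZE distinct training examples.
approx_train_examples = avg_steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Gap between final validation loss and final training loss.
final_gap = 2.8482 - 2.3854

print(f"steps per epoch ~ {avg_steps_per_epoch:.1f}")
print(f"training examples ~ {approx_train_examples:,.0f}")
print(f"final validation - training loss = {final_gap:.4f}")
```

Both rows give roughly 194 optimizer steps per epoch, so the training set would hold on the order of 78,000 examples under these assumptions.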

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1

Model size

  • Parameters: 124M
  • Tensor type: F32 (Safetensors)