
polynomial_1450_7e-4_32b_w0.2

This model is a fine-tuned version of gpt2 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8711
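
The card does not include a usage example; as a minimal, hypothetical sketch, the checkpoint can be loaded like any GPT-2 causal LM with the Transformers API. The model id below is assumed from the card title; substitute the actual Hub repo id or a local checkpoint path.

```python
# Hypothetical usage sketch; the model id is assumed from the card title --
# replace it with the actual Hub repo id or a local checkpoint directory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "polynomial_1450_7e-4_32b_w0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```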

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0007
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 320
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: polynomial
  • lr_scheduler_warmup_steps: 250
  • training_steps: 1450
  • mixed_precision_training: Native AMP
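
For orientation, here is a minimal, hypothetical sketch of how these hyperparameters map onto Hugging Face TrainingArguments. The original training script is not part of this card, so the dataset, Trainer wiring, and single-GPU assumption (32 × 10 = 320 effective batch size) are assumptions; the Adam betas and epsilon listed above are the Trainer's AdamW defaults.

```python
# Hypothetical sketch: TrainingArguments mirroring the hyperparameters above.
# The original training script is not included in this card; a single GPU is
# assumed, so the effective batch size is 32 * 10 = 320.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="polynomial_1450_7e-4_32b_w0.2",
    learning_rate=7e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=10,
    lr_scheduler_type="polynomial",  # polynomial decay with warmup
    warmup_steps=250,
    max_steps=1450,
    adam_beta1=0.9,                  # Trainer defaults, listed for completeness
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # Native AMP mixed precision
    evaluation_strategy="steps",     # the card logs eval loss every 50 steps
    eval_steps=50,
    logging_steps=50,
)
```

With this scheduler type, the Trainer builds the schedule via transformers' get_polynomial_decay_schedule_with_warmup, which ramps the learning rate up over the warmup steps and then decays it polynomially to the end of training.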

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.0501 | 0.2058 | 50 | 7.2516 |
| 6.6334 | 0.4117 | 100 | 6.1191 |
| 5.8403 | 0.6175 | 150 | 5.5171 |
| 5.347 | 0.8234 | 200 | 5.0809 |
| 4.9621 | 1.0292 | 250 | 4.7655 |
| 4.5909 | 1.2351 | 300 | 4.4418 |
| 4.3142 | 1.4409 | 350 | 4.1684 |
| 4.0577 | 1.6468 | 400 | 3.8857 |
| 3.7934 | 1.8526 | 450 | 3.6317 |
| 3.5603 | 2.0585 | 500 | 3.4786 |
| 3.3743 | 2.2643 | 550 | 3.3722 |
| 3.3003 | 2.4702 | 600 | 3.2932 |
| 3.2338 | 2.6760 | 650 | 3.2353 |
| 3.1788 | 2.8818 | 700 | 3.1763 |
| 3.0774 | 3.0877 | 750 | 3.1289 |
| 2.9735 | 3.2935 | 800 | 3.0953 |
| 2.9351 | 3.4994 | 850 | 3.0626 |
| 2.9367 | 3.7052 | 900 | 3.0310 |
| 2.9088 | 3.9111 | 950 | 3.0032 |
| 2.7944 | 4.1169 | 1000 | 2.9830 |
| 2.7402 | 4.3228 | 1050 | 2.9669 |
| 2.7293 | 4.5286 | 1100 | 2.9475 |
| 2.7184 | 4.7345 | 1150 | 2.9275 |
| 2.7029 | 4.9403 | 1200 | 2.9098 |
| 2.6065 | 5.1462 | 1250 | 2.9024 |
| 2.5699 | 5.3520 | 1300 | 2.8938 |
| 2.5511 | 5.5578 | 1350 | 2.8836 |
| 2.5503 | 5.7637 | 1400 | 2.8756 |
| 2.5435 | 5.9695 | 1450 | 2.8711 |
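
If the reported losses are the usual per-token cross-entropy in nats (an assumption; the card does not say), the final validation loss corresponds to a perplexity of about exp(2.8711) ≈ 17.7:

```python
# Convert the final validation loss to perplexity, assuming the loss is
# per-token cross-entropy in nats (not stated explicitly in the card).
import math

print(math.exp(2.8711))  # ~17.66
```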

Framework versions

  • Transformers 4.40.1
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
Model size

  • 124M parameters (F32, stored as Safetensors)
  • Fine-tuned from gpt2