
pretraining7

This model is a fine-tuned version of gpt2 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9516
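
Assuming the reported loss is the mean token-level cross-entropy in nats (the Transformers Trainer default for causal language modeling), this corresponds to a validation perplexity of roughly exp(2.9516) ≈ 19.1.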

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0006
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 320
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_steps: 250
  • training_steps: 1500
  • mixed_precision_training: Native AMP
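
These settings map onto the Hugging Face TrainingArguments roughly as follows. This is a minimal sketch, not the exact training script: the output directory is a placeholder, and the effective batch size of 320 assumes a single device (32 per device × 10 accumulation steps).

```python
from transformers import TrainingArguments

# Sketch of the reported configuration; "pretraining7" as output_dir is a placeholder.
args = TrainingArguments(
    output_dir="pretraining7",
    learning_rate=6e-4,
    per_device_train_batch_size=32,   # 32 x 10 accumulation = 320 effective (single device)
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=10,
    seed=42,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=250,
    max_steps=1500,
    fp16=True,                        # "Native AMP" mixed precision
)
# The default optimizer (AdamW with betas=(0.9, 0.999), epsilon=1e-08)
# already matches the values reported above.
```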

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:--------------|:-------|:-----|:----------------|
| 9.1748        | 0.1830 | 50   | 7.4163          |
| 6.7378        | 0.3660 | 100  | 6.2275          |
| 5.9659        | 0.5490 | 150  | 5.6559          |
| 5.4898        | 0.7321 | 200  | 5.2576          |
| 5.1248        | 0.9151 | 250  | 4.9392          |
| 4.7951        | 1.0981 | 300  | 4.6297          |
| 4.4843        | 1.2811 | 350  | 4.3591          |
| 4.2178        | 1.4641 | 400  | 4.0591          |
| 3.9729        | 1.6471 | 450  | 3.7843          |
| 3.766         | 1.8302 | 500  | 3.6225          |
| 3.6046        | 2.0132 | 550  | 3.5064          |
| 3.41          | 2.1962 | 600  | 3.4262          |
| 3.3702        | 2.3792 | 650  | 3.3577          |
| 3.309         | 2.5622 | 700  | 3.3027          |
| 3.2562        | 2.7452 | 750  | 3.2583          |
| 3.2027        | 2.9283 | 800  | 3.2192          |
| 3.1139        | 3.1113 | 850  | 3.1779          |
| 3.0442        | 3.2943 | 900  | 3.1549          |
| 3.0144        | 3.4773 | 950  | 3.1266          |
| 3.0016        | 3.6603 | 1000 | 3.0997          |
| 3.0001        | 3.8433 | 1050 | 3.0770          |
| 2.9655        | 4.0264 | 1100 | 3.0554          |
| 2.8328        | 4.2094 | 1150 | 3.0422          |
| 2.8343        | 4.3924 | 1200 | 3.0261          |
| 2.8266        | 4.5754 | 1250 | 3.0105          |
| 2.8236        | 4.7584 | 1300 | 2.9962          |
| 2.8194        | 4.9414 | 1350 | 2.9807          |
| 2.7161        | 5.1245 | 1400 | 2.9717          |
| 2.6842        | 5.3075 | 1450 | 2.9632          |
| 2.6898        | 5.4905 | 1500 | 2.9516          |
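
For quick inspection, the curves above can be plotted directly; a minimal matplotlib sketch using only the values from the table:

```python
import matplotlib.pyplot as plt

# Steps and losses copied from the training results table (every 50 steps).
steps = list(range(50, 1501, 50))
train_loss = [9.1748, 6.7378, 5.9659, 5.4898, 5.1248, 4.7951, 4.4843, 4.2178,
              3.9729, 3.766, 3.6046, 3.41, 3.3702, 3.309, 3.2562, 3.2027,
              3.1139, 3.0442, 3.0144, 3.0016, 3.0001, 2.9655, 2.8328, 2.8343,
              2.8266, 2.8236, 2.8194, 2.7161, 2.6842, 2.6898]
val_loss = [7.4163, 6.2275, 5.6559, 5.2576, 4.9392, 4.6297, 4.3591, 4.0591,
            3.7843, 3.6225, 3.5064, 3.4262, 3.3577, 3.3027, 3.2583, 3.2192,
            3.1779, 3.1549, 3.1266, 3.0997, 3.0770, 3.0554, 3.0422, 3.0261,
            3.0105, 2.9962, 2.9807, 2.9717, 2.9632, 2.9516]

plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```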

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
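
Since the card leaves usage unspecified, here is a minimal, hypothetical loading example; the Hub repository id is a placeholder and should be replaced with this model's actual path:

```python
from transformers import pipeline

# Hypothetical repo id -- replace with the actual Hub path for this checkpoint.
generator = pipeline("text-generation", model="<user>/pretraining7")
print(generator("The quick brown fox", max_new_tokens=40)[0]["generated_text"])
```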

Safetensors

  • Model size: 124M params
  • Tensor type: F32