reuters-gpt2-text-gen

This model is a fine-tuned version of gpt2 on an unspecified dataset (presumably a Reuters newswire corpus, given the model name). It achieves the following results on the evaluation set:

  • Loss: 5.3505 (perplexity ≈ exp(5.3505) ≈ 211, assuming the loss is mean cross-entropy in nats)

Model description

More information needed

Intended uses & limitations

More information needed
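
Until the author fills this in, the checkpoint should behave like any GPT-2 causal language model. Below is a minimal inference sketch; the Hub id is assumed from the model name and the prompt is illustrative:

```python
# Minimal inference sketch. "reuters-gpt2-text-gen" is assumed from the model
# name; replace it with the actual Hub path (e.g. "username/reuters-gpt2-text-gen").
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "reuters-gpt2-text-gen"  # hypothetical Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Oil prices rose sharply after", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```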

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 100
  • mixed_precision_training: Native AMP
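
A sketch of a `TrainingArguments` configuration matching these values (Transformers 4.38 API; the `output_dir` is illustrative, and the listed Adam betas/epsilon are the optimizer defaults):

```python
# Sketch only: reproduces the hyperparameters listed above.
# Dataset preparation and the Trainer call are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="reuters-gpt2-text-gen",  # illustrative
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,       # effective train batch size: 8 * 8 = 64
    lr_scheduler_type="cosine",
    num_train_epochs=100,
    fp16=True,                           # "Native AMP" mixed precision
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults.
)
```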

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0.8 | 2 | 5.1325 |
| No log | 2.0 | 5 | 4.5158 |
| No log | 2.8 | 7 | 4.3926 |
| 3.2046 | 4.0 | 10 | 4.1889 |
| 3.2046 | 4.8 | 12 | 4.1255 |
| 3.2046 | 6.0 | 15 | 4.0377 |
| 3.2046 | 6.8 | 17 | 4.0633 |
| 2.3854 | 8.0 | 20 | 4.0414 |
| 2.3854 | 8.8 | 22 | 4.0933 |
| 2.3854 | 10.0 | 25 | 4.0976 |
| 2.3854 | 10.8 | 27 | 4.1448 |
| 2.0076 | 12.0 | 30 | 4.2108 |
| 2.0076 | 12.8 | 32 | 4.2002 |
| 2.0076 | 14.0 | 35 | 4.2741 |
| 2.0076 | 14.8 | 37 | 4.2844 |
| 1.7505 | 16.0 | 40 | 4.3736 |
| 1.7505 | 16.8 | 42 | 4.3574 |
| 1.7505 | 18.0 | 45 | 4.4569 |
| 1.7505 | 18.8 | 47 | 4.4581 |
| 1.5509 | 20.0 | 50 | 4.4699 |
| 1.5509 | 20.8 | 52 | 4.5080 |
| 1.5509 | 22.0 | 55 | 4.5407 |
| 1.5509 | 22.8 | 57 | 4.6452 |
| 1.3673 | 24.0 | 60 | 4.5325 |
| 1.3673 | 24.8 | 62 | 4.6152 |
| 1.3673 | 26.0 | 65 | 4.7127 |
| 1.3673 | 26.8 | 67 | 4.6173 |
| 1.2093 | 28.0 | 70 | 4.6912 |
| 1.2093 | 28.8 | 72 | 4.7465 |
| 1.2093 | 30.0 | 75 | 4.7689 |
| 1.2093 | 30.8 | 77 | 4.7705 |
| 1.0458 | 32.0 | 80 | 4.8648 |
| 1.0458 | 32.8 | 82 | 4.9239 |
| 1.0458 | 34.0 | 85 | 4.9503 |
| 1.0458 | 34.8 | 87 | 4.8597 |
| 0.9114 | 36.0 | 90 | 4.9523 |
| 0.9114 | 36.8 | 92 | 4.9581 |
| 0.9114 | 38.0 | 95 | 5.0170 |
| 0.9114 | 38.8 | 97 | 4.9739 |
| 0.7897 | 40.0 | 100 | 4.9779 |
| 0.7897 | 40.8 | 102 | 4.9746 |
| 0.7897 | 42.0 | 105 | 5.1164 |
| 0.7897 | 42.8 | 107 | 5.0466 |
| 0.688 | 44.0 | 110 | 5.1557 |
| 0.688 | 44.8 | 112 | 5.1215 |
| 0.688 | 46.0 | 115 | 5.1176 |
| 0.688 | 46.8 | 117 | 5.1375 |
| 0.6066 | 48.0 | 120 | 5.1657 |
| 0.6066 | 48.8 | 122 | 5.1599 |
| 0.6066 | 50.0 | 125 | 5.1915 |
| 0.6066 | 50.8 | 127 | 5.1978 |
| 0.5272 | 52.0 | 130 | 5.2156 |
| 0.5272 | 52.8 | 132 | 5.2771 |
| 0.5272 | 54.0 | 135 | 5.2110 |
| 0.5272 | 54.8 | 137 | 5.2720 |
| 0.4696 | 56.0 | 140 | 5.2585 |
| 0.4696 | 56.8 | 142 | 5.2798 |
| 0.4696 | 58.0 | 145 | 5.2785 |
| 0.4696 | 58.8 | 147 | 5.2969 |
| 0.424 | 60.0 | 150 | 5.3045 |
| 0.424 | 60.8 | 152 | 5.3076 |
| 0.424 | 62.0 | 155 | 5.3178 |
| 0.424 | 62.8 | 157 | 5.3264 |
| 0.3941 | 64.0 | 160 | 5.3031 |
| 0.3941 | 64.8 | 162 | 5.3250 |
| 0.3941 | 66.0 | 165 | 5.3291 |
| 0.3941 | 66.8 | 167 | 5.3288 |
| 0.3715 | 68.0 | 170 | 5.3393 |
| 0.3715 | 68.8 | 172 | 5.3485 |
| 0.3715 | 70.0 | 175 | 5.3370 |
| 0.3715 | 70.8 | 177 | 5.3340 |
| 0.3608 | 72.0 | 180 | 5.3379 |
| 0.3608 | 72.8 | 182 | 5.3413 |
| 0.3608 | 74.0 | 185 | 5.3434 |
| 0.3608 | 74.8 | 187 | 5.3471 |
| 0.351 | 76.0 | 190 | 5.3487 |
| 0.351 | 76.8 | 192 | 5.3499 |
| 0.351 | 78.0 | 195 | 5.3504 |
| 0.351 | 78.8 | 197 | 5.3505 |
| 0.3516 | 80.0 | 200 | 5.3505 |
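
Training loss falls monotonically to about 0.35 while validation loss bottoms out around epochs 6 to 8 (~4.04) and then climbs steadily to 5.35, so the run overfits long before it ends. If retraining, stopping on validation loss would likely yield a stronger checkpoint. A sketch using the standard `Trainer` early-stopping callback (illustrative only; not how this checkpoint was produced):

```python
# Illustrative early stopping on validation loss; not part of the original run.
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="reuters-gpt2-text-gen-earlystop",  # hypothetical
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,   # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model,                   # model and datasets assumed defined elsewhere
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```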

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2