EngTig/distilgpt2-finetuned-wikitext2

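This model is a fine-tuned version of distilgpt2. The card records the training dataset as unknown; the repository name suggests WikiText-2, but this is not confirmed here. At the final training epoch it reports the following training and validation results: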

  • Train Loss: 1.4784
  • Validation Loss: 4.7279
  • Epoch: 47
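
For language models, a loss is often easier to read as a perplexity. The sketch below converts the final-epoch losses above, assuming they are mean per-token cross-entropy values in nats (the usual default for causal language modelling, though this card does not state it). The helper name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Final-epoch values reported in this card.
final_train_loss = 1.4784
final_val_loss = 4.7279

train_ppl = perplexity(final_train_loss)
val_ppl = perplexity(final_val_loss)

print(f"train perplexity      ~ {train_ppl:.1f}")
print(f"validation perplexity ~ {val_ppl:.1f}")
```

If the losses were computed differently (for example, summed per sequence rather than averaged per token), these figures would not be comparable with published perplexities.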

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
  • training_precision: float32
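
As a rough sketch of how the optimizer entry above could be reconstructed: the dictionary is copied from this card, and `AdamWeightDecay` is the TensorFlow optimizer class exported by Transformers. The helper names `optimizer_kwargs` and `build_optimizer` are hypothetical, and dropping `name` and `decay` is an assumption made here (the class supplies its own name, and a decay of 0.0 has no effect), not something the card specifies.

```python
# Optimizer settings copied verbatim from the card.
CARD_OPTIMIZER = {
    "name": "AdamWeightDecay",
    "learning_rate": 2e-05,
    "decay": 0.0,
    "beta_1": 0.9,
    "beta_2": 0.999,
    "epsilon": 1e-07,
    "amsgrad": False,
    "weight_decay_rate": 0.01,
}


def optimizer_kwargs(config: dict) -> dict:
    """Keep only the entries passed on as constructor keyword arguments."""
    skipped = {"name", "decay"}  # assumption: not needed to rebuild the optimizer
    return {key: value for key, value in config.items() if key not in skipped}


def build_optimizer(config: dict):
    """Instantiate the optimizer; requires transformers with TensorFlow installed."""
    # Imported lazily so the rest of this sketch runs without TensorFlow.
    from transformers import AdamWeightDecay

    return AdamWeightDecay(**optimizer_kwargs(config))


print(optimizer_kwargs(CARD_OPTIMIZER))
```

Calling `build_optimizer(CARD_OPTIMIZER)` in an environment matching the framework versions listed in this card should yield an optimizer that can be passed to `model.compile`.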

Training results

Train Loss | Validation Loss | Epoch
2.9937     | 3.8775          | 0
2.9426     | 3.8763          | 1
2.8926     | 3.8593          | 2
2.8445     | 3.8982          | 3
2.8090     | 3.9044          | 4
2.7511     | 3.9337          | 5
2.7140     | 3.9265          | 6
2.6655     | 3.9483          | 7
2.6443     | 3.9490          | 8
2.6153     | 3.9458          | 9
2.5699     | 3.9660          | 10
2.5262     | 3.9897          | 11
2.5002     | 4.0219          | 12
2.4636     | 4.0540          | 13
2.4327     | 4.0224          | 14
2.3945     | 4.0364          | 15
2.3661     | 4.0640          | 16
2.3319     | 4.0636          | 17
2.2992     | 4.0996          | 18
2.2712     | 4.0886          | 19
2.2377     | 4.1483          | 20
2.2054     | 4.1594          | 21
2.1658     | 4.1989          | 22
2.1444     | 4.1348          | 23
2.1129     | 4.1489          | 24
2.0953     | 4.2259          | 25
2.0546     | 4.2353          | 26
2.0281     | 4.3147          | 27
1.9927     | 4.2586          | 28
1.9698     | 4.3254          | 29
1.9373     | 4.3288          | 30
1.9159     | 4.3262          | 31
1.8750     | 4.3550          | 32
1.8480     | 4.3697          | 33
1.8215     | 4.4233          | 34
1.7874     | 4.4876          | 35
1.7685     | 4.5072          | 36
1.7433     | 4.4617          | 37
1.7085     | 4.5331          | 38
1.6839     | 4.5724          | 39
1.6643     | 4.5819          | 40
1.6224     | 4.6558          | 41
1.5981     | 4.5991          | 42
1.5788     | 4.6276          | 43
1.5532     | 4.6394          | 44
1.5164     | 4.6464          | 45
1.4998     | 4.6634          | 46
1.4784     | 4.7279          | 47
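
In the table above, validation loss is lowest at epoch 2 (3.8593) and rises afterwards while training loss keeps falling, which is the usual signature of overfitting. The sketch below shows one way to pick the best epoch and measure the final train/validation gap. It uses a sample of rows copied from the table rather than all 48, and the variable names are illustrative.

```python
# (epoch, train_loss, validation_loss) rows copied from the results table.
# This is a sample of the rows, not the full log.
rows = [
    (0, 2.9937, 3.8775),
    (1, 2.9426, 3.8763),
    (2, 2.8926, 3.8593),
    (3, 2.8445, 3.8982),
    (10, 2.5699, 3.9660),
    (20, 2.2377, 4.1483),
    (30, 1.9373, 4.3288),
    (40, 1.6643, 4.5819),
    (47, 1.4784, 4.7279),
]

# Epoch with the lowest validation loss among the sampled rows.
best_epoch, best_train, best_val = min(rows, key=lambda row: row[2])

# Gap between validation and training loss at the last epoch.
last_epoch, last_train, last_val = rows[-1]
final_gap = last_val - last_train

print(f"best epoch: {best_epoch} (validation loss {best_val:.4f})")
print(f"epoch {last_epoch} gap (validation - train): {final_gap:.4f}")
```

A checkpoint saved near the best epoch, or early stopping on validation loss, would likely generalise better than the final-epoch weights, although the card does not say which checkpoint was published.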

Framework versions

  • Transformers 4.38.2
  • TensorFlow 2.15.0
  • Datasets 2.18.0
  • Tokenizers 0.15.2