
distilgpt_new_0060

This model was trained from scratch on an unknown dataset. It achieves the following results after the final training epoch:

  • Train Loss: 2.8691
  • Validation Loss: 2.7610
  • Epoch: 59

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
  • training_precision: float32
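The optimizer above is Transformers' AdamWeightDecay, i.e. Adam with decoupled (AdamW-style) weight decay. As a sketch of what one update does with the listed hyperparameters, here is a minimal scalar implementation; the function and variable names are illustrative, not part of the training code:

```python
import math

def adamw_step(param, grad, m, v, t,
               learning_rate=2e-05, beta_1=0.9, beta_2=0.999,
               epsilon=1e-07, weight_decay_rate=0.01):
    """One AdamW-style update for a single scalar parameter.

    Defaults match the hyperparameters listed above. Weight decay is
    decoupled: it is added to the update directly, not folded into the
    gradient as in classic L2 regularization.
    """
    m = beta_1 * m + (1 - beta_1) * grad            # first-moment estimate
    v = beta_2 * v + (1 - beta_2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta_1 ** t)                   # bias correction (t = step, 1-based)
    v_hat = v / (1 - beta_2 ** t)
    update = m_hat / (math.sqrt(v_hat) + epsilon) + weight_decay_rate * param
    return param - learning_rate * update, m, v

# One step on a toy parameter:
p, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

With `decay: 0.0` in the config, the learning rate stays constant at 2e-05 for all 60 epochs.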

Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 5.6632     | 4.5153          | 0     |
| 4.4292     | 4.0923          | 1     |
| 4.1169     | 3.8723          | 2     |
| 3.9326     | 3.7260          | 3     |
| 3.8026     | 3.6281          | 4     |
| 3.7045     | 3.5355          | 5     |
| 3.6254     | 3.4645          | 6     |
| 3.5604     | 3.4093          | 7     |
| 3.5048     | 3.3587          | 8     |
| 3.4569     | 3.3136          | 9     |
| 3.4155     | 3.2778          | 10    |
| 3.3791     | 3.2443          | 11    |
| 3.3470     | 3.2157          | 12    |
| 3.3183     | 3.1854          | 13    |
| 3.2922     | 3.1642          | 14    |
| 3.2685     | 3.1400          | 15    |
| 3.2467     | 3.1193          | 16    |
| 3.2267     | 3.1009          | 17    |
| 3.2078     | 3.0838          | 18    |
| 3.1904     | 3.0689          | 19    |
| 3.1739     | 3.0520          | 20    |
| 3.1584     | 3.0379          | 21    |
| 3.1438     | 3.0255          | 22    |
| 3.1300     | 3.0116          | 23    |
| 3.1168     | 2.9965          | 24    |
| 3.1044     | 2.9866          | 25    |
| 3.0925     | 2.9752          | 26    |
| 3.0812     | 2.9631          | 27    |
| 3.0704     | 2.9539          | 28    |
| 3.0601     | 2.9458          | 29    |
| 3.0502     | 2.9340          | 30    |
| 3.0408     | 2.9251          | 31    |
| 3.0317     | 2.9179          | 32    |
| 3.0230     | 2.9082          | 33    |
| 3.0147     | 2.9002          | 34    |
| 3.0065     | 2.8948          | 35    |
| 2.9987     | 2.8855          | 36    |
| 2.9911     | 2.8779          | 37    |
| 2.9838     | 2.8706          | 38    |
| 2.9767     | 2.8643          | 39    |
| 2.9698     | 2.8570          | 40    |
| 2.9632     | 2.8501          | 41    |
| 2.9567     | 2.8441          | 42    |
| 2.9505     | 2.8385          | 43    |
| 2.9445     | 2.8327          | 44    |
| 2.9385     | 2.8260          | 45    |
| 2.9329     | 2.8213          | 46    |
| 2.9272     | 2.8160          | 47    |
| 2.9217     | 2.8107          | 48    |
| 2.9162     | 2.8052          | 49    |
| 2.9110     | 2.8020          | 50    |
| 2.9060     | 2.7938          | 51    |
| 2.9010     | 2.7896          | 52    |
| 2.8962     | 2.7857          | 53    |
| 2.8913     | 2.7827          | 54    |
| 2.8866     | 2.7768          | 55    |
| 2.8821     | 2.7724          | 56    |
| 2.8776     | 2.7679          | 57    |
| 2.8733     | 2.7642          | 58    |
| 2.8691     | 2.7610          | 59    |
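Assuming the reported validation loss is the mean per-token cross-entropy (the usual convention for causal language models), it can be converted to perplexity by exponentiating it, which gives roughly 15.8 after the final epoch:

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity = exp(loss), assuming loss is mean per-token cross-entropy."""
    return math.exp(cross_entropy_loss)

final_val_ppl = perplexity(2.7610)  # final validation loss from the table
```

Note that validation loss was still decreasing slowly at epoch 59, so training had not fully converged.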

Framework versions

  • Transformers 4.20.1
  • TensorFlow 2.8.2
  • Datasets 2.3.2
  • Tokenizers 0.12.1