metadata
library_name: transformers
license: mit
base_model: gpt2
tags:
- generated_from_keras_callback
model-index:
- name: turkishElectrick-mini-model
  results: []
turkishElectrick-mini-model
This model is a fine-tuned version of gpt2; the training dataset is not documented. After the final epoch it reports the following training and validation losses:
- Train Loss: 0.6456
- Validation Loss: 1.7437
- Epoch: 99
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'module': 'transformers.optimization_tf', 'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'module': 'keras.optimizers.schedules', 'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': -981, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'registered_name': None}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}, 'registered_name': 'WarmUp'}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: mixed_float16
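The optimizer entry above nests a `PolynomialDecay` schedule inside a `WarmUp` wrapper, which is hard to read from the serialized dictionary. The following is a minimal pure-Python sketch of the learning rate that configuration would produce at a given optimizer step. It assumes the usual formulas (linear warm-up scaled by `step / warmup_steps`, then a polynomial decay evaluated on `step - warmup_steps` and clamped at `decay_steps` because `cycle` is `False`); the function name `warmup_poly_lr` is hypothetical and not part of any library.

```python
def warmup_poly_lr(step, initial_lr=5e-05, warmup_steps=1000,
                   decay_steps=-981, end_lr=0.0, power=1.0):
    """Learning rate at an optimizer step (assumed WarmUp + PolynomialDecay formulas)."""
    if step < warmup_steps:
        # Warm-up phase: scale the initial rate by (step / warmup_steps) ** power.
        return initial_lr * (step / warmup_steps) ** power
    # Decay phase: the wrapped schedule sees the step count minus the warm-up steps.
    # With cycle=False the step is clamped so it never exceeds decay_steps.
    decay_step = min(step - warmup_steps, decay_steps)
    fraction = decay_step / decay_steps
    return (initial_lr - end_lr) * (1.0 - fraction) ** power + end_lr


# Defaults are the values listed in the optimizer configuration above.
for step in (0, 250, 500, 999, 1000, 1500):
    print(step, warmup_poly_lr(step))
```

Under these assumptions, the negative `decay_steps` value means the decay phase returns `end_learning_rate` (0.0) as soon as warm-up ends at step 1000. Whether this is related to the validation loss staying at 1.7437 from epoch 52 onward cannot be confirmed from this card, since the number of steps per epoch is not reported.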
Training results
Train Loss | Validation Loss | Epoch |
---|---|---|
7.8609 | 7.6497 | 0 |
7.4033 | 6.9102 | 1 |
6.7940 | 6.4910 | 2 |
6.4110 | 6.1667 | 3 |
6.1566 | 5.9352 | 4 |
5.9535 | 5.7224 | 5 |
5.7576 | 5.5135 | 6 |
5.5523 | 5.2730 | 7 |
5.3273 | 5.0157 | 8 |
5.0893 | 4.7472 | 9 |
4.8421 | 4.4614 | 10 |
4.5883 | 4.1934 | 11 |
4.3480 | 3.9637 | 12 |
4.1266 | 3.7447 | 13 |
3.9195 | 3.5359 | 14 |
3.7044 | 3.3124 | 15 |
3.5097 | 3.1111 | 16 |
3.3371 | 2.9532 | 17 |
3.1614 | 2.7941 | 18 |
3.0044 | 2.6662 | 19 |
2.8511 | 2.5749 | 20 |
2.7244 | 2.4281 | 21 |
2.5806 | 2.3450 | 22 |
2.4819 | 2.2632 | 23 |
2.3593 | 2.1921 | 24 |
2.2577 | 2.1169 | 25 |
2.1563 | 2.0540 | 26 |
2.0613 | 2.0063 | 27 |
1.9667 | 1.9627 | 28 |
1.8827 | 1.9393 | 29 |
1.8151 | 1.8864 | 30 |
1.7214 | 1.8717 | 31 |
1.6412 | 1.8502 | 32 |
1.5774 | 1.7942 | 33 |
1.5114 | 1.7909 | 34 |
1.4588 | 1.7749 | 35 |
1.4006 | 1.7770 | 36 |
1.3340 | 1.7404 | 37 |
1.2674 | 1.7468 | 38 |
1.2138 | 1.7298 | 39 |
1.1611 | 1.7218 | 40 |
1.1231 | 1.7275 | 41 |
1.0758 | 1.7187 | 42 |
1.0199 | 1.7249 | 43 |
0.9813 | 1.6946 | 44 |
0.9286 | 1.7022 | 45 |
0.8793 | 1.7378 | 46 |
0.8404 | 1.6809 | 47 |
0.8028 | 1.7204 | 48 |
0.7706 | 1.7212 | 49 |
0.7406 | 1.7010 | 50 |
0.6994 | 1.7265 | 51 |
0.6785 | 1.7437 | 52 |
0.6438 | 1.7437 | 53 |
0.6456 | 1.7437 | 54 |
0.6406 | 1.7437 | 55 |
0.6422 | 1.7437 | 56 |
0.6453 | 1.7437 | 57 |
0.6428 | 1.7437 | 58 |
0.6454 | 1.7437 | 59 |
0.6477 | 1.7437 | 60 |
0.6438 | 1.7437 | 61 |
0.6477 | 1.7437 | 62 |
0.6462 | 1.7437 | 63 |
0.6461 | 1.7437 | 64 |
0.6469 | 1.7437 | 65 |
0.6448 | 1.7437 | 66 |
0.6450 | 1.7437 | 67 |
0.6469 | 1.7437 | 68 |
0.6407 | 1.7437 | 69 |
0.6492 | 1.7437 | 70 |
0.6410 | 1.7437 | 71 |
0.6445 | 1.7437 | 72 |
0.6385 | 1.7437 | 73 |
0.6413 | 1.7437 | 74 |
0.6397 | 1.7437 | 75 |
0.6456 | 1.7437 | 76 |
0.6403 | 1.7437 | 77 |
0.6439 | 1.7437 | 78 |
0.6398 | 1.7437 | 79 |
0.6415 | 1.7437 | 80 |
0.6431 | 1.7437 | 81 |
0.6421 | 1.7437 | 82 |
0.6423 | 1.7437 | 83 |
0.6454 | 1.7437 | 84 |
0.6406 | 1.7437 | 85 |
0.6440 | 1.7437 | 86 |
0.6423 | 1.7437 | 87 |
0.6431 | 1.7437 | 88 |
0.6448 | 1.7437 | 89 |
0.6436 | 1.7437 | 90 |
0.6362 | 1.7437 | 91 |
0.6445 | 1.7437 | 92 |
0.6407 | 1.7437 | 93 |
0.6410 | 1.7437 | 94 |
0.6431 | 1.7437 | 95 |
0.6434 | 1.7437 | 96 |
0.6415 | 1.7437 | 97 |
0.6438 | 1.7437 | 98 |
0.6456 | 1.7437 | 99 |
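The table above shows training loss continuing to fall while validation loss levels off, so the final epoch is not necessarily the best checkpoint. As a rough illustration, the sketch below parses an excerpt of the table (epochs 44 to 52 and the final epoch, copied verbatim) and reports the epoch with the lowest validation loss and the train/validation gap at the end. The helper name `parse_rows` is hypothetical.

```python
# Excerpt of the results table above, copied verbatim (epochs 44-52 and 99).
TABLE_EXCERPT = """
0.9813 | 1.6946 | 44 |
0.9286 | 1.7022 | 45 |
0.8793 | 1.7378 | 46 |
0.8404 | 1.6809 | 47 |
0.8028 | 1.7204 | 48 |
0.7706 | 1.7212 | 49 |
0.7406 | 1.7010 | 50 |
0.6994 | 1.7265 | 51 |
0.6785 | 1.7437 | 52 |
0.6456 | 1.7437 | 99 |
"""


def parse_rows(text):
    """Turn 'train | val | epoch |' lines into (epoch, train_loss, val_loss) tuples."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        rows.append((int(cells[2]), float(cells[0]), float(cells[1])))
    return rows


rows = parse_rows(TABLE_EXCERPT)
best = min(rows, key=lambda row: row[2])   # row with the lowest validation loss
final = rows[-1]                           # last row of the excerpt (epoch 99)
gap = final[2] - final[1]                  # validation loss minus training loss

print(f"best validation loss {best[2]:.4f} at epoch {best[0]}")
print(f"final-epoch gap (validation - train): {gap:.4f}")
```

In the full table, the lowest validation loss (1.6809) also occurs at epoch 47, and the final-epoch gap is 1.0981. A gap of that size is commonly read as a sign of overfitting, although the card gives no information about the dataset that would confirm this.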
Framework versions
- Transformers 4.44.2
- TensorFlow 2.17.0
- Datasets 3.0.0
- Tokenizers 0.19.1