---
license: mit
base_model: gpt2
tags:
  - generated_from_keras_callback
model-index:
  - name: deneme_linux
    results: []
---

# deneme_linux

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset. It achieves the following results on the evaluation set:

- Train Loss: 0.6315
- Validation Loss: 7.0703
- Epoch: 99

## Model description

More information needed

## Intended uses & limitations

More information needed
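
Since usage is otherwise undocumented, here is a minimal text-generation sketch using the standard Transformers TensorFlow API. The Hub repo id below is an assumption, not something the card states; adjust it to the real model path.

```python
# Minimal sketch, not an official example from this card.
from transformers import AutoTokenizer, TFAutoModelForCausalLM

repo_id = "denizzhansahin/deneme_linux"  # assumed Hub id; replace with the real path
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = TFAutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("How do I list files in Linux?", return_tensors="tf")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```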

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch that rebuilds the optimizer from this config follows the list):

- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'module': 'transformers.optimization_tf', 'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'module': 'keras.optimizers.schedules', 'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': -995, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'registered_name': None}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}, 'registered_name': 'WarmUp'}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
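
For reference, the sketch below rebuilds an equivalent optimizer from this logged config with `transformers.create_optimizer`. Note that `warmup_steps` (1000) plus `decay_steps` (-995) implies only about 5 total training steps, which looks like an artifact of how the schedule was exported; set `num_train_steps` to the real step count for your own run.

```python
# Sketch: recreate the logged AdamWeightDecay optimizer with a WarmUp +
# linear PolynomialDecay learning-rate schedule (TensorFlow side of Transformers).
from transformers import create_optimizer

optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,           # initial_learning_rate from the logged config
    num_train_steps=5,      # warmup_steps + decay_steps (1000 - 995); see note above
    num_warmup_steps=1000,  # warmup_steps from the logged config
    weight_decay_rate=0.01, # weight_decay_rate from the logged config
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    power=1.0,              # linear decay toward end_learning_rate 0.0
)
```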

### Training results

Note that the validation loss is identical (7.0703) at every epoch, while the train loss fluctuates around 0.63 with no clear downward trend:
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 0.6331 | 7.0703 | 0 |
| 0.6319 | 7.0703 | 1 |
| 0.6283 | 7.0703 | 2 |
| 0.6276 | 7.0703 | 3 |
| 0.6295 | 7.0703 | 4 |
| 0.6356 | 7.0703 | 5 |
| 0.6282 | 7.0703 | 6 |
| 0.6287 | 7.0703 | 7 |
| 0.6309 | 7.0703 | 8 |
| 0.6291 | 7.0703 | 9 |
| 0.6320 | 7.0703 | 10 |
| 0.6284 | 7.0703 | 11 |
| 0.6333 | 7.0703 | 12 |
| 0.6302 | 7.0703 | 13 |
| 0.6346 | 7.0703 | 14 |
| 0.6285 | 7.0703 | 15 |
| 0.6248 | 7.0703 | 16 |
| 0.6317 | 7.0703 | 17 |
| 0.6291 | 7.0703 | 18 |
| 0.6305 | 7.0703 | 19 |
| 0.6321 | 7.0703 | 20 |
| 0.6317 | 7.0703 | 21 |
| 0.6274 | 7.0703 | 22 |
| 0.6283 | 7.0703 | 23 |
| 0.6359 | 7.0703 | 24 |
| 0.6334 | 7.0703 | 25 |
| 0.6306 | 7.0703 | 26 |
| 0.6375 | 7.0703 | 27 |
| 0.6267 | 7.0703 | 28 |
| 0.6349 | 7.0703 | 29 |
| 0.6298 | 7.0703 | 30 |
| 0.6314 | 7.0703 | 31 |
| 0.6347 | 7.0703 | 32 |
| 0.6284 | 7.0703 | 33 |
| 0.6300 | 7.0703 | 34 |
| 0.6287 | 7.0703 | 35 |
| 0.6337 | 7.0703 | 36 |
| 0.6348 | 7.0703 | 37 |
| 0.6297 | 7.0703 | 38 |
| 0.6376 | 7.0703 | 39 |
| 0.6340 | 7.0703 | 40 |
| 0.6311 | 7.0703 | 41 |
| 0.6327 | 7.0703 | 42 |
| 0.6343 | 7.0703 | 43 |
| 0.6297 | 7.0703 | 44 |
| 0.6316 | 7.0703 | 45 |
| 0.6302 | 7.0703 | 46 |
| 0.6324 | 7.0703 | 47 |
| 0.6355 | 7.0703 | 48 |
| 0.6278 | 7.0703 | 49 |
| 0.6324 | 7.0703 | 50 |
| 0.6332 | 7.0703 | 51 |
| 0.6294 | 7.0703 | 52 |
| 0.6348 | 7.0703 | 53 |
| 0.6288 | 7.0703 | 54 |
| 0.6332 | 7.0703 | 55 |
| 0.6334 | 7.0703 | 56 |
| 0.6302 | 7.0703 | 57 |
| 0.6287 | 7.0703 | 58 |
| 0.6274 | 7.0703 | 59 |
| 0.6272 | 7.0703 | 60 |
| 0.6264 | 7.0703 | 61 |
| 0.6298 | 7.0703 | 62 |
| 0.6275 | 7.0703 | 63 |
| 0.6315 | 7.0703 | 64 |
| 0.6293 | 7.0703 | 65 |
| 0.6325 | 7.0703 | 66 |
| 0.6277 | 7.0703 | 67 |
| 0.6292 | 7.0703 | 68 |
| 0.6254 | 7.0703 | 69 |
| 0.6351 | 7.0703 | 70 |
| 0.6362 | 7.0703 | 71 |
| 0.6312 | 7.0703 | 72 |
| 0.6307 | 7.0703 | 73 |
| 0.6260 | 7.0703 | 74 |
| 0.6289 | 7.0703 | 75 |
| 0.6333 | 7.0703 | 76 |
| 0.6259 | 7.0703 | 77 |
| 0.6270 | 7.0703 | 78 |
| 0.6300 | 7.0703 | 79 |
| 0.6321 | 7.0703 | 80 |
| 0.6352 | 7.0703 | 81 |
| 0.6283 | 7.0703 | 82 |
| 0.6377 | 7.0703 | 83 |
| 0.6291 | 7.0703 | 84 |
| 0.6263 | 7.0703 | 85 |
| 0.6302 | 7.0703 | 86 |
| 0.6336 | 7.0703 | 87 |
| 0.6326 | 7.0703 | 88 |
| 0.6365 | 7.0703 | 89 |
| 0.6328 | 7.0703 | 90 |
| 0.6281 | 7.0703 | 91 |
| 0.6360 | 7.0703 | 92 |
| 0.6347 | 7.0703 | 93 |
| 0.6318 | 7.0703 | 94 |
| 0.6334 | 7.0703 | 95 |
| 0.6349 | 7.0703 | 96 |
| 0.6274 | 7.0703 | 97 |
| 0.6266 | 7.0703 | 98 |
| 0.6315 | 7.0703 | 99 |

### Framework versions

- Transformers 4.38.2
- TensorFlow 2.15.0
- Datasets 2.18.0
- Tokenizers 0.15.2