metadata
license: mit
base_model: gpt2
tags:
  - generated_from_trainer
model-index:
  - name: '130000'
    results: []

130000

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.9987
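For a language model, an evaluation loss is often easier to interpret as perplexity. The sketch below assumes the reported loss is the mean token-level cross-entropy in nats, which is the usual convention for causal language modeling with the Trainer but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 5.9987

# Assumption: the loss is mean cross-entropy per token, measured in nats.
# Under that assumption, perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss = {eval_loss:.4f}, perplexity = {perplexity:.1f}")
```

Under that assumption the perplexity comes out at roughly 400, which is high for a GPT-2 fine-tune.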

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 50
  • mixed_precision_training: Native AMP
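Two of the values above follow from the others, and a short sketch may make the relationship clearer. It assumes a single training device and no learning-rate warmup (neither is stated in this card), and it takes the total of 150 optimizer steps from the last step reported in this card's training results. The `cosine_lr` helper is a hypothetical illustration of a plain cosine decay to zero, not the scheduler code used for this run.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 8

# Last optimizer step reported in this card's training results.
TOTAL_STEPS = 150

# Assumption: one device, so the effective batch is batch size x accumulation.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def cosine_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS):
    """Half-cosine decay from base_lr to 0 (assumes no warmup phase)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size)
for step in (0, 75, 150):
    print(f"step {step:>3}: lr = {cosine_lr(step):.6f}")
```

The effective batch size matches the listed total_train_batch_size of 64. Under the no-warmup assumption, the learning rate falls to half its starting value at the midpoint and reaches zero at the final step.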

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.92  | 3    | 7.0396          |
| No log        | 1.85  | 6    | 6.5398          |
| No log        | 2.77  | 9    | 6.3337          |
| 6.6916        | 4.0   | 13   | 6.3694          |
| 6.6916        | 4.92  | 16   | 6.2945          |
| 6.6916        | 5.85  | 19   | 6.3184          |
| 6.1092        | 6.77  | 22   | 6.3726          |
| 6.1092        | 8.0   | 26   | 6.2948          |
| 6.1092        | 8.92  | 29   | 6.3374          |
| 6.5151        | 9.85  | 32   | 6.3641          |
| 6.5151        | 10.77 | 35   | 6.2335          |
| 6.5151        | 12.0  | 39   | 6.1965          |
| 5.998         | 12.92 | 42   | 6.0595          |
| 5.998         | 13.85 | 45   | 6.0374          |
| 5.998         | 14.77 | 48   | 6.0562          |
| 5.6623        | 16.0  | 52   | 6.0128          |
| 5.6623        | 16.92 | 55   | 5.9999          |
| 5.6623        | 17.85 | 58   | 6.0008          |
| 5.611         | 18.77 | 61   | 5.9992          |
| 5.611         | 20.0  | 65   | 6.0017          |
| 5.611         | 20.92 | 68   | 6.0005          |
| 5.5519        | 21.85 | 71   | 5.9962          |
| 5.5519        | 22.77 | 74   | 5.9964          |
| 5.5519        | 24.0  | 78   | 5.9975          |
| 5.5841        | 24.92 | 81   | 5.9974          |
| 5.5841        | 25.85 | 84   | 6.0000          |
| 5.5841        | 26.77 | 87   | 6.0019          |
| 5.5582        | 28.0  | 91   | 6.0014          |
| 5.5582        | 28.92 | 94   | 6.0016          |
| 5.5582        | 29.85 | 97   | 5.9987          |
| 5.591         | 30.77 | 100  | 5.9992          |
| 5.591         | 32.0  | 104  | 5.9986          |
| 5.591         | 32.92 | 107  | 5.9982          |
| 5.5638        | 33.85 | 110  | 5.9983          |
| 5.5638        | 34.77 | 113  | 5.9987          |
| 5.5638        | 36.0  | 117  | 5.9989          |
| 5.5683        | 36.92 | 120  | 5.9992          |
| 5.5683        | 37.85 | 123  | 5.9995          |
| 5.5683        | 38.77 | 126  | 5.9991          |
| 5.5628        | 40.0  | 130  | 5.9992          |
| 5.5628        | 40.92 | 133  | 5.9992          |
| 5.5628        | 41.85 | 136  | 5.9991          |
| 5.5628        | 42.77 | 139  | 5.9989          |
| 5.5683        | 44.0  | 143  | 5.9987          |
| 5.5683        | 44.92 | 146  | 5.9987          |
| 5.5683        | 45.85 | 149  | 5.9987          |
| 5.5534        | 46.15 | 150  | 5.9987          |
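The Epoch and Step columns above are consistent with a fixed ratio of 3.25 optimizer steps per epoch (for example, step 13 falls at epoch 4.0). The sketch below reproduces the Epoch column from the Step column under that inferred ratio. `STEPS_PER_EPOCH` and `epoch_at` are hypothetical names, and the training-set size estimate is an inference from the reported numbers, not a figure stated in this card.

```python
# Inferred from the results above: step 13 is logged at epoch 4.0.
STEPS_PER_EPOCH = 13 / 4  # 3.25 optimizer steps per epoch

# total_train_batch_size from the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 64


def epoch_at(step):
    """Fractional epoch for an optimizer step, rounded like the Epoch column."""
    return round(step / STEPS_PER_EPOCH, 2)


# Rough upper bound on sequences seen per epoch, assuming full batches.
approx_examples_per_epoch = STEPS_PER_EPOCH * TOTAL_TRAIN_BATCH_SIZE

for step in (3, 13, 71, 150):
    print(f"step {step:>3} -> epoch {epoch_at(step)}")
print("approx. training sequences per epoch:", approx_examples_per_epoch)
```

If the inference holds, the training set is small (on the order of 200 sequences), which would be consistent with the validation loss flattening near 6.0 from about step 55 onward while the training loss settles around 5.55.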

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
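To compare a local environment against the versions listed above, a small check with the standard library can help. This is a minimal sketch: `EXPECTED` and `check_versions` are hypothetical names, and the pip distribution name `torch` for PyTorch is an assumption about how the package was installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from this card; keys are assumed pip distribution names.
EXPECTED = {
    "transformers": "4.38.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def check_versions(expected=EXPECTED):
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where the local setup differs from the one recorded here.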