---
license: mit
base_model: gpt2
tags:
  - generated_from_trainer
model-index:
  - name: otu_gpt
    results: []
---

# otu_gpt

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 4.6030
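As a minimal sketch of how the checkpoint can be loaded for generation with the `transformers` library (the repository id `Dauka-transformers/otu_gpt` is an assumption inferred from the page owner; substitute the actual hub path if it differs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with the actual hub path if different.
model_id = "Dauka-transformers/otu_gpt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a prompt and sample a short continuation.
inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```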

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 256
- eval_batch_size: 128
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 2048
- total_eval_batch_size: 1024
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 30
- mixed_precision_training: Native AMP
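As a rough sketch, these settings correspond approximately to the following `TrainingArguments`; the output directory and evaluation strategy are assumptions (per-epoch evaluation is inferred from the results table below), and the dataset/`Trainer` wiring plus the multi-GPU launch command are omitted:

```python
from transformers import TrainingArguments

# Sketch only: output_dir is a placeholder; distributed training across 8 GPUs
# is handled by the launcher (e.g. torchrun/accelerate), not by these arguments.
training_args = TrainingArguments(
    output_dir="otu_gpt",
    learning_rate=2e-4,
    per_device_train_batch_size=256,   # 256 x 8 GPUs = 2048 total
    per_device_eval_batch_size=128,    # 128 x 8 GPUs = 1024 total
    seed=42,
    num_train_epochs=30,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                         # Native AMP mixed precision
    evaluation_strategy="epoch",       # assumed from the per-epoch eval losses
)
```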

### Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 5.413         | 1.0   | 6430   | 5.3753          |
| 5.1384        | 2.0   | 12860  | 5.1224          |
| 4.996         | 3.0   | 19290  | 4.9950          |
| 4.9047        | 4.0   | 25720  | 4.9112          |
| 4.8292        | 5.0   | 32150  | 4.8572          |
| 4.7709        | 6.0   | 38580  | 4.8168          |
| 4.7345        | 7.0   | 45010  | 4.7872          |
| 4.6996        | 8.0   | 51440  | 4.7637          |
| 4.6509        | 9.0   | 57870  | 4.7396          |
| 4.6326        | 10.0  | 64300  | 4.7248          |
| 4.6049        | 11.0  | 70730  | 4.7104          |
| 4.5894        | 12.0  | 77160  | 4.6994          |
| 4.5574        | 13.0  | 83590  | 4.6868          |
| 4.5415        | 14.0  | 90020  | 4.6758          |
| 4.5283        | 15.0  | 96450  | 4.6676          |
| 4.4993        | 16.0  | 102880 | 4.6605          |
| 4.486         | 17.0  | 109310 | 4.6532          |
| 4.4675        | 18.0  | 115740 | 4.6467          |
| 4.4588        | 19.0  | 122170 | 4.6410          |
| 4.4402        | 20.0  | 128600 | 4.6347          |
| 4.4182        | 21.0  | 135030 | 4.6292          |
| 4.4031        | 22.0  | 141460 | 4.6262          |
| 4.3857        | 23.0  | 147890 | 4.6200          |
| 4.3726        | 24.0  | 154320 | 4.6150          |
| 4.3575        | 25.0  | 160750 | 4.6130          |
| 4.3369        | 26.0  | 167180 | 4.6102          |
| 4.3106        | 27.0  | 173610 | 4.6064          |
| 4.3068        | 28.0  | 180040 | 4.6044          |
| 4.2803        | 29.0  | 186470 | 4.6026          |
| 4.268         | 30.0  | 192900 | 4.6030          |
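Assuming the reported losses are the Trainer's mean causal language-modeling cross-entropy in nats (the default), the corresponding validation perplexity is simply exp(loss); a quick check for the final checkpoint:

```python
import math

final_validation_loss = 4.6030  # last row of the table above
perplexity = math.exp(final_validation_loss)
print(f"Approximate validation perplexity: {perplexity:.1f}")  # ~99.8
```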

### Framework versions

- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2