
otu_gpt

This model is a fine-tuned version of gpt2 (about 109M parameters, stored as F32 safetensors weights) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.6030
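
If this loss is the standard per-token cross-entropy in nats (as the Trainer reports for causal language models), it corresponds to a perplexity of exp(4.6030) ≈ 99.8.

A minimal usage sketch follows. The Hub repo id is a placeholder, since the checkpoint's exact path is not stated in this card.

```python
# Minimal usage sketch. "otu_gpt" is a placeholder repo id; substitute the
# actual Hub path of this checkpoint (e.g. "<user>/otu_gpt").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "otu_gpt"  # placeholder, not confirmed by the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```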

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 256
  • eval_batch_size: 128
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 2048
  • total_eval_batch_size: 1024
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 30
  • mixed_precision_training: Native AMP
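
The sketch below mirrors the list above as Transformers TrainingArguments. Only the numbers come from this card; the output directory, evaluation strategy, and launch command are assumptions.

```python
# A sketch of TrainingArguments matching the reported hyperparameters. With
# 8 GPUs, the per-device sizes below reproduce the reported totals
# (256 x 8 = 2048 train, 128 x 8 = 1024 eval). Launch with, e.g.:
#   torchrun --nproc_per_node=8 train.py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="otu_gpt",            # assumed
    learning_rate=2e-4,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=128,
    seed=42,
    num_train_epochs=30,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # "Native AMP" mixed precision
    evaluation_strategy="epoch",     # assumed from the per-epoch eval losses
)
```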

Training results

Training Loss   Epoch   Step     Validation Loss
5.413           1.0     6430     5.3753
5.1384          2.0     12860    5.1224
4.996           3.0     19290    4.9950
4.9047          4.0     25720    4.9112
4.8292          5.0     32150    4.8572
4.7709          6.0     38580    4.8168
4.7345          7.0     45010    4.7872
4.6996          8.0     51440    4.7637
4.6509          9.0     57870    4.7396
4.6326          10.0    64300    4.7248
4.6049          11.0    70730    4.7104
4.5894          12.0    77160    4.6994
4.5574          13.0    83590    4.6868
4.5415          14.0    90020    4.6758
4.5283          15.0    96450    4.6676
4.4993          16.0    102880   4.6605
4.486           17.0    109310   4.6532
4.4675          18.0    115740   4.6467
4.4588          19.0    122170   4.6410
4.4402          20.0    128600   4.6347
4.4182          21.0    135030   4.6292
4.4031          22.0    141460   4.6262
4.3857          23.0    147890   4.6200
4.3726          24.0    154320   4.6150
4.3575          25.0    160750   4.6130
4.3369          26.0    167180   4.6102
4.3106          27.0    173610   4.6064
4.3068          28.0    180040   4.6044
4.2803          29.0    186470   4.6026
4.268           30.0    192900   4.6030

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
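
To reproduce this environment, the Python packages can be pinned to the versions above, e.g. `pip install transformers==4.39.3 datasets==2.18.0 tokenizers==0.15.2` together with a CUDA 12.1 build of PyTorch (`torch==2.2.2+cu121` from the PyTorch wheel index); the exact install procedure is an assumption, not stated in the card.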