
gpt2-dpo

This model is a fine-tuned version of mNLP-project/gpt2-finetuned, trained with Direct Preference Optimization (DPO); the preference dataset used is not recorded in the card. It achieves the following results on the evaluation set (the reward quantities are sketched after the list):

  • Loss: 0.6350
  • Rewards/chosen: 1.6222
  • Rewards/rejected: 1.3204
  • Rewards/accuracies: 0.6496
  • Rewards/margins: 0.3018
  • Logps/rejected: -780.0735
  • Logps/chosen: -933.2262
  • Logits/rejected: -34.5449
  • Logits/chosen: -28.7838
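
The metric names above match TRL's DPOTrainer conventions: each reward is the β-scaled log-probability ratio between the trained policy and the reference model, Rewards/margins is their difference, and Rewards/accuracies is the fraction of preference pairs where the chosen reward exceeds the rejected one. A sketch of the assumed definitions under the standard sigmoid DPO objective (the card itself states neither the loss variant nor β):

```latex
% Assumed definitions (standard DPO; not stated in the card).
% y_w = chosen response, y_l = rejected response, beta = DPO temperature.
\[
  r_{\text{chosen}}   = \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}, \qquad
  r_{\text{rejected}} = \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\]
\[
  \mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
  \bigl[ \log \sigma\bigl( r_{\text{chosen}} - r_{\text{rejected}} \bigr) \bigr]
\]
```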

Model description

More information needed

Intended uses & limitations

More information needed
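
No usage guidance is provided, but the checkpoint loads like any GPT-2 causal LM. A minimal generation sketch, assuming the repo id is mNLP-project/gpt2-dpo (inferred from the card title and the base model's namespace, not stated explicitly):

```python
# Minimal sketch: load the DPO-tuned checkpoint and sample a completion.
# The repo id below is an assumption inferred from the card title.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mNLP-project/gpt2-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Question: What does DPO fine-tuning change about a language model?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```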

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching trainer setup follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 10
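
These settings are all standard Hugging Face Trainer arguments, so a DPO run with them might be wired up as below. This is a hedged reconstruction, not the authors' script: the preference dataset, β, and the TRL version are assumptions, since the card does not record them. The listed Adam betas and epsilon are the Trainer defaults and need no explicit configuration.

```python
# Hypothetical reconstruction of the training setup from the listed
# hyperparameters. The dataset and beta are placeholders: neither is
# recorded in the card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer  # DPOConfig exists in recent TRL releases

model = AutoModelForCausalLM.from_pretrained("mNLP-project/gpt2-finetuned")
tokenizer = AutoTokenizer.from_pretrained("mNLP-project/gpt2-finetuned")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Placeholder preference data in the standard prompt/chosen/rejected format.
train_dataset = Dataset.from_dict({
    "prompt":   ["Question: What is 2 + 2?\nAnswer:"],
    "chosen":   [" 4."],
    "rejected": [" 22."],
})

config = DPOConfig(
    output_dir="gpt2-dpo",
    learning_rate=1e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    seed=42,
    beta=0.1,                        # TRL's default; the actual value is not recorded
)

trainer = DPOTrainer(
    model=model,                     # a frozen reference copy is created automatically
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # named `tokenizer` in older TRL releases
)
trainer.train()
```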

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6286        | 0.9993 | 668  | 0.6350          | 1.6222         | 1.3204           | 0.6496             | 0.3018          | -780.0735      | -933.2262    | -34.5449        | -28.7838      |
| 0.6387        | 2.0    | 1337 | 0.6662          | 1.8546         | 1.5416           | 0.6302             | 0.3130          | -777.8622      | -930.9024    | -34.5110        | -28.7424      |
| 0.5643        | 2.9993 | 2005 | 0.6635          | 2.0534         | 1.6918           | 0.6396             | 0.3616          | -776.3599      | -928.9147    | -34.5066        | -28.7168      |
| 0.4487        | 4.0    | 2674 | 0.6677          | 2.2748         | 1.8809           | 0.6451             | 0.3940          | -774.4694      | -926.7002    | -34.1409        | -28.2530      |
| 0.3831        | 4.9993 | 3342 | 0.6783          | 2.4765         | 2.0527           | 0.6418             | 0.4238          | -772.7513      | -924.6838    | -34.0051        | -28.0668      |
| 0.352         | 6.0    | 4011 | 0.6782          | 2.4441         | 2.0097           | 0.6440             | 0.4344          | -773.1808      | -925.0074    | -34.0868        | -28.1418      |
| 0.3189        | 6.9993 | 4679 | 0.6840          | 2.2310         | 1.8303           | 0.6343             | 0.4008          | -774.9752      | -927.1384    | -33.9525        | -27.9466      |
| 0.3006        | 8.0    | 5348 | 0.6882          | 2.4339         | 1.9918           | 0.6388             | 0.4422          | -773.3604      | -925.1093    | -33.7716        | -27.7551      |
| 0.3152        | 8.9993 | 6016 | 0.6891          | 2.4920         | 2.0457           | 0.6407             | 0.4462          | -772.8206      | -924.5289    | -33.6753        | -27.6463      |
| 0.2752        | 9.9925 | 6680 | 0.6892          | 2.4562         | 2.0151           | 0.6410             | 0.4411          | -773.1274      | -924.8871    | -33.6818        | -27.6538      |

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.19.1
  • Tokenizers 0.19.1