
gpt-imdb-ipo-beta_0.3

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset; judging by the model name, it was trained with the IPO (identity preference optimization) objective at β = 0.3. It achieves the following results on the evaluation set:

  • Loss: 1.8601
  • Rewards/chosen: -0.2473
  • Rewards/rejected: -0.6141
  • Rewards/accuracies: 0.8271
  • Rewards/margins: 0.3668
  • Logps/rejected: -265.7321
  • Logps/chosen: -236.0896
  • Logits/rejected: -31.6527
  • Logits/chosen: -31.7977
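
As a minimal usage sketch: the checkpoint is a standard GPT-2 causal LM, so it loads with the usual Transformers API. The repo id below is a placeholder assumption; substitute the actual Hub id of this model.

```python
# Minimal loading/generation sketch. The repo id is a placeholder
# assumption, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "gpt-imdb-ipo-beta_0.3"  # hypothetical; replace with the real Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```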

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • training_steps: 7197
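
A hedged reconstruction of how these hyperparameters might map onto code: the listed values fit `transformers.TrainingArguments`, and the model name suggests trl's `DPOTrainer` with `loss_type="ipo"` and `beta=0.3`. Only the hyperparameter values come from this card; the trl version (~0.7.x, contemporary with Transformers 4.35.2, where `beta` and `loss_type` are `DPOTrainer` arguments rather than `DPOConfig` fields), the reference model, and the dataset are assumptions.

```python
# Sketch of the training setup under the assumptions above,
# NOT the authors' actual script. The dataset is unknown, so a
# tiny dummy preference dataset stands in as a placeholder.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
ref_model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
tokenizer = AutoTokenizer.from_pretrained("lvwerra/gpt2-imdb")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder preference data; the real training data is not documented.
preference_dataset = Dataset.from_dict({
    "prompt": ["The movie was"],
    "chosen": [" wonderful and moving."],
    "rejected": [" terrible and dull."],
})

# Values below are exactly the ones listed in this card.
args = TrainingArguments(
    output_dir="gpt-imdb-ipo-beta_0.3",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    max_steps=7197,
    remove_unused_columns=False,  # DPOTrainer handles its own columns
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.3,         # inferred from the model name
    loss_type="ipo",  # inferred from the model name
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```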

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5.822 | 0.21 | 500 | 19.5830 | -0.0268 | -0.3320 | 0.6708 | 0.3052 | -264.7920 | -235.3544 | -33.5002 | -33.8198 |
| 6.8677 | 0.42 | 1000 | 18.7557 | -0.0552 | -0.3293 | 0.5917 | 0.2741 | -264.7829 | -235.4492 | -35.5852 | -35.8178 |
| 12.3698 | 0.63 | 1500 | 36.0453 | -0.1426 | -0.5467 | 0.6771 | 0.4041 | -265.5075 | -235.7406 | -34.3816 | -34.5936 |
| 7.8347 | 0.83 | 2000 | 38.2624 | -0.0799 | -0.3485 | 0.6500 | 0.2687 | -264.8470 | -235.5314 | -33.2874 | -33.4310 |
| 9.184 | 1.04 | 2500 | 14.9546 | -0.3389 | -0.7127 | 0.6875 | 0.3739 | -266.0610 | -236.3948 | -32.7912 | -32.9463 |
| 11.1603 | 1.25 | 3000 | 15.5236 | -0.0513 | -0.3736 | 0.7000 | 0.3223 | -264.9306 | -235.4362 | -33.3399 | -33.5624 |
| 16.5516 | 1.46 | 3500 | 8.6118 | -0.1177 | -0.5526 | 0.7438 | 0.4349 | -265.5274 | -235.6576 | -31.9816 | -32.1630 |
| 5.2761 | 1.67 | 4000 | 5.2168 | -0.1495 | -0.5364 | 0.7417 | 0.3869 | -265.4733 | -235.7637 | -32.2719 | -32.3991 |
| 2.9326 | 1.88 | 4500 | 4.2332 | -0.2284 | -0.6043 | 0.7646 | 0.3759 | -265.6996 | -236.0266 | -32.0240 | -32.1547 |
| 2.9814 | 2.08 | 5000 | 3.3498 | -0.2188 | -0.6063 | 0.7792 | 0.3874 | -265.7062 | -235.9947 | -31.8376 | -31.9728 |
| 1.8651 | 2.29 | 5500 | 2.8900 | -0.2624 | -0.6313 | 0.7896 | 0.3688 | -265.7895 | -236.1400 | -31.4502 | -31.5973 |
| 4.5849 | 2.5 | 6000 | 2.2055 | -0.2771 | -0.6338 | 0.7833 | 0.3567 | -265.7979 | -236.1888 | -31.5011 | -31.6468 |
| 1.7322 | 2.71 | 6500 | 1.9194 | -0.2534 | -0.6145 | 0.8208 | 0.3611 | -265.7336 | -236.1099 | -31.6632 | -31.8054 |
| 1.1697 | 2.92 | 7000 | 1.8601 | -0.2473 | -0.6141 | 0.8271 | 0.3668 | -265.7321 | -236.0896 | -31.6527 | -31.7977 |
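
For reading the reward columns: in trl's DPO/IPO trainer, the logged rewards are the β-scaled log-probability ratios between the policy and the reference model, the margin is chosen minus rejected, and the accuracy is the fraction of pairs where the chosen reward exceeds the rejected one. A sketch of those definitions, assuming this checkpoint follows trl's standard metric computation:

```python
import torch

beta = 0.3  # inferred from the model name

def dpo_reward_stats(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps):
    """Recompute the Rewards/* columns from per-sequence log-probs.

    Each argument is a 1-D tensor of summed token log-probs, one entry
    per preference pair; mirrors how trl derives its logged metrics.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return (chosen_rewards.mean(), rejected_rewards.mean(),
            margins.mean(), accuracy)
```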

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0