gpt-imdb-ipo_annealing

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 125.6974
  • Rewards/chosen: -0.0343
  • Rewards/rejected: -0.1277
  • Rewards/accuracies: 0.875
  • Rewards/margins: 0.0934
  • Logps/rejected: -267.1282
  • Logps/chosen: -236.1897
  • Logits/rejected: -31.3501
  • Logits/chosen: -31.5916
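
The model is a standard GPT-2 causal language model and can be loaded with transformers. Below is a minimal usage sketch; the full Hub repo id is not stated on this card, so the id in the example is a placeholder.

```python
# Minimal usage sketch; "your-username/gpt-imdb-ipo_annealing" is a
# placeholder repo id: the card does not state the full Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/gpt-imdb-ipo_annealing"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short IMDB-style continuation.
inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```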

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • training_steps: 7197
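
The card does not state the training code, but the logged metric names (Rewards/chosen, Logps/rejected, etc.) match trl's DPOTrainer, and the model name suggests the IPO loss variant. The sketch below shows how the listed hyperparameters might map onto that setup; the beta value and the preference dataset are assumptions, not taken from this card.

```python
# Hedged reconstruction sketch, not the author's actual training script.
# Assumptions: trl's DPOTrainer with loss_type="ipo" (inferred from the
# metric names and the model name), beta=0.1, and a toy preference dataset
# standing in for the unknown training data.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
ref_model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
tokenizer = AutoTokenizer.from_pretrained("lvwerra/gpt2-imdb")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

# Toy stand-in: DPOTrainer expects prompt/chosen/rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["This movie was"],
    "chosen": [" wonderful, with a moving final act."],
    "rejected": [" terrible and a waste of time."],
})

# Hyperparameters exactly as listed above.
args = TrainingArguments(
    output_dir="gpt-imdb-ipo_annealing",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    max_steps=7197,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,          # assumption: not stated on the card
    loss_type="ipo",   # inferred from the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```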

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 16.3187 | 0.21 | 500 | 34.0876 | 0.1161 | -0.1126 | 0.5292 | 0.2287 | -263.8062 | -235.1407 | -33.1877 | -33.4371 |
| 5.5155 | 0.42 | 1000 | 13.0423 | -0.1485 | -0.3812 | 0.5042 | 0.2327 | -264.1273 | -235.4375 | -35.2608 | -35.4541 |
| 10.2532 | 0.63 | 1500 | 18.5157 | -0.4407 | -0.5471 | 0.5458 | 0.1064 | -264.3746 | -235.8205 | -34.2230 | -34.4246 |
| 6.755 | 0.83 | 2000 | 28.1593 | -0.7791 | -0.8052 | 0.5917 | 0.0261 | -264.7961 | -236.3400 | -33.6119 | -33.8069 |
| 9.4126 | 1.04 | 2500 | 9.2406 | -0.8733 | -1.2564 | 0.6229 | 0.3831 | -265.6003 | -236.5962 | -31.9471 | -32.0700 |
| 8.5908 | 1.25 | 3000 | 12.4967 | -0.6700 | -1.0163 | 0.6167 | 0.3462 | -265.4156 | -236.4061 | -31.6914 | -31.8443 |
| 19.5217 | 1.46 | 3500 | 6.8889 | -0.0720 | -0.4689 | 0.6854 | 0.3969 | -264.5895 | -235.4041 | -32.1300 | -32.2692 |
| 6.9195 | 1.67 | 4000 | 4.2435 | -0.5324 | -0.9335 | 0.7021 | 0.4012 | -265.7609 | -236.4489 | -31.8342 | -31.9606 |
| 4.6993 | 1.88 | 4500 | 5.0987 | -0.2002 | -0.6179 | 0.7521 | 0.4177 | -265.3070 | -235.7907 | -31.6301 | -31.7617 |
| 2.7896 | 2.08 | 5000 | 2.7344 | -0.2390 | -0.5589 | 0.7500 | 0.3199 | -265.4754 | -236.0307 | -31.9650 | -32.1009 |
| 3.2262 | 2.29 | 5500 | 3.0584 | -0.1936 | -0.5168 | 0.8083 | 0.3231 | -265.8080 | -236.0606 | -31.6585 | -31.8243 |
| 4.1965 | 2.5 | 6000 | 4.2350 | -0.1555 | -0.4440 | 0.8417 | 0.2884 | -266.2272 | -236.1557 | -31.6484 | -31.8344 |
| 15.1482 | 2.71 | 6500 | 10.8174 | -0.0932 | -0.3244 | 0.8667 | 0.2312 | -266.7491 | -236.1454 | -31.4600 | -31.6800 |
| 145.9251 | 2.92 | 7000 | 125.6974 | -0.0343 | -0.1277 | 0.875 | 0.0934 | -267.1282 | -236.1897 | -31.3501 | -31.5916 |
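
For reading the table: Rewards/margins is the gap between the chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs where the chosen response received the higher reward. A small check using the final row:

```python
# Illustrative only: the relationship between the reward columns,
# using the final evaluation row (step 7000) of the table above.
rewards_chosen = -0.0343
rewards_rejected = -0.1277

margin = rewards_chosen - rewards_rejected
print(f"Rewards/margins = {margin:.4f}")  # 0.0934, matching the table
```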

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0