phi-2-gpo-renew2-i0

This model is a fine-tuned version of lole25/phi-2-sft-lora-ultrachat on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0391
  • Rewards/chosen: -0.0132
  • Rewards/rejected: -0.0427
  • Rewards/accuracies: 0.6330
  • Rewards/margins: 0.0295
  • Logps/rejected: -252.3540
  • Logps/chosen: -280.1870
  • Logits/rejected: 1.0400
  • Logits/chosen: 0.9376
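
Since this repository ships a PEFT (LoRA) adapter, it can presumably be loaded with peft's `AutoPeftModelForCausalLM`. A minimal sketch follows; the repository id is taken from this card, and the tokenizer source (microsoft/phi-2) is an assumption based on the base model's name:

```python
# Minimal loading sketch. The repo id is taken from this card; loading the
# tokenizer from microsoft/phi-2 assumes that is the underlying base checkpoint.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "DUAL-GPO-2/phi-2-gpo-renew2-i0",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # phi-2 shipped custom modeling code in the Transformers 4.36 era
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

inputs = tokenizer("Explain preference optimization in one sentence.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```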

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
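
The card does name HuggingFaceH4/ultrafeedback_binarized as the preference data, which can be inspected as sketched below; the split name `train_prefs` follows that dataset's published layout and is an assumption of this sketch:

```python
# Peek at the preference pairs; "train_prefs" is the preference-training split
# in HuggingFaceH4/ultrafeedback_binarized's published layout (assumed here).
from datasets import load_dataset

prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = prefs[0]
print(example["prompt"])        # the instruction
print(example["chosen"][-1])    # preferred assistant turn (chat-format message list)
print(example["rejected"][-1])  # dispreferred assistant turn
```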

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
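
For orientation, these settings map onto `transformers.TrainingArguments` roughly as sketched below. This is a reconstruction of the reported values, not the authors' actual training script; the total train batch size of 16 is 4 per device × 4 accumulation steps:

```python
# Reconstruction of the reported hyperparameters as transformers.TrainingArguments;
# not the authors' actual configuration, just the listed values in API form.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi-2-gpo-renew2-i0",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,     # 4 x 4 = total train batch size of 16
    seed=42,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```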

Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.0659 | 0.03 | 100 | 0.9711 | 1.0635 | -277.5683 | -243.8923 | 0.0536 | 0.4745 | -0.0002 | 0.0005 | -0.0008 |
| 0.0597 | 0.05 | 200 | 0.9688 | 1.0617 | -277.1979 | -243.9651 | 0.0518 | 0.5880 | 0.0035 | 0.0050 | -0.0015 |
| 0.0564 | 0.08 | 300 | 0.9499 | 1.0440 | -276.5096 | -244.6272 | 0.0475 | 0.6175 | 0.0104 | 0.0185 | -0.0081 |
| 0.0402 | 0.1 | 400 | 0.8995 | 0.9932 | -277.3771 | -246.9109 | 0.0438 | 0.6325 | 0.0017 | 0.0326 | -0.0309 |
| 0.0421 | 0.13 | 500 | 0.8362 | 0.9295 | -281.6956 | -251.9139 | 0.0411 | 0.6195 | -0.0415 | 0.0395 | -0.0810 |
| 0.0439 | 0.16 | 600 | 0.8607 | 0.9520 | -284.5547 | -255.5005 | 0.0395 | 0.6175 | -0.0701 | 0.0468 | -0.1168 |
| 0.0363 | 0.18 | 700 | 0.8949 | 0.9895 | -281.1619 | -251.8926 | 0.0390 | 0.6310 | -0.0362 | 0.0446 | -0.0808 |
| 0.0402 | 0.21 | 800 | 0.9001 | 0.9937 | -282.6901 | -253.8720 | 0.0382 | 0.6220 | -0.0514 | 0.0491 | -0.1006 |
| 0.0381 | 0.24 | 900 | 0.9534 | 1.0465 | -283.0851 | -254.8047 | 0.0376 | 0.6315 | -0.0554 | 0.0545 | -0.1099 |
| 0.0421 | 0.26 | 1000 | 0.9448 | 1.0399 | -281.6268 | -253.1114 | 0.0374 | 0.6270 | -0.0408 | 0.0522 | -0.0930 |
| 0.0393 | 0.29 | 1100 | 0.9609 | 1.0557 | -283.3031 | -254.3491 | 0.0370 | 0.6285 | -0.0576 | 0.0478 | -0.1053 |
| 0.0533 | 0.31 | 1200 | 0.9417 | 1.0368 | -283.6022 | -255.3544 | 0.0369 | 0.6210 | -0.0606 | 0.0548 | -0.1154 |
| 0.0392 | 0.34 | 1300 | 0.9660 | 1.0634 | -279.6129 | -250.9576 | 0.0367 | 0.6120 | -0.0207 | 0.0508 | -0.0714 |
| 0.0432 | 0.37 | 1400 | 0.9482 | 1.0463 | -279.0112 | -250.1082 | 0.0367 | 0.6260 | -0.0146 | 0.0483 | -0.0629 |
| 0.0304 | 0.39 | 1500 | 0.9496 | 1.0471 | -282.7773 | -254.4339 | 0.0359 | 0.6360 | -0.0523 | 0.0539 | -0.1062 |
| 0.0436 | 0.42 | 1600 | 0.9585 | 1.0586 | -280.7699 | -252.2616 | 0.0359 | 0.6340 | -0.0322 | 0.0522 | -0.0845 |
| 0.0405 | 0.44 | 1700 | 0.9322 | 1.0312 | -282.8529 | -254.8697 | 0.0355 | 0.6335 | -0.0531 | 0.0575 | -0.1105 |
| 0.0352 | 0.47 | 1800 | 0.9539 | 1.0533 | -281.2394 | -253.3721 | 0.0354 | 0.6220 | -0.0369 | 0.0586 | -0.0956 |
| 0.0392 | 0.5 | 1900 | 0.9508 | 1.0498 | -280.3594 | -252.4193 | 0.0355 | 0.6210 | -0.0281 | 0.0579 | -0.0860 |
| 0.0368 | 0.52 | 2000 | 0.9577 | 1.0563 | -279.8615 | -251.5159 | 0.0354 | 0.6300 | -0.0231 | 0.0539 | -0.0770 |
| 0.0326 | 0.55 | 2100 | 0.9760 | 1.0751 | -281.1432 | -252.9630 | 0.0352 | 0.6300 | -0.0360 | 0.0555 | -0.0915 |
| 0.0368 | 0.58 | 2200 | 0.9640 | 1.0642 | -281.4595 | -253.4691 | 0.0352 | 0.6345 | -0.0391 | 0.0574 | -0.0965 |
| 0.0315 | 0.6 | 2300 | 0.9676 | 1.0685 | -280.0628 | -251.8242 | 0.0351 | 0.6330 | -0.0252 | 0.0549 | -0.0801 |
| 0.0341 | 0.63 | 2400 | 0.9405 | 1.0420 | -279.9447 | -251.8426 | 0.0352 | 0.6320 | -0.0240 | 0.0563 | -0.0803 |
| 0.0488 | 0.65 | 2500 | 0.9378 | 1.0394 | -280.7594 | -252.9968 | 0.0350 | 0.6340 | -0.0321 | 0.0597 | -0.0918 |
| 0.0279 | 0.68 | 2600 | 0.9350 | 1.0361 | -281.3765 | -253.7721 | 0.0349 | 0.6315 | -0.0383 | 0.0613 | -0.0996 |
| 0.0427 | 0.71 | 2700 | 0.9319 | 1.0336 | -280.6644 | -252.9290 | 0.0348 | 0.6310 | -0.0312 | 0.0600 | -0.0911 |
| 0.0331 | 0.73 | 2800 | 0.9335 | 1.0354 | -280.4611 | -252.5369 | 0.0349 | 0.6290 | -0.0291 | 0.0581 | -0.0872 |
| 0.0415 | 0.76 | 2900 | 0.9228 | 1.0248 | -280.5276 | -252.6469 | 0.0349 | 0.6315 | -0.0298 | 0.0585 | -0.0883 |
| 0.0404 | 0.79 | 3000 | 0.9277 | 1.0305 | -280.2291 | -252.4009 | 0.0349 | 0.6295 | -0.0268 | 0.0590 | -0.0859 |
| 0.0362 | 0.81 | 3100 | 0.9270 | 1.0296 | -280.1861 | -252.3079 | 0.0348 | 0.6305 | -0.0264 | 0.0585 | -0.0849 |
| 0.0412 | 0.84 | 3200 | 0.9313 | 1.0338 | -280.2876 | -252.4237 | 0.0348 | 0.6260 | -0.0274 | 0.0587 | -0.0861 |
| 0.0485 | 0.86 | 3300 | 0.9336 | 1.0359 | -279.9648 | -252.0546 | 0.0347 | 0.6270 | -0.0242 | 0.0582 | -0.0824 |
| 0.0376 | 0.89 | 3400 | 0.9354 | 1.0377 | -280.1902 | -252.3589 | 0.0346 | 0.6310 | -0.0264 | 0.0590 | -0.0854 |
| 0.0352 | 0.92 | 3500 | 0.9392 | 1.0418 | -280.2037 | -252.3726 | 0.0346 | 0.6260 | -0.0266 | 0.0590 | -0.0856 |
| 0.0379 | 0.94 | 3600 | 0.9390 | 1.0414 | -280.1781 | -252.3377 | 0.0347 | 0.6315 | -0.0263 | 0.0589 | -0.0852 |
| 0.0361 | 0.97 | 3700 | 0.9377 | 1.0399 | -280.2047 | -252.3741 | 0.0346 | 0.6310 | -0.0266 | 0.0590 | -0.0856 |
| 0.0298 | 0.99 | 3800 | 0.9387 | 1.0412 | -280.1767 | -252.3201 | 0.0347 | 0.6275 | -0.0263 | 0.0587 | -0.0850 |
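
In DPO-style preference trainers (which this GPO run appears to follow, judging by the logged columns), the reward columns are implicit rewards, i.e. β-scaled log-probability ratios against the reference model, so Rewards/margins should equal Rewards/chosen minus Rewards/rejected. The rows above satisfy this; a quick check against the final row and the headline evaluation numbers:

```python
# Sanity check that margins = chosen - rejected, using two rows from this card.
rows = [
    (-0.0263, -0.0850, 0.0587),  # step 3800 above
    (-0.0132, -0.0427, 0.0295),  # headline evaluation results
]
for chosen, rejected, margin in rows:
    assert abs((chosen - rejected) - margin) < 1e-4
print("margins consistent")
```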

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2