---
license: mit
library_name: peft
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
base_model: microsoft/phi-2
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: phi-2-gpo-renew2-i0
    results: []
---

# phi-2-gpo-renew2-i0

This model is a fine-tuned version of lole25/phi-2-sft-lora-ultrachat on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

- Loss: 0.0346
- Rewards/chosen: -0.0264
- Rewards/rejected: -0.0854
- Rewards/accuracies: 0.6290
- Rewards/margins: 0.0591
- Logps/rejected: -252.3589
- Logps/chosen: -280.1829
- Logits/rejected: 1.0402
- Logits/chosen: 0.9379
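
Since this is a PEFT (LoRA) adapter on top of `microsoft/phi-2`, it can be loaded for inference with the `peft` library. A minimal sketch, assuming the adapter is published under the hypothetical repository id `BraylonDash/phi-2-gpo-renew2-i0` and using plain-text prompting (this card does not document a chat template):

```python
# Minimal inference sketch. The adapter repo id and the plain-text prompt format
# are assumptions; neither is documented in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/phi-2"
adapter_id = "BraylonDash/phi-2-gpo-renew2-i0"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```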

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
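
The metadata above lists `HuggingFaceH4/ultrafeedback_binarized` as the training dataset. A minimal sketch for inspecting it with the `datasets` library; the `train_prefs` split name follows the public dataset card and is an assumption about what this specific run used:

```python
# Sketch of loading the preference dataset named in the card metadata.
# Split and column names are taken from the public dataset card, not from this run's config.
from datasets import load_dataset

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
print(raw)  # available splits and sizes

example = raw["train_prefs"][0]
print(example["prompt"])    # the user prompt
print(example["chosen"])    # preferred conversation (list of chat messages)
print(example["rejected"])  # dispreferred conversation
```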

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
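
A sketch of how these values map onto a TRL `DPOTrainer` setup compatible with the framework versions listed below. The DPO `beta`, the LoRA configuration, sequence lengths, and the preprocessing that turns `ultrafeedback_binarized` records into prompt/chosen/rejected strings are all assumptions; none of them are recorded in this card:

```python
# Configuration sketch matching the hyperparameters above (single epoch, cosine
# schedule, effective batch size 4 x 4 = 16 per device). beta, the LoRA config,
# and the dataset preprocessing are assumptions not recorded in this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

# The actual run started from lole25/phi-2-sft-lora-ultrachat (an SFT LoRA on phi-2);
# merging that adapter into the base model is omitted from this sketch.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
# DPOTrainer expects string columns "prompt", "chosen", "rejected"; mapping the
# dataset's chat-message lists into plain strings is omitted here.
train_dataset, eval_dataset = raw["train_prefs"], raw["test_prefs"]

training_args = TrainingArguments(
    output_dir="phi-2-gpo-renew2-i0",
    learning_rate=5e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",  # AdamW with betas=(0.9, 0.999), eps=1e-08
)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)  # assumed values

trainer = DPOTrainer(
    model,
    ref_model=None,               # with a PEFT adapter, the frozen base acts as the reference
    args=training_args,
    beta=0.1,                     # assumed; not recorded in this card
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```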

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0659 | 0.03 | 100 | 0.0536 | -0.0002 | -0.0008 | 0.4745 | 0.0005 | -243.8923 | -277.5683 | 1.0635 | 0.9711 |
| 0.0597 | 0.05 | 200 | 0.0518 | 0.0035 | -0.0015 | 0.5880 | 0.0050 | -243.9651 | -277.1979 | 1.0617 | 0.9688 |
| 0.0564 | 0.08 | 300 | 0.0475 | 0.0104 | -0.0081 | 0.6175 | 0.0185 | -244.6272 | -276.5096 | 1.0440 | 0.9499 |
| 0.0402 | 0.1 | 400 | 0.0438 | 0.0017 | -0.0309 | 0.6325 | 0.0326 | -246.9109 | -277.3771 | 0.9932 | 0.8995 |
| 0.0421 | 0.13 | 500 | 0.0411 | -0.0415 | -0.0810 | 0.6195 | 0.0395 | -251.9139 | -281.6956 | 0.9295 | 0.8362 |
| 0.0439 | 0.16 | 600 | 0.0395 | -0.0701 | -0.1168 | 0.6175 | 0.0468 | -255.5005 | -284.5547 | 0.9520 | 0.8607 |
| 0.0363 | 0.18 | 700 | 0.0390 | -0.0362 | -0.0808 | 0.6310 | 0.0446 | -251.8926 | -281.1619 | 0.9895 | 0.8949 |
| 0.0402 | 0.21 | 800 | 0.0382 | -0.0514 | -0.1006 | 0.6220 | 0.0491 | -253.8720 | -282.6901 | 0.9937 | 0.9001 |
| 0.0381 | 0.24 | 900 | 0.0376 | -0.0554 | -0.1099 | 0.6315 | 0.0545 | -254.8047 | -283.0851 | 1.0465 | 0.9534 |
| 0.0421 | 0.26 | 1000 | 0.0374 | -0.0408 | -0.0930 | 0.6270 | 0.0522 | -253.1114 | -281.6268 | 1.0399 | 0.9448 |
| 0.0393 | 0.29 | 1100 | 0.0370 | -0.0576 | -0.1053 | 0.6285 | 0.0478 | -254.3491 | -283.3031 | 1.0557 | 0.9609 |
| 0.0533 | 0.31 | 1200 | 0.0369 | -0.0606 | -0.1154 | 0.6210 | 0.0548 | -255.3544 | -283.6022 | 1.0368 | 0.9417 |
| 0.0392 | 0.34 | 1300 | 0.0367 | -0.0207 | -0.0714 | 0.6120 | 0.0508 | -250.9576 | -279.6129 | 1.0634 | 0.9660 |
| 0.0432 | 0.37 | 1400 | 0.0367 | -0.0146 | -0.0629 | 0.6260 | 0.0483 | -250.1082 | -279.0112 | 1.0463 | 0.9482 |
| 0.0304 | 0.39 | 1500 | 0.0359 | -0.0523 | -0.1062 | 0.6360 | 0.0539 | -254.4339 | -282.7773 | 1.0471 | 0.9496 |
| 0.0436 | 0.42 | 1600 | 0.0359 | -0.0322 | -0.0845 | 0.6340 | 0.0522 | -252.2616 | -280.7699 | 1.0586 | 0.9585 |
| 0.0405 | 0.44 | 1700 | 0.0355 | -0.0531 | -0.1105 | 0.6335 | 0.0575 | -254.8697 | -282.8529 | 1.0312 | 0.9322 |
| 0.0352 | 0.47 | 1800 | 0.0354 | -0.0369 | -0.0956 | 0.6220 | 0.0586 | -253.3721 | -281.2394 | 1.0533 | 0.9539 |
| 0.0392 | 0.5 | 1900 | 0.0355 | -0.0281 | -0.0860 | 0.6210 | 0.0579 | -252.4193 | -280.3594 | 1.0498 | 0.9508 |
| 0.0368 | 0.52 | 2000 | 0.0354 | -0.0231 | -0.0770 | 0.6300 | 0.0539 | -251.5159 | -279.8615 | 1.0563 | 0.9577 |
| 0.0326 | 0.55 | 2100 | 0.0352 | -0.0360 | -0.0915 | 0.6300 | 0.0555 | -252.9630 | -281.1432 | 1.0751 | 0.9760 |
| 0.0368 | 0.58 | 2200 | 0.0352 | -0.0391 | -0.0965 | 0.6345 | 0.0574 | -253.4691 | -281.4595 | 1.0642 | 0.9640 |
| 0.0315 | 0.6 | 2300 | 0.0351 | -0.0252 | -0.0801 | 0.6330 | 0.0549 | -251.8242 | -280.0628 | 1.0685 | 0.9676 |
| 0.0341 | 0.63 | 2400 | 0.0352 | -0.0240 | -0.0803 | 0.6320 | 0.0563 | -251.8426 | -279.9447 | 1.0420 | 0.9405 |
| 0.0488 | 0.65 | 2500 | 0.0350 | -0.0321 | -0.0918 | 0.6340 | 0.0597 | -252.9968 | -280.7594 | 1.0394 | 0.9378 |
| 0.0279 | 0.68 | 2600 | 0.0349 | -0.0383 | -0.0996 | 0.6315 | 0.0613 | -253.7721 | -281.3765 | 1.0361 | 0.9350 |
| 0.0427 | 0.71 | 2700 | 0.0348 | -0.0312 | -0.0911 | 0.6310 | 0.0600 | -252.9290 | -280.6644 | 1.0336 | 0.9319 |
| 0.0331 | 0.73 | 2800 | 0.0349 | -0.0291 | -0.0872 | 0.6290 | 0.0581 | -252.5369 | -280.4611 | 1.0354 | 0.9335 |
| 0.0415 | 0.76 | 2900 | 0.0349 | -0.0298 | -0.0883 | 0.6315 | 0.0585 | -252.6469 | -280.5276 | 1.0248 | 0.9228 |
| 0.0404 | 0.79 | 3000 | 0.0349 | -0.0268 | -0.0859 | 0.6295 | 0.0590 | -252.4009 | -280.2291 | 1.0305 | 0.9277 |
| 0.0362 | 0.81 | 3100 | 0.0348 | -0.0264 | -0.0849 | 0.6305 | 0.0585 | -252.3079 | -280.1861 | 1.0296 | 0.9270 |
| 0.0412 | 0.84 | 3200 | 0.0348 | -0.0274 | -0.0861 | 0.6260 | 0.0587 | -252.4237 | -280.2876 | 1.0338 | 0.9313 |
| 0.0485 | 0.86 | 3300 | 0.0347 | -0.0242 | -0.0824 | 0.6270 | 0.0582 | -252.0546 | -279.9648 | 1.0359 | 0.9336 |
| 0.0376 | 0.89 | 3400 | 0.0346 | -0.0264 | -0.0854 | 0.6310 | 0.0590 | -252.3589 | -280.1902 | 1.0377 | 0.9354 |
| 0.0352 | 0.92 | 3500 | 0.0346 | -0.0266 | -0.0856 | 0.6260 | 0.0590 | -252.3726 | -280.2037 | 1.0418 | 0.9392 |
| 0.0379 | 0.94 | 3600 | 0.0347 | -0.0263 | -0.0852 | 0.6315 | 0.0589 | -252.3377 | -280.1781 | 1.0414 | 0.9390 |
| 0.0361 | 0.97 | 3700 | 0.0346 | -0.0266 | -0.0856 | 0.6310 | 0.0590 | -252.3741 | -280.2047 | 1.0399 | 0.9377 |
| 0.0298 | 0.99 | 3800 | 0.0347 | -0.0263 | -0.0850 | 0.6275 | 0.0587 | -252.3201 | -280.1767 | 1.0412 | 0.9387 |
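
For context on the columns above: in TRL's DPO-style logging, the reward columns are derived from the gap between policy and reference log-probabilities, scaled by `beta`, while the margin and accuracy columns follow from comparing chosen and rejected rewards. A minimal sketch with toy numbers (the `beta` value and all tensor values are assumptions for illustration only):

```python
# Toy illustration of how the reward columns relate to sequence log-probabilities.
# beta and the tensor values below are made up; neither is recorded in this card.
import torch

beta = 0.1  # assumed
policy_chosen_logps = torch.tensor([-120.0, -95.5])
policy_rejected_logps = torch.tensor([-118.0, -97.0])
reference_chosen_logps = torch.tensor([-119.5, -96.0])
reference_rejected_logps = torch.tensor([-117.0, -96.5])

rewards_chosen = beta * (policy_chosen_logps - reference_chosen_logps)        # "Rewards/chosen"
rewards_rejected = beta * (policy_rejected_logps - reference_rejected_logps)  # "Rewards/rejected"
rewards_margins = rewards_chosen - rewards_rejected                           # "Rewards/margins"
rewards_accuracies = (rewards_chosen > rewards_rejected).float().mean()       # "Rewards/accuracies"
print(rewards_chosen.mean(), rewards_rejected.mean(), rewards_margins.mean(), rewards_accuracies)
```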

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2