phi-2-gpo-renew2-b0.001-log-i0

This model is a fine-tuned version of lole25/phi-2-sft-lora-ultrachat on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0367
  • Rewards/chosen: -0.0859
  • Rewards/rejected: -0.1297
  • Rewards/accuracies: 0.6335
  • Rewards/margins: 0.0439
  • Logps/rejected: -373.5459
  • Logps/chosen: -363.4243
  • Logits/rejected: 0.0915
  • Logits/chosen: 0.0487
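
The reported margin can be cross-checked against the two reward means. The sketch below assumes the DPO-style logging convention, in which the margin is the chosen reward minus the rejected reward and accuracy is the share of pairs whose chosen reward is higher. The card does not state these definitions, and `preference_metrics` and its toy inputs are hypothetical, for illustration only.

```python
def preference_metrics(chosen_rewards, rejected_rewards):
    """Return (mean margin, accuracy) for paired per-example rewards.

    Assumed convention: margin = chosen - rejected, and a pair counts as
    correct when the chosen reward is strictly higher.
    """
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    mean_margin = sum(margins) / len(margins)
    accuracy = sum(m > 0 for m in margins) / len(margins)
    return mean_margin, accuracy


# Values copied from the evaluation summary above.
rewards_chosen = -0.0859
rewards_rejected = -0.1297
reported_margin = 0.0439

# The difference of the two means should match the reported mean margin,
# up to the rounding of the four-decimal values in the card.
recomputed_margin = rewards_chosen - rewards_rejected
assert abs(recomputed_margin - reported_margin) < 2e-4
print(f"recomputed margin: {recomputed_margin:.4f}")
```

The recomputed value differs from the reported 0.0439 only in the last digit, which is consistent with rounding. Accuracy cannot be recovered from the means alone; it needs the per-pair rewards, which is why the helper takes lists.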

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
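
To make the scheduler settings concrete, here is a minimal sketch of a cosine schedule with linear warmup using the learning rate and warmup ratio listed above. It is not the trainer's own code. The function name `cosine_lr_with_warmup` is hypothetical, and `TOTAL_STEPS = 3800` is an assumption taken from the last logged step in the results table, since the card does not give the exact step count.

```python
import math

BASE_LR = 5e-6        # learning_rate from the card
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 3800    # assumption: roughly one epoch, per the results table


def cosine_lr_with_warmup(step, total_steps=TOTAL_STEPS,
                          base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step.

    Linear ramp from 0 to base_lr over the warmup steps, then a
    half-cosine decay from base_lr down to 0 at total_steps.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the rate at a few points along the run.
for s in (0, 190, 380, 2090, 3800):
    print(s, cosine_lr_with_warmup(s))
```

Under these assumptions the rate peaks at 5e-06 once warmup ends at step 380 and falls to half that value midway through the decay phase.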

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.066 | 0.03 | 100 | 0.0537 | -0.0000 | -0.0001 | 0.4725 | 0.0000 | -243.8812 | -277.5782 | 1.0637 | 0.9712 |
| 0.0611 | 0.05 | 200 | 0.0535 | 0.0003 | -0.0002 | 0.5780 | 0.0005 | -243.9921 | -277.2496 | 1.0643 | 0.9716 |
| 0.0609 | 0.08 | 300 | 0.0529 | 0.0015 | -0.0005 | 0.6165 | 0.0020 | -244.3336 | -276.0178 | 1.0636 | 0.9689 |
| 0.0513 | 0.1 | 400 | 0.0511 | -0.0031 | -0.0095 | 0.6150 | 0.0064 | -253.2858 | -280.6138 | 0.9583 | 0.8601 |
| 0.0501 | 0.13 | 500 | 0.0475 | -0.0293 | -0.0455 | 0.6050 | 0.0162 | -289.3190 | -306.8101 | 0.5770 | 0.4970 |
| 0.0508 | 0.16 | 600 | 0.0449 | -0.0439 | -0.0691 | 0.6055 | 0.0252 | -312.9566 | -321.4783 | 0.3282 | 0.2749 |
| 0.0421 | 0.18 | 700 | 0.0437 | -0.0501 | -0.0791 | 0.6055 | 0.0290 | -322.8759 | -327.6276 | 0.3240 | 0.2708 |
| 0.0437 | 0.21 | 800 | 0.0428 | -0.0468 | -0.0742 | 0.6005 | 0.0274 | -318.0196 | -324.3805 | 0.3805 | 0.3236 |
| 0.0387 | 0.24 | 900 | 0.0423 | -0.0603 | -0.0976 | 0.6055 | 0.0373 | -341.3827 | -337.8515 | 0.2503 | 0.1997 |
| 0.0469 | 0.26 | 1000 | 0.0410 | -0.0415 | -0.0745 | 0.6120 | 0.0330 | -318.2856 | -319.0327 | 0.3303 | 0.2683 |
| 0.0405 | 0.29 | 1100 | 0.0413 | -0.0604 | -0.0953 | 0.6065 | 0.0350 | -339.1555 | -337.9239 | 0.3569 | 0.3022 |
| 0.0532 | 0.31 | 1200 | 0.0414 | -0.0616 | -0.1042 | 0.6150 | 0.0426 | -347.9869 | -339.1231 | 0.1742 | 0.1261 |
| 0.0421 | 0.34 | 1300 | 0.0401 | -0.0362 | -0.0677 | 0.6240 | 0.0316 | -311.5635 | -313.6982 | 0.3279 | 0.2688 |
| 0.0454 | 0.37 | 1400 | 0.0401 | -0.0665 | -0.1024 | 0.6130 | 0.0359 | -346.2302 | -344.0237 | 0.2565 | 0.2034 |
| 0.03 | 0.39 | 1500 | 0.0394 | -0.0809 | -0.1233 | 0.6185 | 0.0424 | -367.0958 | -358.4021 | 0.2512 | 0.1958 |
| 0.0455 | 0.42 | 1600 | 0.0390 | -0.0528 | -0.0864 | 0.6220 | 0.0336 | -330.2539 | -330.3630 | 0.3432 | 0.2802 |
| 0.0444 | 0.44 | 1700 | 0.0383 | -0.0576 | -0.0957 | 0.6215 | 0.0381 | -339.5015 | -335.1629 | 0.1956 | 0.1433 |
| 0.0411 | 0.47 | 1800 | 0.0391 | -0.0864 | -0.1297 | 0.6165 | 0.0433 | -373.5191 | -363.9651 | 0.1143 | 0.0721 |
| 0.0486 | 0.5 | 1900 | 0.0382 | -0.0792 | -0.1204 | 0.6260 | 0.0412 | -364.1853 | -356.7109 | 0.1764 | 0.1298 |
| 0.0378 | 0.52 | 2000 | 0.0378 | -0.0642 | -0.1013 | 0.6290 | 0.0371 | -345.1359 | -341.7246 | 0.1294 | 0.0808 |
| 0.0316 | 0.55 | 2100 | 0.0375 | -0.0770 | -0.1185 | 0.6275 | 0.0414 | -362.2671 | -354.5952 | 0.0687 | 0.0245 |
| 0.0375 | 0.58 | 2200 | 0.0376 | -0.0825 | -0.1250 | 0.6280 | 0.0425 | -368.8188 | -360.0626 | 0.0391 | 0.0007 |
| 0.0344 | 0.6 | 2300 | 0.0376 | -0.0705 | -0.1082 | 0.6315 | 0.0377 | -351.9891 | -348.0063 | 0.1002 | 0.0554 |
| 0.0393 | 0.63 | 2400 | 0.0374 | -0.0839 | -0.1244 | 0.6330 | 0.0404 | -368.2057 | -361.4958 | 0.0124 | -0.0271 |
| 0.0501 | 0.65 | 2500 | 0.0373 | -0.0970 | -0.1420 | 0.6265 | 0.0450 | -385.8456 | -374.5688 | 0.0053 | -0.0307 |
| 0.03 | 0.68 | 2600 | 0.0372 | -0.0948 | -0.1408 | 0.6280 | 0.0460 | -384.5748 | -372.3464 | 0.0325 | -0.0064 |
| 0.0445 | 0.71 | 2700 | 0.0372 | -0.0927 | -0.1378 | 0.6255 | 0.0450 | -381.6031 | -370.2887 | 0.0394 | -0.0008 |
| 0.0359 | 0.73 | 2800 | 0.0369 | -0.0822 | -0.1244 | 0.6375 | 0.0422 | -368.1677 | -359.7133 | 0.0926 | 0.0476 |
| 0.0454 | 0.76 | 2900 | 0.0368 | -0.0861 | -0.1308 | 0.6340 | 0.0447 | -374.6195 | -363.6591 | 0.0788 | 0.0362 |
| 0.0422 | 0.79 | 3000 | 0.0368 | -0.0872 | -0.1317 | 0.6350 | 0.0445 | -375.5086 | -364.7430 | 0.0778 | 0.0354 |
| 0.0401 | 0.81 | 3100 | 0.0368 | -0.0844 | -0.1284 | 0.6350 | 0.0440 | -372.1985 | -361.9238 | 0.0778 | 0.0345 |
| 0.0455 | 0.84 | 3200 | 0.0368 | -0.0842 | -0.1275 | 0.6335 | 0.0434 | -371.3240 | -361.7043 | 0.0871 | 0.0436 |
| 0.0537 | 0.86 | 3300 | 0.0368 | -0.0820 | -0.1248 | 0.6350 | 0.0428 | -368.5755 | -359.5146 | 0.0936 | 0.0492 |
| 0.0415 | 0.89 | 3400 | 0.0367 | -0.0845 | -0.1281 | 0.6365 | 0.0436 | -371.9387 | -362.0815 | 0.0925 | 0.0492 |
| 0.0399 | 0.92 | 3500 | 0.0367 | -0.0853 | -0.1290 | 0.6325 | 0.0437 | -372.8227 | -362.8265 | 0.0937 | 0.0507 |
| 0.0386 | 0.94 | 3600 | 0.0367 | -0.0855 | -0.1294 | 0.6330 | 0.0438 | -373.1803 | -363.0746 | 0.0909 | 0.0479 |
| 0.0372 | 0.97 | 3700 | 0.0367 | -0.0859 | -0.1297 | 0.6375 | 0.0438 | -373.5262 | -363.4134 | 0.0910 | 0.0480 |
| 0.033 | 0.99 | 3800 | 0.0367 | -0.0858 | -0.1297 | 0.6325 | 0.0439 | -373.5426 | -363.3738 | 0.0911 | 0.0481 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2