
phi-2-gpo-renew2-b0.01-log-i0

This model is a fine-tuned version of lole25/phi-2-sft-lora-ultrachat on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6909
  • Rewards/chosen: -0.0288
  • Rewards/rejected: -0.0865
  • Rewards/accuracies: 0.6270
  • Rewards/margins: 0.0577
  • Logps/rejected: -252.4614
  • Logps/chosen: -280.4224
  • Logits/rejected: 1.0251
  • Logits/chosen: 0.9229
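
Since this is a PEFT (LoRA) adapter, it can be loaded directly with the peft library. The sketch below is a minimal usage example, assuming the Hub id DUAL-GPO/phi-2-gpo-renew2-b0.01-log-i0 for this adapter and that the repo's adapter_config.json points at its base model; adjust dtype and device placement for your hardware.

```python
# Minimal inference sketch for this adapter (assumed repo id; not an official snippet from the authors).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "DUAL-GPO/phi-2-gpo-renew2-b0.01-log-i0"  # assumed Hub id for this adapter

# Loads the base model named in adapter_config.json and applies the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# If the adapter repo does not ship tokenizer files, load the tokenizer from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "Explain the difference between a list and a tuple in Python."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```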

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
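
As a reading aid, the values above map onto transformers.TrainingArguments roughly as sketched below. This is an assumption-laden reconstruction (the actual training script and trainer are not part of this card), with output_dir chosen purely for illustration.

```python
# Sketch only: the listed hyperparameters expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi-2-gpo-renew2-b0.01-log-i0",  # illustrative output path, not from the card
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,   # 4 per device x 4 accumulation steps = 16 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the defaults for
    # adam_beta1, adam_beta2, and adam_epsilon, so nothing extra is needed here.
)
```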

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.03 | 100 | 0.6931 | -0.0003 | -0.0006 | 0.4515 | 0.0003 | -243.8745 | -277.5758 | 1.0631 | 0.9710 |
| 0.693 | 0.05 | 200 | 0.6929 | 0.0028 | -0.0017 | 0.5885 | 0.0046 | -243.9904 | -277.2661 | 1.0632 | 0.9705 |
| 0.6926 | 0.08 | 300 | 0.6925 | 0.0100 | -0.0055 | 0.6260 | 0.0155 | -244.3642 | -276.5485 | 1.0488 | 0.9545 |
| 0.6916 | 0.1 | 400 | 0.6920 | 0.0057 | -0.0240 | 0.6340 | 0.0297 | -246.2157 | -276.9778 | 0.9930 | 0.8978 |
| 0.6913 | 0.13 | 500 | 0.6917 | -0.0320 | -0.0687 | 0.6310 | 0.0366 | -250.6851 | -280.7516 | 0.9188 | 0.8239 |
| 0.6916 | 0.16 | 600 | 0.6915 | -0.0605 | -0.1045 | 0.6215 | 0.0440 | -254.2614 | -283.5969 | 0.9507 | 0.8586 |
| 0.6911 | 0.18 | 700 | 0.6914 | -0.0360 | -0.0798 | 0.6260 | 0.0438 | -251.7944 | -281.1486 | 0.9765 | 0.8818 |
| 0.6915 | 0.21 | 800 | 0.6913 | -0.0433 | -0.0906 | 0.6240 | 0.0473 | -252.8779 | -281.8777 | 0.9965 | 0.9022 |
| 0.691 | 0.24 | 900 | 0.6912 | -0.0529 | -0.1055 | 0.6245 | 0.0526 | -254.3653 | -282.8321 | 1.0206 | 0.9266 |
| 0.6913 | 0.26 | 1000 | 0.6912 | -0.0397 | -0.0905 | 0.6290 | 0.0507 | -252.8640 | -281.5216 | 1.0170 | 0.9216 |
| 0.6912 | 0.29 | 1100 | 0.6912 | -0.0550 | -0.1016 | 0.6250 | 0.0466 | -253.9782 | -283.0510 | 1.0190 | 0.9244 |
| 0.6902 | 0.31 | 1200 | 0.6912 | -0.0570 | -0.1101 | 0.6230 | 0.0531 | -254.8289 | -283.2487 | 1.0101 | 0.9164 |
| 0.6912 | 0.34 | 1300 | 0.6911 | -0.0234 | -0.0732 | 0.6130 | 0.0498 | -251.1342 | -279.8864 | 1.0357 | 0.9401 |
| 0.6914 | 0.37 | 1400 | 0.6911 | -0.0157 | -0.0634 | 0.6295 | 0.0477 | -250.1540 | -279.1180 | 1.0311 | 0.9342 |
| 0.6919 | 0.39 | 1500 | 0.6910 | -0.0502 | -0.1023 | 0.6320 | 0.0521 | -254.0441 | -282.5649 | 1.0137 | 0.9161 |
| 0.6912 | 0.42 | 1600 | 0.6910 | -0.0349 | -0.0862 | 0.6320 | 0.0513 | -252.4398 | -281.0401 | 1.0315 | 0.9320 |
| 0.6905 | 0.44 | 1700 | 0.6910 | -0.0530 | -0.1089 | 0.6325 | 0.0559 | -254.7030 | -282.8433 | 1.0088 | 0.9100 |
| 0.6901 | 0.47 | 1800 | 0.6910 | -0.0409 | -0.0984 | 0.6225 | 0.0575 | -253.6523 | -281.6338 | 1.0314 | 0.9324 |
| 0.6902 | 0.5 | 1900 | 0.6910 | -0.0326 | -0.0895 | 0.6215 | 0.0569 | -252.7657 | -280.8078 | 1.0212 | 0.9226 |
| 0.6919 | 0.52 | 2000 | 0.6910 | -0.0239 | -0.0768 | 0.6275 | 0.0529 | -251.4911 | -279.9320 | 1.0252 | 0.9259 |
| 0.6919 | 0.55 | 2100 | 0.6909 | -0.0381 | -0.0926 | 0.6345 | 0.0545 | -253.0794 | -281.3606 | 1.0476 | 0.9477 |
| 0.6917 | 0.58 | 2200 | 0.6909 | -0.0421 | -0.0985 | 0.6325 | 0.0564 | -253.6693 | -281.7611 | 1.0407 | 0.9399 |
| 0.6909 | 0.6 | 2300 | 0.6909 | -0.0318 | -0.0861 | 0.6335 | 0.0543 | -252.4272 | -280.7285 | 1.0408 | 0.9399 |
| 0.6903 | 0.63 | 2400 | 0.6909 | -0.0296 | -0.0850 | 0.6360 | 0.0553 | -252.3121 | -280.5100 | 1.0219 | 0.9198 |
| 0.6908 | 0.65 | 2500 | 0.6909 | -0.0373 | -0.0959 | 0.6330 | 0.0586 | -253.4011 | -281.2754 | 1.0213 | 0.9196 |
| 0.6907 | 0.68 | 2600 | 0.6909 | -0.0424 | -0.1023 | 0.6295 | 0.0599 | -254.0473 | -281.7884 | 1.0173 | 0.9161 |
| 0.6905 | 0.71 | 2700 | 0.6909 | -0.0353 | -0.0938 | 0.6310 | 0.0585 | -253.1964 | -281.0736 | 1.0139 | 0.9119 |
| 0.692 | 0.73 | 2800 | 0.6909 | -0.0327 | -0.0894 | 0.6305 | 0.0567 | -252.7526 | -280.8156 | 1.0163 | 0.9141 |
| 0.6906 | 0.76 | 2900 | 0.6909 | -0.0334 | -0.0904 | 0.6295 | 0.0570 | -252.8527 | -280.8846 | 1.0123 | 0.9098 |
| 0.6904 | 0.79 | 3000 | 0.6909 | -0.0312 | -0.0890 | 0.6295 | 0.0579 | -252.7167 | -280.6625 | 1.0147 | 0.9123 |
| 0.6905 | 0.81 | 3100 | 0.6909 | -0.0301 | -0.0877 | 0.6330 | 0.0576 | -252.5846 | -280.5529 | 1.0175 | 0.9147 |
| 0.6919 | 0.84 | 3200 | 0.6909 | -0.0301 | -0.0878 | 0.6305 | 0.0577 | -252.6000 | -280.5576 | 1.0176 | 0.9154 |
| 0.69 | 0.86 | 3300 | 0.6909 | -0.0266 | -0.0839 | 0.6285 | 0.0573 | -252.2050 | -280.2096 | 1.0212 | 0.9186 |
| 0.689 | 0.89 | 3400 | 0.6909 | -0.0289 | -0.0867 | 0.6280 | 0.0578 | -252.4849 | -280.4384 | 1.0223 | 0.9202 |
| 0.6901 | 0.92 | 3500 | 0.6909 | -0.0290 | -0.0869 | 0.6260 | 0.0579 | -252.5046 | -280.4475 | 1.0239 | 0.9216 |
| 0.6914 | 0.94 | 3600 | 0.6909 | -0.0288 | -0.0865 | 0.6290 | 0.0577 | -252.4631 | -280.4258 | 1.0244 | 0.9221 |
| 0.6914 | 0.97 | 3700 | 0.6909 | -0.0289 | -0.0864 | 0.6320 | 0.0576 | -252.4591 | -280.4350 | 1.0240 | 0.9216 |
| 0.6917 | 0.99 | 3800 | 0.6909 | -0.0287 | -0.0866 | 0.6320 | 0.0579 | -252.4790 | -280.4204 | 1.0246 | 0.9221 |
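
The reward columns follow the naming convention used by DPO-style preference trainers. Assuming that convention (with β = 0.01 read from the model name), the reported metrics correspond to the implicit reward and log-sigmoid loss sketched below; this is an interpretation, not a statement from the authors.

```latex
% Assumed DPO-style metric definitions (beta = 0.01 read from the model name).
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]
\text{rewards/chosen}     = \mathbb{E}\left[ r_\theta(x, y_w) \right], \qquad
\text{rewards/rejected}   = \mathbb{E}\left[ r_\theta(x, y_l) \right]
\text{rewards/margins}    = \mathbb{E}\left[ r_\theta(x, y_w) - r_\theta(x, y_l) \right]
\text{rewards/accuracies} = \Pr\left[ r_\theta(x, y_w) > r_\theta(x, y_l) \right]
\mathcal{L} = -\,\mathbb{E}\left[ \log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right) \right]
```

Under this reading, the margin is zero at the start of training and the loss starts at -log σ(0) = log 2 ≈ 0.6931, which matches the first validation-loss entry in the table.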

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2