
phi_1.5_dpo_v3

This model is a DPO fine-tuned version of microsoft/phi-1_5; the training dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 0.0000
  • Rewards/chosen: -0.6229
  • Rewards/rejected: -20.0554
  • Rewards/accuracies: 1.0
  • Rewards/margins: 19.4326
  • Logps/rejected: -599.3906
  • Logps/chosen: -97.1858
  • Logits/rejected: 4.0995
  • Logits/chosen: 5.4831
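These metrics are internally consistent with the standard DPO objective, under which Rewards/margins is Rewards/chosen minus Rewards/rejected and the pairwise loss is -log sigmoid(margin). A quick sanity check in Python (using only the three reported final numbers; the β scaling and reference-model log-probabilities are internal to the trainer and not assumed here):

```python
import math

# Reported final evaluation statistics from the list above.
rewards_chosen = -0.6229
rewards_rejected = -20.0554
reported_margin = 19.4326

# Margin is chosen reward minus rejected reward (agrees up to rounding).
margin = rewards_chosen - rewards_rejected

# Standard DPO pairwise loss: -log(sigmoid(margin)). With a margin near
# 19.4 this is on the order of 4e-9, which displays as 0.0000.
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
```

So the reported zero loss is not an error: a margin this large saturates the sigmoid, which also explains why Rewards/accuracies is pinned at 1.0 from the first evaluation onward.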

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 500
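With only 2 warmup steps out of 500, the learning rate effectively decays linearly from 2e-4 to 0 over the run. A sketch of the implied schedule, assuming the usual `get_linear_schedule_with_warmup` behavior from Transformers (the card does not state which helper was used):

```python
# Hyperparameters from the list above.
BASE_LR = 2e-4
WARMUP_STEPS = 2
TRAINING_STEPS = 500

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup to BASE_LR,
    then linear decay to zero at TRAINING_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    return BASE_LR * max(
        0.0, (TRAINING_STEPS - step) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    )
```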

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4948 | 0.27 | 10 | 0.1734 | -0.0170 | -1.7389 | 1.0 | 1.7219 | -416.2255 | -91.1275 | 4.9609 | 5.8425 |
| 0.0677 | 0.54 | 20 | 0.0006 | -0.1878 | -8.3012 | 1.0 | 8.1134 | -481.8484 | -92.8350 | 4.5966 | 5.7680 |
| 0.0002 | 0.81 | 30 | 0.0000 | -0.3920 | -14.2987 | 1.0 | 13.9067 | -541.8231 | -94.8774 | 4.3134 | 5.6330 |
| 0.0001 | 1.08 | 40 | 0.0000 | -0.5124 | -17.3966 | 1.0 | 16.8841 | -572.8019 | -96.0813 | 4.1996 | 5.5630 |
| 0.0 | 1.35 | 50 | 0.0000 | -0.5702 | -18.7616 | 1.0 | 18.1915 | -586.4525 | -96.6585 | 4.1536 | 5.5312 |
| 0.0 | 1.62 | 60 | 0.0000 | -0.5923 | -19.3013 | 1.0 | 18.7089 | -591.8488 | -96.8803 | 4.1366 | 5.5189 |
| 0.0 | 1.89 | 70 | 0.0000 | -0.6007 | -19.5016 | 1.0 | 18.9008 | -593.8517 | -96.9644 | 4.1304 | 5.5141 |
| 0.0 | 2.16 | 80 | 0.0000 | -0.6059 | -19.6000 | 1.0 | 18.9940 | -594.8359 | -97.0163 | 4.1267 | 5.5114 |
| 0.0 | 2.43 | 90 | 0.0000 | -0.6070 | -19.6481 | 1.0 | 19.0411 | -595.3173 | -97.0273 | 4.1248 | 5.5089 |
| 0.0 | 2.7 | 100 | 0.0000 | -0.6084 | -19.6635 | 1.0 | 19.0551 | -595.4709 | -97.0409 | 4.1232 | 5.5085 |
| 0.0 | 2.97 | 110 | 0.0000 | -0.6088 | -19.6749 | 1.0 | 19.0661 | -595.5847 | -97.0451 | 4.1232 | 5.5081 |
| 0.0 | 3.24 | 120 | 0.0000 | -0.6088 | -19.6989 | 1.0 | 19.0900 | -595.8249 | -97.0454 | 4.1225 | 5.5062 |
| 0.0 | 3.51 | 130 | 0.0000 | -0.6088 | -19.7185 | 1.0 | 19.1097 | -596.0208 | -97.0448 | 4.1200 | 5.5053 |
| 0.0 | 3.78 | 140 | 0.0000 | -0.6091 | -19.7351 | 1.0 | 19.1260 | -596.1869 | -97.0481 | 4.1203 | 5.5043 |
| 0.0 | 4.05 | 150 | 0.0000 | -0.6097 | -19.7339 | 1.0 | 19.1241 | -596.1747 | -97.0545 | 4.1200 | 5.5044 |
| 0.0 | 4.32 | 160 | 0.0000 | -0.6101 | -19.7392 | 1.0 | 19.1291 | -596.2282 | -97.0581 | 4.1191 | 5.5041 |
| 0.0 | 4.59 | 170 | 0.0000 | -0.6095 | -19.7407 | 1.0 | 19.1312 | -596.2433 | -97.0524 | 4.1196 | 5.5041 |
| 0.0 | 4.86 | 180 | 0.0000 | -0.6127 | -19.7737 | 1.0 | 19.1610 | -596.5731 | -97.0837 | 4.1176 | 5.5019 |
| 0.0 | 5.14 | 190 | 0.0000 | -0.6138 | -19.7921 | 1.0 | 19.1784 | -596.7576 | -97.0946 | 4.1164 | 5.5004 |
| 0.0 | 5.41 | 200 | 0.0000 | -0.6132 | -19.7929 | 1.0 | 19.1796 | -596.7647 | -97.0892 | 4.1152 | 5.5001 |
| 0.0 | 5.68 | 210 | 0.0000 | -0.6115 | -19.7954 | 1.0 | 19.1839 | -596.7902 | -97.0723 | 4.1154 | 5.4998 |
| 0.0 | 5.95 | 220 | 0.0000 | -0.6129 | -19.8083 | 1.0 | 19.1954 | -596.9189 | -97.0859 | 4.1143 | 5.4990 |
| 0.0 | 6.22 | 230 | 0.0000 | -0.6153 | -19.8312 | 1.0 | 19.2159 | -597.1479 | -97.1100 | 4.1132 | 5.4973 |
| 0.0 | 6.49 | 240 | 0.0000 | -0.6142 | -19.8468 | 1.0 | 19.2325 | -597.3038 | -97.0994 | 4.1127 | 5.4970 |
| 0.0 | 6.76 | 250 | 0.0000 | -0.6141 | -19.8735 | 1.0 | 19.2594 | -597.5714 | -97.0983 | 4.1111 | 5.4953 |
| 0.0 | 7.03 | 260 | 0.0000 | -0.6148 | -19.8878 | 1.0 | 19.2730 | -597.7144 | -97.1054 | 4.1100 | 5.4941 |
| 0.0 | 7.3 | 270 | 0.0000 | -0.6164 | -19.8937 | 1.0 | 19.2773 | -597.7730 | -97.1213 | 4.1091 | 5.4937 |
| 0.0 | 7.57 | 280 | 0.0000 | -0.6176 | -19.9184 | 1.0 | 19.3009 | -598.0203 | -97.1326 | 4.1079 | 5.4924 |
| 0.0 | 7.84 | 290 | 0.0000 | -0.6191 | -19.9314 | 1.0 | 19.3124 | -598.1504 | -97.1476 | 4.1073 | 5.4910 |
| 0.0 | 8.11 | 300 | 0.0000 | -0.6155 | -19.9405 | 1.0 | 19.3250 | -598.2412 | -97.1125 | 4.1067 | 5.4906 |
| 0.0 | 8.38 | 310 | 0.0000 | -0.6184 | -19.9647 | 1.0 | 19.3463 | -598.4835 | -97.1412 | 4.1057 | 5.4891 |
| 0.0 | 8.65 | 320 | 0.0000 | -0.6201 | -19.9751 | 1.0 | 19.3550 | -598.5868 | -97.1580 | 4.1047 | 5.4883 |
| 0.0 | 8.92 | 330 | 0.0000 | -0.6189 | -19.9759 | 1.0 | 19.3570 | -598.5950 | -97.1458 | 4.1044 | 5.4881 |
| 0.0 | 9.19 | 340 | 0.0000 | -0.6209 | -19.9780 | 1.0 | 19.3572 | -598.6162 | -97.1656 | 4.1039 | 5.4880 |
| 0.0 | 9.46 | 350 | 0.0000 | -0.6196 | -19.9837 | 1.0 | 19.3641 | -598.6727 | -97.1528 | 4.1043 | 5.4878 |
| 0.0 | 9.73 | 360 | 0.0000 | -0.6194 | -19.9866 | 1.0 | 19.3672 | -598.7023 | -97.1515 | 4.1041 | 5.4878 |
| 0.0 | 10.0 | 370 | 0.0000 | -0.6199 | -19.9960 | 1.0 | 19.3761 | -598.7965 | -97.1560 | 4.1030 | 5.4867 |
| 0.0 | 10.27 | 380 | 0.0000 | -0.6209 | -20.0033 | 1.0 | 19.3824 | -598.8690 | -97.1657 | 4.1025 | 5.4862 |
| 0.0 | 10.54 | 390 | 0.0000 | -0.6205 | -20.0132 | 1.0 | 19.3927 | -598.9681 | -97.1625 | 4.1021 | 5.4857 |
| 0.0 | 10.81 | 400 | 0.0000 | -0.6226 | -20.0238 | 1.0 | 19.4012 | -599.0746 | -97.1832 | 4.1013 | 5.4849 |
| 0.0 | 11.08 | 410 | 0.0000 | -0.6207 | -20.0343 | 1.0 | 19.4136 | -599.1791 | -97.1641 | 4.1014 | 5.4846 |
| 0.0 | 11.35 | 420 | 0.0000 | -0.6215 | -20.0337 | 1.0 | 19.4122 | -599.1733 | -97.1719 | 4.1010 | 5.4847 |
| 0.0 | 11.62 | 430 | 0.0000 | -0.6212 | -20.0356 | 1.0 | 19.4144 | -599.1924 | -97.1693 | 4.1008 | 5.4845 |
| 0.0 | 11.89 | 440 | 0.0000 | -0.6216 | -20.0326 | 1.0 | 19.4111 | -599.1625 | -97.1727 | 4.1007 | 5.4847 |
| 0.0 | 12.16 | 450 | 0.0000 | -0.6219 | -20.0401 | 1.0 | 19.4182 | -599.2375 | -97.1761 | 4.0998 | 5.4838 |
| 0.0 | 12.43 | 460 | 0.0000 | -0.6225 | -20.0430 | 1.0 | 19.4205 | -599.2663 | -97.1819 | 4.1004 | 5.4836 |
| 0.0 | 12.7 | 470 | 0.0000 | -0.6230 | -20.0486 | 1.0 | 19.4255 | -599.3220 | -97.1875 | 4.1003 | 5.4836 |
| 0.0 | 12.97 | 480 | 0.0000 | -0.6225 | -20.0484 | 1.0 | 19.4259 | -599.3201 | -97.1819 | 4.1002 | 5.4834 |
| 0.0 | 13.24 | 490 | 0.0000 | -0.6209 | -20.0524 | 1.0 | 19.4315 | -599.3601 | -97.1659 | 4.1000 | 5.4831 |
| 0.0 | 13.51 | 500 | 0.0000 | -0.6229 | -20.0554 | 1.0 | 19.4326 | -599.3906 | -97.1858 | 4.0995 | 5.4831 |
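For readers who want to reproduce a run like this, the hyperparameters above map onto TRL's `DPOTrainer` (contemporary with Transformers 4.33). This is a hedged sketch, not the author's script: the DPO `beta`, the preference dataset, and the `trust_remote_code` handling for phi-1_5 are all assumptions not stated in the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
ref_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

args = TrainingArguments(
    output_dir="phi_1.5_dpo_v3",
    per_device_train_batch_size=1,  # train_batch_size: 1
    per_device_eval_batch_size=8,   # eval_batch_size: 8
    learning_rate=2e-4,             # learning_rate: 0.0002
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=500,                  # training_steps: 500
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                          # assumed; the card does not state beta
    train_dataset=preference_dataset,  # hypothetical dataset of prompt/chosen/rejected pairs
    tokenizer=tokenizer,
)
trainer.train()
```

`preference_dataset` is a placeholder; the card does not identify the data used.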

Framework versions

  • Transformers 4.33.0
  • Pytorch 2.0.1+cu117
  • Datasets 2.1.0
  • Tokenizers 0.13.3