dolly-v2-7b-dpo-full-3-epoch-hydrox-safe

This model is a DPO fine-tuned version of databricks/dolly-v2-7b; the training dataset is not documented. It achieves the following results on the evaluation set:

  • Loss: 0.0060
  • Rewards/chosen: 3.6820
  • Rewards/rejected: -10.6709
  • Rewards/accuracies: 0.9966
  • Rewards/margins: 14.3529
  • Logps/rejected: -666.2253
  • Logps/chosen: -383.1022
  • Logits/rejected: -1.3595
  • Logits/chosen: -1.5884
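The model name indicates DPO training, and the reward metrics above follow DPO's definitions: each reward is β times the policy-vs-reference log-probability gap on a response, and the margin is their difference. The sketch below illustrates the relationship; β = 0.1 and the two reference log-probs are illustrative assumptions back-solved from the table, not reported values.

```python
import math

def dpo_stats(beta, logp_chosen, logp_rejected, ref_chosen, ref_rejected):
    """Compute DPO reward terms from policy and reference log-probabilities.

    reward_chosen   = beta * (logp_chosen   - ref_chosen)
    reward_rejected = beta * (logp_rejected - ref_rejected)
    margin          = reward_chosen - reward_rejected
    loss            = -log(sigmoid(margin))   # per-example DPO loss
    """
    reward_chosen = beta * (logp_chosen - ref_chosen)
    reward_rejected = beta * (logp_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss

# Illustrative inputs chosen to reproduce the final eval row above
# (beta and the reference log-probs are assumptions, not logged values):
rc, rr, m, loss = dpo_stats(0.1, -383.10, -666.23, -419.92, -559.52)
```

With these assumed inputs the chosen reward (≈3.68), rejected reward (≈−10.67), and margin (≈14.35) match the eval row; at that margin the per-example loss −log σ(margin) is vanishingly small, in line with the low reported eval loss.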

Model description

More information needed

Intended uses & limitations

More information needed
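No usage guidance is given, so the following is an untested loading sketch: the hub id is a placeholder (the card shows only the model name, not the full org/name path), and the prompt template is the one used by base Dolly v2's instruct pipeline, which this fine-tune is assumed to retain.

```python
INTRO = ("Below is an instruction that describes a task. "
         "Write a response that appropriately completes the request.")

def build_prompt(instruction: str) -> str:
    """Format an instruction in Dolly v2's prompt layout (assumed unchanged)."""
    return f"{INTRO}\n\n### Instruction:\n{instruction}\n\n### Response:\n"

if __name__ == "__main__":
    # Heavy: downloads ~14 GB of BF16 weights. The hub id is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "dolly-v2-7b-dpo-full-3-epoch-hydrox-safe"  # prepend the owning org
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(build_prompt("Summarize DPO in one sentence."),
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```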

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
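The effective batch size follows from the per-device settings: 8 examples × 8 GPUs = 64 per optimizer step. The linear scheduler with 10% warmup can be sketched as below; the total-step count is an assumption read off the ~8,800 steps in the training log, and this mirrors the shape of transformers' `get_linear_schedule_with_warmup` rather than the exact training code.

```python
def linear_schedule_lr(step, total_steps, base_lr=5e-7, warmup_ratio=0.1):
    """Learning rate at a given step under linear warmup + linear decay.

    Ramps 0 -> base_lr over the first warmup_ratio * total_steps steps,
    then decays linearly back to 0 at total_steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

TOTAL_STEPS = 8800  # assumption: roughly matches the final logged step below
```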

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.873 0.03 100 0.7547 0.2247 -0.0238 0.5918 0.2485 -559.7548 -417.6762 -1.1954 -1.4929
0.6069 0.07 200 0.5675 0.7273 -0.0433 0.7407 0.7706 -559.9498 -412.6499 -1.1932 -1.4956
0.3668 0.1 300 0.3913 1.3855 -0.2301 0.8552 1.6156 -561.8173 -406.0676 -1.1768 -1.4862
0.2547 0.14 400 0.2942 2.0422 -0.3359 0.8897 2.3781 -562.875 -399.5007 -1.1603 -1.4768
0.2496 0.17 500 0.2323 2.5759 -0.5597 0.9184 3.1357 -565.1138 -394.1635 -1.1394 -1.4661
0.2099 0.2 600 0.1979 3.0353 -0.7414 0.9242 3.7767 -566.9301 -389.5694 -1.1137 -1.4513
0.123 0.24 700 0.1624 3.4398 -1.1264 0.9436 4.5662 -570.7800 -385.5248 -1.1147 -1.4531
0.1211 0.27 800 0.1404 3.7877 -1.3826 0.9453 5.1703 -573.3425 -382.0456 -1.1126 -1.4555
0.1398 0.31 900 0.1305 4.1188 -1.5720 0.9545 5.6908 -575.2359 -378.7344 -1.1145 -1.4568
0.1161 0.34 1000 0.1066 4.3055 -1.7418 0.9646 6.0473 -576.9345 -376.8678 -1.1217 -1.4605
0.1109 0.37 1100 0.1006 4.4233 -2.0049 0.9621 6.4282 -579.5653 -375.6897 -1.1334 -1.4683
0.0983 0.41 1200 0.0881 4.3080 -2.5628 0.9638 6.8708 -585.1442 -376.8426 -1.1544 -1.4813
0.0965 0.44 1300 0.0778 4.3457 -2.6685 0.9621 7.0142 -586.2010 -376.4656 -1.1651 -1.4955
0.0542 0.48 1400 0.0705 4.3768 -3.1529 0.9739 7.5297 -591.0455 -376.1544 -1.1767 -1.4972
0.053 0.51 1500 0.0659 4.4009 -3.2268 0.9781 7.6278 -591.7845 -375.9133 -1.1797 -1.5057
0.0653 0.54 1600 0.0680 4.3566 -3.1994 0.9781 7.5559 -591.5099 -376.3570 -1.1682 -1.4980
0.0634 0.58 1700 0.0553 4.2444 -3.7967 0.9764 8.0411 -597.4832 -377.4786 -1.1988 -1.5176
0.0574 0.61 1800 0.0490 4.3076 -3.9340 0.9790 8.2416 -598.8566 -376.8465 -1.2222 -1.5299
0.0518 0.65 1900 0.0424 4.2820 -4.0325 0.9857 8.3145 -599.8412 -377.1030 -1.2285 -1.5390
0.0376 0.68 2000 0.0423 4.2925 -4.1097 0.9840 8.4022 -600.6129 -376.9975 -1.2280 -1.5311
0.0339 0.71 2100 0.0424 4.2423 -4.3969 0.9882 8.6393 -603.4858 -377.4996 -1.2371 -1.5422
0.0323 0.75 2200 0.0418 4.3016 -4.3550 0.9832 8.6566 -603.0663 -376.9068 -1.2198 -1.5286
0.0267 0.78 2300 0.0386 4.1635 -4.6663 0.9882 8.8297 -606.1791 -378.2882 -1.2158 -1.5230
0.0296 0.82 2400 0.0316 3.9990 -5.4019 0.9907 9.4009 -613.5353 -379.9330 -1.2347 -1.5268
0.0289 0.85 2500 0.0315 4.1064 -5.2099 0.9907 9.3164 -611.6158 -378.8586 -1.2152 -1.5109
0.0326 0.88 2600 0.0280 4.0899 -5.4030 0.9907 9.4929 -613.5463 -379.0233 -1.2434 -1.5354
0.025 0.92 2700 0.0333 4.0463 -5.2395 0.9924 9.2857 -611.9110 -379.4600 -1.2283 -1.5268
0.0273 0.95 2800 0.0259 4.1046 -5.4947 0.9975 9.5993 -614.4639 -378.8770 -1.2253 -1.5271
0.0197 0.99 2900 0.0360 4.1642 -5.3436 0.9907 9.5078 -612.9525 -378.2808 -1.2320 -1.5321
0.0196 1.02 3000 0.0267 3.8748 -5.9868 0.9949 9.8616 -619.3846 -381.1749 -1.2358 -1.5308
0.0188 1.05 3100 0.0268 3.8452 -6.0908 0.9949 9.9361 -620.4247 -381.4705 -1.2365 -1.5361
0.0172 1.09 3200 0.0231 3.7735 -6.3630 0.9907 10.1365 -623.1463 -382.1877 -1.2627 -1.5561
0.0099 1.12 3300 0.0218 3.7491 -6.5816 0.9958 10.3307 -625.3326 -382.4322 -1.2410 -1.5316
0.0113 1.16 3400 0.0189 3.7109 -6.6907 0.9958 10.4017 -626.4235 -382.8133 -1.2519 -1.5387
0.0146 1.19 3500 0.0191 3.6138 -7.1128 0.9941 10.7266 -630.6445 -383.7852 -1.2702 -1.5462
0.0108 1.22 3600 0.0175 3.5940 -7.3181 0.9949 10.9121 -632.6978 -383.9829 -1.2642 -1.5481
0.0175 1.26 3700 0.0183 3.4786 -7.8254 0.9949 11.3039 -637.7700 -385.1370 -1.2904 -1.5503
0.0147 1.29 3800 0.0153 3.2734 -8.1715 0.9966 11.4449 -641.2316 -387.1888 -1.3082 -1.5667
0.0113 1.33 3900 0.0153 3.3033 -8.3504 0.9966 11.6537 -643.0201 -386.8899 -1.2907 -1.5525
0.0284 1.36 4000 0.0270 3.5241 -8.1571 0.9924 11.6812 -641.0871 -384.6817 -1.2917 -1.5474
0.0101 1.39 4100 0.0138 3.3142 -8.9443 0.9941 12.2585 -648.9590 -386.7809 -1.3039 -1.5402
0.0093 1.43 4200 0.0159 3.3533 -9.0499 0.9966 12.4032 -650.0153 -386.3899 -1.3067 -1.5543
0.0083 1.46 4300 0.0149 3.4209 -8.8296 0.9958 12.2505 -647.8128 -385.7142 -1.3104 -1.5558
0.0068 1.5 4400 0.0123 3.2700 -9.3033 0.9975 12.5733 -652.5496 -387.2229 -1.3257 -1.5680
0.0093 1.53 4500 0.0122 3.5894 -8.8354 0.9983 12.4248 -647.8701 -384.0288 -1.3217 -1.5701
0.0065 1.56 4600 0.0117 3.4515 -8.7814 0.9975 12.2329 -647.3306 -385.4080 -1.3381 -1.5838
0.0132 1.6 4700 0.0119 3.4540 -8.4518 0.9975 11.9058 -644.0345 -385.3825 -1.3352 -1.5862
0.0085 1.63 4800 0.0113 3.3970 -8.7353 0.9966 12.1323 -646.8692 -385.9526 -1.3331 -1.5766
0.0096 1.67 4900 0.0121 3.2728 -9.0713 0.9966 12.3442 -650.2295 -387.1943 -1.3552 -1.5969
0.0042 1.7 5000 0.0106 3.1699 -9.4193 0.9975 12.5892 -653.7093 -388.2237 -1.3307 -1.5739
0.0116 1.73 5100 0.0096 3.2716 -9.0292 0.9958 12.3008 -649.8085 -387.2067 -1.3274 -1.5748
0.0093 1.77 5200 0.0103 3.2228 -9.3477 0.9983 12.5706 -652.9938 -387.6946 -1.3153 -1.5495
0.0058 1.8 5300 0.0103 3.1251 -9.6052 0.9966 12.7303 -655.5681 -388.6714 -1.3273 -1.5594
0.0066 1.84 5400 0.0094 3.5167 -9.0559 0.9983 12.5726 -650.0754 -384.7553 -1.3330 -1.5721
0.0038 1.87 5500 0.0093 3.5884 -9.0262 0.9983 12.6146 -649.7783 -384.0386 -1.3171 -1.5599
0.0134 1.9 5600 0.0093 3.0874 -9.8027 0.9983 12.8901 -657.5432 -389.0488 -1.3368 -1.5645
0.0059 1.94 5700 0.0098 3.4393 -9.7104 0.9975 13.1497 -656.6204 -385.5294 -1.3526 -1.5716
0.0057 1.97 5800 0.0080 3.5892 -9.4003 0.9983 12.9896 -653.5198 -384.0307 -1.3593 -1.5880
0.0015 2.01 5900 0.0102 3.4266 -9.8551 0.9966 13.2816 -658.0669 -385.6569 -1.3552 -1.5837
0.0019 2.04 6000 0.0105 3.5092 -9.9457 0.9983 13.4549 -658.9734 -384.8311 -1.3418 -1.5734
0.0049 2.07 6100 0.0083 3.4872 -10.1039 0.9983 13.5911 -660.5549 -385.0504 -1.3269 -1.5633
0.0056 2.11 6200 0.0089 3.3922 -10.3713 0.9975 13.7635 -663.2297 -386.0008 -1.3437 -1.5700
0.0041 2.14 6300 0.0078 3.5705 -10.1344 0.9983 13.7049 -660.8607 -384.2182 -1.3527 -1.5831
0.0039 2.18 6400 0.0092 3.3798 -10.7994 0.9975 14.1792 -667.5103 -386.1252 -1.3748 -1.5843
0.0018 2.21 6500 0.0076 3.5825 -10.5328 0.9983 14.1153 -664.8441 -384.0977 -1.3583 -1.5744
0.0037 2.24 6600 0.0075 3.5553 -10.3432 0.9983 13.8984 -662.9481 -384.3702 -1.3604 -1.5848
0.0021 2.28 6700 0.0082 3.7310 -10.3324 0.9992 14.0634 -662.8404 -382.6127 -1.3437 -1.5693
0.0025 2.31 6800 0.0074 3.5582 -10.6710 0.9975 14.2292 -666.2263 -384.3409 -1.3487 -1.5658
0.0112 2.35 6900 0.0076 3.5915 -10.7786 0.9966 14.3700 -667.3019 -384.0081 -1.3470 -1.5688
0.0022 2.38 7000 0.0080 3.6060 -10.6007 0.9975 14.2067 -665.5234 -383.8625 -1.3536 -1.5774
0.0012 2.41 7100 0.0063 3.5627 -10.8773 0.9975 14.4400 -668.2891 -384.2953 -1.3445 -1.5681
0.0018 2.45 7200 0.0070 3.4237 -11.0692 0.9975 14.4928 -670.2083 -385.6862 -1.3656 -1.5819
0.0084 2.48 7300 0.0079 3.7091 -10.3477 0.9983 14.0569 -662.9936 -382.8314 -1.3539 -1.5873
0.0031 2.52 7400 0.0064 3.5680 -10.4848 0.9983 14.0528 -664.3639 -384.2423 -1.3510 -1.5829
0.0027 2.55 7500 0.0069 3.5130 -10.6612 0.9983 14.1741 -666.1280 -384.7932 -1.3666 -1.5947
0.0051 2.58 7600 0.0066 3.5461 -10.7595 0.9983 14.3056 -667.1109 -384.4612 -1.3600 -1.5872
0.001 2.62 7700 0.0076 3.5633 -10.7690 0.9983 14.3323 -667.2067 -384.2903 -1.3486 -1.5750
0.0021 2.65 7800 0.0066 3.6662 -10.7670 0.9983 14.4332 -667.1862 -383.2607 -1.3604 -1.5892
0.004 2.69 7900 0.0067 3.7915 -10.4856 0.9983 14.2771 -664.3723 -382.0074 -1.3540 -1.5830
0.0022 2.72 8000 0.0066 3.8259 -10.5371 0.9983 14.3630 -664.8873 -381.6641 -1.3510 -1.5812
0.0018 2.75 8100 0.0071 3.7228 -10.6783 0.9983 14.4011 -666.2990 -382.6946 -1.3470 -1.5789
0.0015 2.79 8200 0.0065 3.7032 -10.7685 0.9983 14.4717 -667.2010 -382.8909 -1.3501 -1.5791
0.0015 2.82 8300 0.0072 3.7173 -10.7747 0.9975 14.4920 -667.2634 -382.7499 -1.3574 -1.5864
0.0016 2.86 8400 0.0064 3.7268 -10.7100 0.9983 14.4368 -666.6169 -382.6550 -1.3626 -1.5899
0.0022 2.89 8500 0.0062 3.6175 -10.9238 0.9992 14.5413 -668.7542 -383.7477 -1.3531 -1.5792
0.0032 2.92 8600 0.0059 3.7174 -10.7382 0.9983 14.4556 -666.8983 -382.7484 -1.3576 -1.5869
0.0035 2.96 8700 0.0062 3.5749 -11.0859 0.9975 14.6608 -670.3754 -384.1739 -1.3459 -1.5728
0.0024 2.99 8800 0.0062 3.6894 -10.8042 0.9983 14.4936 -667.5587 -383.0290 -1.3618 -1.5891

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1

Model files

  • Format: Safetensors
  • Model size: 6.86B params
  • Tensor type: BF16