
PE_Mistral_7b_sft_rlhf

This model was trained on an Arrow-format dataset (no dataset name is recorded in the training metadata). It achieves the following results on the evaluation set (the relation between the reward metrics is illustrated after the list):

  • Loss: 0.1434
  • Rewards/chosen: 4.8188
  • Rewards/rejected: -1.0484
  • Rewards/accuracies: 0.9162
  • Rewards/margins: 5.8672
  • Logps/rejected: -267.3837
  • Logps/chosen: -402.2661
  • Logits/rejected: -4.8346
  • Logits/chosen: -4.9027
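
These are the metric names emitted by TRL-style preference-optimization (DPO-like) trainers. In that convention the margin is simply the chosen reward minus the rejected reward; below is a minimal check of the numbers above, assuming that convention applies here:

```python
# Hedged sketch: assumes the TRL/DPO convention that
# rewards/margins = rewards/chosen - rewards/rejected.
rewards_chosen = 4.8188
rewards_rejected = -1.0484
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 5.8672 -- matches Rewards/margins above
```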

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 3e-07
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
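
The per-device batch sizes combine across 8 GPUs and 8 gradient-accumulation steps into the effective sizes listed above (train: 1 × 8 × 8 = 64; eval: 2 × 8 = 16). The sketch below shows one way these values could be expressed as a transformers TrainingArguments configuration; it is an illustration only, since the actual training script is not published here, and the output directory and eval/logging cadence are assumptions:

```python
from transformers import TrainingArguments

# Hedged sketch of a configuration matching the hyperparameters above.
# These arguments would typically be passed to a preference-optimization
# trainer (e.g. trl's DPOTrainer, whose metric names match this card).
training_args = TrainingArguments(
    output_dir="PE_Mistral_7b_sft_rlhf",  # placeholder output path
    learning_rate=3e-7,
    per_device_train_batch_size=1,        # 1 x 8 GPUs x 8 accum. steps = 64
    per_device_eval_batch_size=2,         # 2 x 8 GPUs = 16
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                            # model weights are stored in BF16
    evaluation_strategy="steps",
    eval_steps=100,                       # matches the 100-step eval cadence below
    logging_steps=100,
)
```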

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.5056 0.05 100 0.4880 0.9372 0.4139 0.7709 0.5233 -264.4592 -410.0293 -4.0623 -4.0935
0.3169 0.09 200 0.3182 1.8650 0.4687 0.8715 1.3963 -264.3495 -408.1737 -4.0439 -4.0734
0.283 0.14 300 0.2592 2.4814 0.4016 0.8939 2.0798 -264.4838 -406.9410 -4.0207 -4.0476
0.2269 0.18 400 0.2334 3.0842 0.4483 0.8883 2.6359 -264.3903 -405.7354 -4.0647 -4.0901
0.1909 0.23 500 0.2152 3.4097 0.3555 0.8827 3.0542 -264.5758 -405.0843 -4.0629 -4.0917
0.2244 0.27 600 0.2027 3.9353 0.5427 0.8994 3.3927 -264.2016 -404.0331 -4.0748 -4.1061
0.2118 0.32 700 0.1950 3.9411 0.4638 0.9050 3.4773 -264.3593 -404.0216 -4.1398 -4.1738
0.1811 0.37 800 0.1924 4.5304 0.6884 0.8994 3.8420 -263.9102 -402.8429 -4.1152 -4.1478
0.1802 0.41 900 0.1870 4.2323 0.2982 0.9022 3.9342 -264.6906 -403.4391 -4.2456 -4.2902
0.1738 0.46 1000 0.1961 4.0671 0.1452 0.8939 3.9219 -264.9966 -403.7696 -4.2846 -4.3360
0.1771 0.5 1100 0.1879 5.0043 0.9027 0.8966 4.1016 -263.4816 -401.8953 -4.3106 -4.3575
0.1758 0.55 1200 0.1776 4.7044 0.4253 0.8994 4.2790 -264.4362 -402.4950 -4.2840 -4.3304
0.175 0.59 1300 0.1727 4.6859 0.3544 0.9106 4.3315 -264.5781 -402.5319 -4.3214 -4.3726
0.164 0.64 1400 0.1724 4.9443 0.4681 0.9078 4.4762 -264.3508 -402.0152 -4.3194 -4.3715
0.1452 0.68 1500 0.1733 4.7850 0.2245 0.8994 4.5605 -264.8378 -402.3337 -4.3552 -4.4152
0.1607 0.73 1600 0.1838 4.8277 0.2922 0.9134 4.5355 -264.7025 -402.2484 -4.3342 -4.3831
0.1611 0.78 1700 0.1720 4.6017 0.0825 0.9078 4.5192 -265.1219 -402.7004 -4.4271 -4.4835
0.1895 0.82 1800 0.1724 4.8294 0.2577 0.9162 4.5717 -264.7715 -402.2449 -4.3923 -4.4439
0.1553 0.87 1900 0.1676 4.9769 0.2525 0.9106 4.7245 -264.7820 -401.9499 -4.4402 -4.4909
0.1555 0.91 2000 0.1651 4.6602 -0.0049 0.9134 4.6651 -265.2967 -402.5833 -4.4891 -4.5421
0.1583 0.96 2100 0.1644 4.9572 0.0951 0.9134 4.8621 -265.0968 -401.9894 -4.4897 -4.5469
0.1414 1.0 2200 0.1647 4.7501 -0.1147 0.9022 4.8648 -265.5163 -402.4036 -4.5646 -4.6227
0.1572 1.05 2300 0.1613 4.9643 -0.0239 0.9134 4.9882 -265.3347 -401.9751 -4.4874 -4.5459
0.1271 1.1 2400 0.1592 4.9234 -0.0822 0.9050 5.0056 -265.4514 -402.0570 -4.5334 -4.5967
0.128 1.14 2500 0.1585 5.2048 0.0677 0.9162 5.1371 -265.1516 -401.4941 -4.5336 -4.5930
0.1276 1.19 2600 0.1598 5.0338 -0.1020 0.9330 5.1358 -265.4910 -401.8362 -4.5631 -4.6266
0.1377 1.23 2700 0.1618 5.1033 -0.0106 0.9190 5.1139 -265.3082 -401.6972 -4.6462 -4.7083
0.1489 1.28 2800 0.1576 5.0197 -0.0696 0.9274 5.0893 -265.4261 -401.8644 -4.6567 -4.7149
0.1252 1.32 2900 0.1594 4.8216 -0.3970 0.9218 5.2186 -266.0809 -402.2606 -4.6885 -4.7496
0.1177 1.37 3000 0.1561 5.1379 -0.1943 0.9190 5.3322 -265.6755 -401.6280 -4.6552 -4.7179
0.1338 1.42 3100 0.1596 4.8017 -0.4888 0.9218 5.2905 -266.2645 -402.3004 -4.6469 -4.7124
0.1393 1.46 3200 0.1558 5.0657 -0.1950 0.9274 5.2607 -265.6770 -401.7724 -4.6387 -4.7046
0.1268 1.51 3300 0.1560 4.6565 -0.5086 0.9134 5.1651 -266.3041 -402.5907 -4.7132 -4.7861
0.14 1.55 3400 0.1538 4.9324 -0.3633 0.9162 5.2957 -266.0134 -402.0388 -4.7915 -4.8605
0.144 1.6 3500 0.1544 5.1375 -0.1980 0.9246 5.3356 -265.6830 -401.6287 -4.7136 -4.7829
0.1293 1.64 3600 0.1535 4.8933 -0.5563 0.9218 5.4496 -266.3995 -402.1171 -4.6752 -4.7438
0.1503 1.69 3700 0.1545 5.0576 -0.3291 0.9134 5.3868 -265.9452 -401.7885 -4.7176 -4.7882
0.1313 1.73 3800 0.1493 5.0374 -0.3896 0.9134 5.4271 -266.0661 -401.8289 -4.7076 -4.7735
0.1312 1.78 3900 0.1480 5.0451 -0.3528 0.9162 5.3979 -265.9925 -401.8134 -4.7360 -4.8095
0.1227 1.83 4000 0.1472 4.8811 -0.6323 0.9162 5.5134 -266.5515 -402.1416 -4.7648 -4.8335
0.1364 1.87 4100 0.1464 4.8835 -0.6254 0.9190 5.5089 -266.5378 -402.1368 -4.7688 -4.8339
0.1472 1.92 4200 0.1461 5.0051 -0.5056 0.9190 5.5107 -266.2981 -401.8935 -4.7772 -4.8406
0.1187 1.96 4300 0.1460 5.0734 -0.4883 0.9246 5.5618 -266.2636 -401.7569 -4.8278 -4.8964
0.1212 2.01 4400 0.1476 4.6410 -0.9740 0.9218 5.6150 -267.2350 -402.6217 -4.7401 -4.8056
0.0998 2.05 4500 0.1453 4.8904 -0.7622 0.9190 5.6526 -266.8114 -402.1230 -4.7775 -4.8457
0.1119 2.1 4600 0.1471 4.8572 -0.8493 0.9246 5.7066 -266.9856 -402.1892 -4.7661 -4.8375
0.1175 2.15 4700 0.1480 4.8949 -0.8295 0.9134 5.7244 -266.9460 -402.1140 -4.8256 -4.8951
0.1046 2.19 4800 0.1457 4.9889 -0.7191 0.9162 5.7081 -266.7252 -401.9258 -4.7844 -4.8509
0.1267 2.24 4900 0.1491 4.3841 -1.2180 0.9190 5.6021 -267.7230 -403.1356 -4.8244 -4.8912
0.1188 2.28 5000 0.1445 4.8538 -0.7816 0.9190 5.6354 -266.8502 -402.1961 -4.8018 -4.8691
0.1105 2.33 5100 0.1450 4.6539 -0.9854 0.9218 5.6393 -267.2578 -402.5959 -4.8686 -4.9350
0.1213 2.37 5200 0.1475 4.5392 -1.0654 0.9162 5.6046 -267.4177 -402.8253 -4.8665 -4.9331
0.1193 2.42 5300 0.1475 4.8873 -0.8375 0.9246 5.7248 -266.9619 -402.1292 -4.8357 -4.9002
0.1084 2.46 5400 0.1477 4.7995 -0.9213 0.9162 5.7209 -267.1296 -402.3047 -4.8708 -4.9381
0.103 2.51 5500 0.1450 4.9781 -0.8373 0.9246 5.8153 -266.9615 -401.9476 -4.8037 -4.8687
0.1032 2.56 5600 0.1449 4.9292 -0.9533 0.9218 5.8825 -267.1936 -402.0454 -4.7928 -4.8573
0.1076 2.6 5700 0.1446 5.1772 -0.7431 0.9134 5.9203 -266.7732 -401.5494 -4.7715 -4.8365
0.1048 2.65 5800 0.1438 4.8244 -1.0354 0.9218 5.8598 -267.3577 -402.2549 -4.8149 -4.8820
0.0975 2.69 5900 0.1446 4.6830 -1.1482 0.9134 5.8312 -267.5833 -402.5378 -4.8247 -4.8923
0.1251 2.74 6000 0.1433 4.8302 -1.0405 0.9134 5.8707 -267.3680 -402.2434 -4.8053 -4.8717
0.1279 2.78 6100 0.1433 4.9076 -0.9568 0.9134 5.8645 -267.2006 -402.0885 -4.8238 -4.8905
0.1334 2.83 6200 0.1434 4.9038 -0.9683 0.9106 5.8720 -267.2234 -402.0963 -4.8401 -4.9070
0.111 2.88 6300 0.1432 4.8810 -0.9900 0.9162 5.8709 -267.2668 -402.1418 -4.8330 -4.9001
0.1204 2.92 6400 0.1433 4.9268 -0.9522 0.9134 5.8790 -267.1913 -402.0501 -4.8292 -4.8967
0.12 2.97 6500 0.1431 4.8308 -1.0363 0.9162 5.8671 -267.3595 -402.2421 -4.8341 -4.9021

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1
Model files are published as safetensors (7.24B parameters, BF16).
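
No usage example is included on this card. The following is a minimal loading sketch, assuming the model is a standard Mistral-architecture causal LM published as BF16 safetensors; the repository id is a placeholder for wherever the weights are actually hosted:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PE_Mistral_7b_sft_rlhf"  # placeholder repo id; replace with the actual hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

prompt = "Explain what preference optimization does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```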