# PE_Mistral_7b_sft_rlhf
This model was trained on an Arrow-format dataset (base model not specified in the card). It achieves the following results on the evaluation set:
- Loss: 0.1434
- Rewards/chosen: 4.8188
- Rewards/rejected: -1.0484
- Rewards/accuracies: 0.9162
- Rewards/margins: 5.8672
- Logps/rejected: -267.3837
- Logps/chosen: -402.2661
- Logits/rejected: -4.8346
- Logits/chosen: -4.9027
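The reward metrics above follow the convention used in DPO-style preference training, where `Rewards/margins` is the mean difference between the chosen and rejected rewards. As a quick sanity check (not part of the original card), the final evaluation values are internally consistent:

```python
# Sanity check: in DPO-style logging, Rewards/margins is the
# difference between the chosen and rejected rewards.
rewards_chosen = 4.8188
rewards_rejected = -1.0484
reported_margin = 5.8672

margin = rewards_chosen - rewards_rejected
assert round(margin, 4) == reported_margin  # 4.8188 - (-1.0484) = 5.8672
print(f"margin = {margin:.4f}")
```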
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-07
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
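The per-device and global batch sizes above are related by the usual multi-GPU formula; a quick sanity check (not part of the original card):

```python
# Global train batch size = per-device batch * gradient accumulation * devices
train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 8

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 64  # matches the value reported above

# Eval side: gradient accumulation does not apply during evaluation
eval_batch_size = 2
total_eval_batch_size = eval_batch_size * num_devices
assert total_eval_batch_size == 16  # matches the value reported above
```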
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
0.5056 | 0.05 | 100 | 0.4880 | 0.9372 | 0.4139 | 0.7709 | 0.5233 | -264.4592 | -410.0293 | -4.0623 | -4.0935 |
0.3169 | 0.09 | 200 | 0.3182 | 1.8650 | 0.4687 | 0.8715 | 1.3963 | -264.3495 | -408.1737 | -4.0439 | -4.0734 |
0.283 | 0.14 | 300 | 0.2592 | 2.4814 | 0.4016 | 0.8939 | 2.0798 | -264.4838 | -406.9410 | -4.0207 | -4.0476 |
0.2269 | 0.18 | 400 | 0.2334 | 3.0842 | 0.4483 | 0.8883 | 2.6359 | -264.3903 | -405.7354 | -4.0647 | -4.0901 |
0.1909 | 0.23 | 500 | 0.2152 | 3.4097 | 0.3555 | 0.8827 | 3.0542 | -264.5758 | -405.0843 | -4.0629 | -4.0917 |
0.2244 | 0.27 | 600 | 0.2027 | 3.9353 | 0.5427 | 0.8994 | 3.3927 | -264.2016 | -404.0331 | -4.0748 | -4.1061 |
0.2118 | 0.32 | 700 | 0.1950 | 3.9411 | 0.4638 | 0.9050 | 3.4773 | -264.3593 | -404.0216 | -4.1398 | -4.1738 |
0.1811 | 0.37 | 800 | 0.1924 | 4.5304 | 0.6884 | 0.8994 | 3.8420 | -263.9102 | -402.8429 | -4.1152 | -4.1478 |
0.1802 | 0.41 | 900 | 0.1870 | 4.2323 | 0.2982 | 0.9022 | 3.9342 | -264.6906 | -403.4391 | -4.2456 | -4.2902 |
0.1738 | 0.46 | 1000 | 0.1961 | 4.0671 | 0.1452 | 0.8939 | 3.9219 | -264.9966 | -403.7696 | -4.2846 | -4.3360 |
0.1771 | 0.5 | 1100 | 0.1879 | 5.0043 | 0.9027 | 0.8966 | 4.1016 | -263.4816 | -401.8953 | -4.3106 | -4.3575 |
0.1758 | 0.55 | 1200 | 0.1776 | 4.7044 | 0.4253 | 0.8994 | 4.2790 | -264.4362 | -402.4950 | -4.2840 | -4.3304 |
0.175 | 0.59 | 1300 | 0.1727 | 4.6859 | 0.3544 | 0.9106 | 4.3315 | -264.5781 | -402.5319 | -4.3214 | -4.3726 |
0.164 | 0.64 | 1400 | 0.1724 | 4.9443 | 0.4681 | 0.9078 | 4.4762 | -264.3508 | -402.0152 | -4.3194 | -4.3715 |
0.1452 | 0.68 | 1500 | 0.1733 | 4.7850 | 0.2245 | 0.8994 | 4.5605 | -264.8378 | -402.3337 | -4.3552 | -4.4152 |
0.1607 | 0.73 | 1600 | 0.1838 | 4.8277 | 0.2922 | 0.9134 | 4.5355 | -264.7025 | -402.2484 | -4.3342 | -4.3831 |
0.1611 | 0.78 | 1700 | 0.1720 | 4.6017 | 0.0825 | 0.9078 | 4.5192 | -265.1219 | -402.7004 | -4.4271 | -4.4835 |
0.1895 | 0.82 | 1800 | 0.1724 | 4.8294 | 0.2577 | 0.9162 | 4.5717 | -264.7715 | -402.2449 | -4.3923 | -4.4439 |
0.1553 | 0.87 | 1900 | 0.1676 | 4.9769 | 0.2525 | 0.9106 | 4.7245 | -264.7820 | -401.9499 | -4.4402 | -4.4909 |
0.1555 | 0.91 | 2000 | 0.1651 | 4.6602 | -0.0049 | 0.9134 | 4.6651 | -265.2967 | -402.5833 | -4.4891 | -4.5421 |
0.1583 | 0.96 | 2100 | 0.1644 | 4.9572 | 0.0951 | 0.9134 | 4.8621 | -265.0968 | -401.9894 | -4.4897 | -4.5469 |
0.1414 | 1.0 | 2200 | 0.1647 | 4.7501 | -0.1147 | 0.9022 | 4.8648 | -265.5163 | -402.4036 | -4.5646 | -4.6227 |
0.1572 | 1.05 | 2300 | 0.1613 | 4.9643 | -0.0239 | 0.9134 | 4.9882 | -265.3347 | -401.9751 | -4.4874 | -4.5459 |
0.1271 | 1.1 | 2400 | 0.1592 | 4.9234 | -0.0822 | 0.9050 | 5.0056 | -265.4514 | -402.0570 | -4.5334 | -4.5967 |
0.128 | 1.14 | 2500 | 0.1585 | 5.2048 | 0.0677 | 0.9162 | 5.1371 | -265.1516 | -401.4941 | -4.5336 | -4.5930 |
0.1276 | 1.19 | 2600 | 0.1598 | 5.0338 | -0.1020 | 0.9330 | 5.1358 | -265.4910 | -401.8362 | -4.5631 | -4.6266 |
0.1377 | 1.23 | 2700 | 0.1618 | 5.1033 | -0.0106 | 0.9190 | 5.1139 | -265.3082 | -401.6972 | -4.6462 | -4.7083 |
0.1489 | 1.28 | 2800 | 0.1576 | 5.0197 | -0.0696 | 0.9274 | 5.0893 | -265.4261 | -401.8644 | -4.6567 | -4.7149 |
0.1252 | 1.32 | 2900 | 0.1594 | 4.8216 | -0.3970 | 0.9218 | 5.2186 | -266.0809 | -402.2606 | -4.6885 | -4.7496 |
0.1177 | 1.37 | 3000 | 0.1561 | 5.1379 | -0.1943 | 0.9190 | 5.3322 | -265.6755 | -401.6280 | -4.6552 | -4.7179 |
0.1338 | 1.42 | 3100 | 0.1596 | 4.8017 | -0.4888 | 0.9218 | 5.2905 | -266.2645 | -402.3004 | -4.6469 | -4.7124 |
0.1393 | 1.46 | 3200 | 0.1558 | 5.0657 | -0.1950 | 0.9274 | 5.2607 | -265.6770 | -401.7724 | -4.6387 | -4.7046 |
0.1268 | 1.51 | 3300 | 0.1560 | 4.6565 | -0.5086 | 0.9134 | 5.1651 | -266.3041 | -402.5907 | -4.7132 | -4.7861 |
0.14 | 1.55 | 3400 | 0.1538 | 4.9324 | -0.3633 | 0.9162 | 5.2957 | -266.0134 | -402.0388 | -4.7915 | -4.8605 |
0.144 | 1.6 | 3500 | 0.1544 | 5.1375 | -0.1980 | 0.9246 | 5.3356 | -265.6830 | -401.6287 | -4.7136 | -4.7829 |
0.1293 | 1.64 | 3600 | 0.1535 | 4.8933 | -0.5563 | 0.9218 | 5.4496 | -266.3995 | -402.1171 | -4.6752 | -4.7438 |
0.1503 | 1.69 | 3700 | 0.1545 | 5.0576 | -0.3291 | 0.9134 | 5.3868 | -265.9452 | -401.7885 | -4.7176 | -4.7882 |
0.1313 | 1.73 | 3800 | 0.1493 | 5.0374 | -0.3896 | 0.9134 | 5.4271 | -266.0661 | -401.8289 | -4.7076 | -4.7735 |
0.1312 | 1.78 | 3900 | 0.1480 | 5.0451 | -0.3528 | 0.9162 | 5.3979 | -265.9925 | -401.8134 | -4.7360 | -4.8095 |
0.1227 | 1.83 | 4000 | 0.1472 | 4.8811 | -0.6323 | 0.9162 | 5.5134 | -266.5515 | -402.1416 | -4.7648 | -4.8335 |
0.1364 | 1.87 | 4100 | 0.1464 | 4.8835 | -0.6254 | 0.9190 | 5.5089 | -266.5378 | -402.1368 | -4.7688 | -4.8339 |
0.1472 | 1.92 | 4200 | 0.1461 | 5.0051 | -0.5056 | 0.9190 | 5.5107 | -266.2981 | -401.8935 | -4.7772 | -4.8406 |
0.1187 | 1.96 | 4300 | 0.1460 | 5.0734 | -0.4883 | 0.9246 | 5.5618 | -266.2636 | -401.7569 | -4.8278 | -4.8964 |
0.1212 | 2.01 | 4400 | 0.1476 | 4.6410 | -0.9740 | 0.9218 | 5.6150 | -267.2350 | -402.6217 | -4.7401 | -4.8056 |
0.0998 | 2.05 | 4500 | 0.1453 | 4.8904 | -0.7622 | 0.9190 | 5.6526 | -266.8114 | -402.1230 | -4.7775 | -4.8457 |
0.1119 | 2.1 | 4600 | 0.1471 | 4.8572 | -0.8493 | 0.9246 | 5.7066 | -266.9856 | -402.1892 | -4.7661 | -4.8375 |
0.1175 | 2.15 | 4700 | 0.1480 | 4.8949 | -0.8295 | 0.9134 | 5.7244 | -266.9460 | -402.1140 | -4.8256 | -4.8951 |
0.1046 | 2.19 | 4800 | 0.1457 | 4.9889 | -0.7191 | 0.9162 | 5.7081 | -266.7252 | -401.9258 | -4.7844 | -4.8509 |
0.1267 | 2.24 | 4900 | 0.1491 | 4.3841 | -1.2180 | 0.9190 | 5.6021 | -267.7230 | -403.1356 | -4.8244 | -4.8912 |
0.1188 | 2.28 | 5000 | 0.1445 | 4.8538 | -0.7816 | 0.9190 | 5.6354 | -266.8502 | -402.1961 | -4.8018 | -4.8691 |
0.1105 | 2.33 | 5100 | 0.1450 | 4.6539 | -0.9854 | 0.9218 | 5.6393 | -267.2578 | -402.5959 | -4.8686 | -4.9350 |
0.1213 | 2.37 | 5200 | 0.1475 | 4.5392 | -1.0654 | 0.9162 | 5.6046 | -267.4177 | -402.8253 | -4.8665 | -4.9331 |
0.1193 | 2.42 | 5300 | 0.1475 | 4.8873 | -0.8375 | 0.9246 | 5.7248 | -266.9619 | -402.1292 | -4.8357 | -4.9002 |
0.1084 | 2.46 | 5400 | 0.1477 | 4.7995 | -0.9213 | 0.9162 | 5.7209 | -267.1296 | -402.3047 | -4.8708 | -4.9381 |
0.103 | 2.51 | 5500 | 0.1450 | 4.9781 | -0.8373 | 0.9246 | 5.8153 | -266.9615 | -401.9476 | -4.8037 | -4.8687 |
0.1032 | 2.56 | 5600 | 0.1449 | 4.9292 | -0.9533 | 0.9218 | 5.8825 | -267.1936 | -402.0454 | -4.7928 | -4.8573 |
0.1076 | 2.6 | 5700 | 0.1446 | 5.1772 | -0.7431 | 0.9134 | 5.9203 | -266.7732 | -401.5494 | -4.7715 | -4.8365 |
0.1048 | 2.65 | 5800 | 0.1438 | 4.8244 | -1.0354 | 0.9218 | 5.8598 | -267.3577 | -402.2549 | -4.8149 | -4.8820 |
0.0975 | 2.69 | 5900 | 0.1446 | 4.6830 | -1.1482 | 0.9134 | 5.8312 | -267.5833 | -402.5378 | -4.8247 | -4.8923 |
0.1251 | 2.74 | 6000 | 0.1433 | 4.8302 | -1.0405 | 0.9134 | 5.8707 | -267.3680 | -402.2434 | -4.8053 | -4.8717 |
0.1279 | 2.78 | 6100 | 0.1433 | 4.9076 | -0.9568 | 0.9134 | 5.8645 | -267.2006 | -402.0885 | -4.8238 | -4.8905 |
0.1334 | 2.83 | 6200 | 0.1434 | 4.9038 | -0.9683 | 0.9106 | 5.8720 | -267.2234 | -402.0963 | -4.8401 | -4.9070 |
0.111 | 2.88 | 6300 | 0.1432 | 4.8810 | -0.9900 | 0.9162 | 5.8709 | -267.2668 | -402.1418 | -4.8330 | -4.9001 |
0.1204 | 2.92 | 6400 | 0.1433 | 4.9268 | -0.9522 | 0.9134 | 5.8790 | -267.1913 | -402.0501 | -4.8292 | -4.8967 |
0.12 | 2.97 | 6500 | 0.1431 | 4.8308 | -1.0363 | 0.9162 | 5.8671 | -267.3595 | -402.2421 | -4.8341 | -4.9021 |
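The linear scheduler with a 0.1 warmup ratio listed in the hyperparameters can be sketched as below. This is a minimal reimplementation mirroring the behavior of `transformers`' linear schedule with warmup, not the trainer's actual code, and the step count used in the example is hypothetical (the table above logs through step 6500):

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 3e-07,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Example with a hypothetical total step count:
total = 6600
assert linear_lr(0, total) == 0.0                          # start of warmup
assert abs(linear_lr(660, total) - 3e-07) < 1e-12          # peak at end of warmup
assert linear_lr(total, total) == 0.0                      # fully decayed
```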
## Framework versions
- Transformers 4.35.0
- PyTorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1