lole25 committed
Commit 0174c0e
1 Parent(s): ea7652f

Model save
README.md ADDED
@@ -0,0 +1,224 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: mistralai/Mistral-7B-v0.1
+ model-index:
+ - name: zephyr-7b-gpo-update3-i0
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-gpo-update3-i0
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0224
+ - Rewards/chosen: -0.1801
+ - Rewards/rejected: -0.2679
+ - Rewards/accuracies: 0.6755
+ - Rewards/margins: 0.0877
+ - Logps/rejected: -479.4648
+ - Logps/chosen: -412.1429
+ - Logits/rejected: -0.8868
+ - Logits/chosen: -1.0169
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
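As a note on reading the Rewards/* metrics in this card: TRL's DPO convention defines the implicit reward of a completion as beta times the difference between the policy and reference log-probabilities, and the margin as the chosen reward minus the rejected reward. Beta is not recorded in this card, so the sketch below uses 0.1 purely as an illustrative value, with hypothetical log-probabilities:

```python
# Illustrative TRL-style DPO implicit rewards and margin.
# beta is NOT reported in this model card; 0.1 is only an example value.
beta = 0.1

def implicit_reward(policy_logp: float, ref_logp: float) -> float:
    """DPO implicit reward: beta * (log p_policy - log p_ref)."""
    return beta * (policy_logp - ref_logp)

# Hypothetical summed log-probs for one chosen and one rejected completion.
chosen = implicit_reward(policy_logp=-412.14, ref_logp=-410.34)
rejected = implicit_reward(policy_logp=-479.46, ref_logp=-476.78)

# A positive margin means the policy prefers the chosen completion.
margin = chosen - rejected
print(round(margin, 4))  # 0.088
```

Because both rewards are differences against the frozen reference model, they can be negative (as in the table below) even while the margin, which is what DPO optimizes, stays positive.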
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.0705 | 0.01 | 100 | 0.0537 | 0.0002 | 0.0001 | 0.5070 | 0.0001 | -211.4654 | -231.7675 | -2.1589 | -2.3472 |
+ | 0.0633 | 0.01 | 200 | 0.0534 | 0.0006 | 0.0000 | 0.5790 | 0.0006 | -211.5886 | -231.3686 | -2.1606 | -2.3491 |
+ | 0.0555 | 0.02 | 300 | 0.0528 | 0.0038 | 0.0019 | 0.5865 | 0.0020 | -209.7327 | -228.1604 | -2.1644 | -2.3530 |
+ | 0.0716 | 0.03 | 400 | 0.0517 | 0.0045 | 0.0003 | 0.5985 | 0.0042 | -211.3239 | -227.4712 | -2.1658 | -2.3544 |
+ | 0.0532 | 0.03 | 500 | 0.0506 | -0.0335 | -0.0410 | 0.6050 | 0.0076 | -252.6588 | -265.4973 | -2.1479 | -2.3356 |
+ | 0.0353 | 0.04 | 600 | 0.0482 | -0.0051 | -0.0176 | 0.6040 | 0.0126 | -229.2327 | -237.0613 | -2.1837 | -2.3733 |
+ | 0.0607 | 0.05 | 700 | 0.0442 | -0.0174 | -0.0394 | 0.6155 | 0.0220 | -251.0582 | -249.4283 | -2.1498 | -2.3382 |
+ | 0.0373 | 0.05 | 800 | 0.0450 | -0.0986 | -0.1354 | 0.5900 | 0.0368 | -346.9684 | -330.5711 | -2.1610 | -2.3525 |
+ | 0.0333 | 0.06 | 900 | 0.0453 | -0.0231 | -0.0419 | 0.6065 | 0.0187 | -253.4719 | -255.1281 | -2.1130 | -2.2974 |
+ | 0.0469 | 0.07 | 1000 | 0.0408 | -0.0664 | -0.0994 | 0.6020 | 0.0330 | -311.0526 | -298.4168 | -2.0108 | -2.1907 |
+ | 0.0387 | 0.07 | 1100 | 0.0416 | -0.1900 | -0.2240 | 0.6030 | 0.0340 | -435.6592 | -422.0504 | -1.4115 | -1.5584 |
+ | 0.0377 | 0.08 | 1200 | 0.0409 | -0.1076 | -0.1513 | 0.6110 | 0.0437 | -362.9415 | -339.6366 | -1.3325 | -1.4831 |
+ | 0.0414 | 0.09 | 1300 | 0.0353 | -0.1923 | -0.2414 | 0.6160 | 0.0491 | -453.0328 | -424.3461 | -1.2024 | -1.3430 |
+ | 0.0363 | 0.09 | 1400 | 0.0352 | -0.1443 | -0.1836 | 0.6250 | 0.0393 | -395.2076 | -376.2808 | -1.3508 | -1.4962 |
+ | 0.0741 | 0.1 | 1500 | 0.0350 | -0.1363 | -0.1823 | 0.6235 | 0.0460 | -393.9025 | -368.3273 | -1.0220 | -1.1484 |
+ | 0.0348 | 0.1 | 1600 | 0.0334 | -0.2731 | -0.3511 | 0.6275 | 0.0780 | -562.7497 | -505.1403 | -0.8525 | -0.9803 |
+ | 0.0251 | 0.11 | 1700 | 0.0318 | -0.2572 | -0.3298 | 0.6410 | 0.0726 | -541.3788 | -489.1554 | -1.1495 | -1.2961 |
+ | 0.036 | 0.12 | 1800 | 0.0325 | -0.1508 | -0.1958 | 0.6205 | 0.0451 | -407.4576 | -382.7708 | -1.5867 | -1.7516 |
+ | 0.0142 | 0.12 | 1900 | 0.0312 | -0.2575 | -0.3145 | 0.6335 | 0.0570 | -526.0776 | -489.4697 | -1.2253 | -1.3692 |
+ | 0.0176 | 0.13 | 2000 | 0.0282 | -0.1856 | -0.2730 | 0.6460 | 0.0873 | -484.5845 | -417.6276 | -1.5396 | -1.7095 |
+ | 0.0176 | 0.14 | 2100 | 0.0275 | -0.1327 | -0.2078 | 0.6505 | 0.0751 | -419.3942 | -364.7262 | -1.5587 | -1.7265 |
+ | 0.0387 | 0.14 | 2200 | 0.0277 | -0.1042 | -0.1708 | 0.6385 | 0.0666 | -382.4240 | -336.1856 | -1.6316 | -1.8005 |
+ | 0.0284 | 0.15 | 2300 | 0.0275 | -0.1814 | -0.2465 | 0.6345 | 0.0651 | -458.0886 | -413.4149 | -1.7580 | -1.9373 |
+ | 0.0351 | 0.16 | 2400 | 0.0296 | -0.1479 | -0.2087 | 0.6405 | 0.0609 | -420.3434 | -379.8790 | -1.6926 | -1.8704 |
+ | 0.0143 | 0.16 | 2500 | 0.0285 | -0.1597 | -0.2193 | 0.6545 | 0.0597 | -430.9314 | -391.6554 | -1.5350 | -1.6983 |
+ | 0.0224 | 0.17 | 2600 | 0.0265 | -0.2066 | -0.2771 | 0.6545 | 0.0706 | -488.7431 | -438.5660 | -1.3152 | -1.4686 |
+ | 0.0331 | 0.18 | 2700 | 0.0268 | -0.1739 | -0.2488 | 0.6515 | 0.0748 | -460.3621 | -405.9103 | -1.5228 | -1.6880 |
+ | 0.0387 | 0.18 | 2800 | 0.0276 | -0.0764 | -0.1367 | 0.6510 | 0.0603 | -348.3400 | -308.4065 | -1.4048 | -1.5555 |
+ | 0.0343 | 0.19 | 2900 | 0.0264 | -0.2299 | -0.3102 | 0.6535 | 0.0803 | -521.8264 | -461.8814 | -1.0216 | -1.1548 |
+ | 0.0267 | 0.2 | 3000 | 0.0275 | -0.2473 | -0.3356 | 0.6520 | 0.0883 | -547.2559 | -479.3535 | -1.0688 | -1.2088 |
+ | 0.0355 | 0.2 | 3100 | 0.0280 | -0.2277 | -0.2978 | 0.6415 | 0.0700 | -509.3696 | -459.7389 | -1.2857 | -1.4360 |
+ | 0.0291 | 0.21 | 3200 | 0.0259 | -0.1519 | -0.2282 | 0.6635 | 0.0763 | -439.8501 | -383.9444 | -1.3484 | -1.5017 |
+ | 0.035 | 0.22 | 3300 | 0.0257 | -0.1210 | -0.2008 | 0.6555 | 0.0798 | -412.4005 | -353.0179 | -1.5265 | -1.6883 |
+ | 0.0319 | 0.22 | 3400 | 0.0263 | -0.1372 | -0.2147 | 0.6515 | 0.0775 | -426.3360 | -369.1944 | -1.4126 | -1.5692 |
+ | 0.0257 | 0.23 | 3500 | 0.0256 | -0.1661 | -0.2429 | 0.6550 | 0.0768 | -454.5053 | -398.1262 | -1.4163 | -1.5722 |
+ | 0.0275 | 0.24 | 3600 | 0.0262 | -0.1719 | -0.2629 | 0.6575 | 0.0910 | -474.4635 | -403.8749 | -1.3717 | -1.5261 |
+ | 0.0367 | 0.24 | 3700 | 0.0266 | -0.1726 | -0.2519 | 0.6575 | 0.0793 | -463.4673 | -404.5643 | -1.4203 | -1.5758 |
+ | 0.0357 | 0.25 | 3800 | 0.0260 | -0.0704 | -0.1387 | 0.6575 | 0.0682 | -350.2820 | -302.4371 | -1.5307 | -1.6889 |
+ | 0.0249 | 0.26 | 3900 | 0.0256 | -0.2003 | -0.2911 | 0.6635 | 0.0908 | -502.6758 | -432.3128 | -1.0767 | -1.2149 |
+ | 0.0496 | 0.26 | 4000 | 0.0246 | -0.1700 | -0.2550 | 0.6640 | 0.0850 | -466.5954 | -402.0156 | -1.2870 | -1.4356 |
+ | 0.0166 | 0.27 | 4100 | 0.0273 | -0.1833 | -0.2458 | 0.6600 | 0.0625 | -457.4213 | -415.3354 | -1.2058 | -1.3468 |
+ | 0.0257 | 0.27 | 4200 | 0.0275 | -0.1551 | -0.2293 | 0.6505 | 0.0742 | -440.8662 | -387.0569 | -1.3883 | -1.5435 |
+ | 0.0381 | 0.28 | 4300 | 0.0256 | -0.1096 | -0.1865 | 0.6630 | 0.0769 | -398.1021 | -341.5804 | -1.5158 | -1.6790 |
+ | 0.0142 | 0.29 | 4400 | 0.0256 | -0.1428 | -0.2296 | 0.6605 | 0.0868 | -441.2437 | -374.8350 | -1.1203 | -1.2625 |
+ | 0.0161 | 0.29 | 4500 | 0.0253 | -0.1292 | -0.2014 | 0.6585 | 0.0722 | -412.9791 | -361.1862 | -1.2417 | -1.3864 |
+ | 0.0252 | 0.3 | 4600 | 0.0260 | -0.0895 | -0.1540 | 0.6615 | 0.0645 | -365.6145 | -321.5078 | -1.4506 | -1.6068 |
+ | 0.0265 | 0.31 | 4700 | 0.0262 | -0.2428 | -0.3365 | 0.6565 | 0.0937 | -548.1023 | -474.7587 | -0.9481 | -1.0844 |
+ | 0.0428 | 0.31 | 4800 | 0.0251 | -0.1762 | -0.2585 | 0.6590 | 0.0822 | -470.0755 | -408.2503 | -0.7928 | -0.9170 |
+ | 0.0331 | 0.32 | 4900 | 0.0257 | -0.1637 | -0.2481 | 0.6585 | 0.0844 | -459.6623 | -395.7015 | -0.8176 | -0.9423 |
+ | 0.0206 | 0.33 | 5000 | 0.0263 | -0.1448 | -0.2194 | 0.6635 | 0.0746 | -430.9643 | -376.7931 | -0.7098 | -0.8233 |
+ | 0.0158 | 0.33 | 5100 | 0.0256 | -0.2789 | -0.3617 | 0.6555 | 0.0828 | -573.3056 | -510.8705 | -0.7416 | -0.8615 |
+ | 0.0145 | 0.34 | 5200 | 0.0260 | -0.1978 | -0.2690 | 0.6660 | 0.0711 | -480.5622 | -429.8432 | -0.9478 | -1.0757 |
+ | 0.0209 | 0.35 | 5300 | 0.0255 | -0.1522 | -0.2287 | 0.6585 | 0.0766 | -440.3552 | -384.1584 | -1.2392 | -1.3869 |
+ | 0.0292 | 0.35 | 5400 | 0.0258 | -0.1740 | -0.2459 | 0.6605 | 0.0719 | -457.4742 | -405.9723 | -1.2221 | -1.3683 |
+ | 0.0104 | 0.36 | 5500 | 0.0258 | -0.1628 | -0.2414 | 0.6585 | 0.0786 | -453.0058 | -394.8098 | -1.1724 | -1.3171 |
+ | 0.0201 | 0.37 | 5600 | 0.0267 | -0.3001 | -0.3834 | 0.6595 | 0.0833 | -595.0033 | -532.1312 | -1.1342 | -1.2817 |
+ | 0.0258 | 0.37 | 5700 | 0.0264 | -0.3214 | -0.4042 | 0.6495 | 0.0827 | -615.7876 | -553.4460 | -0.9025 | -1.0350 |
+ | 0.0254 | 0.38 | 5800 | 0.0248 | -0.1813 | -0.2698 | 0.6560 | 0.0885 | -481.4164 | -413.2734 | -1.2336 | -1.3844 |
+ | 0.0237 | 0.39 | 5900 | 0.0247 | -0.1357 | -0.2169 | 0.6605 | 0.0811 | -428.4645 | -367.7495 | -1.2361 | -1.3841 |
+ | 0.025 | 0.39 | 6000 | 0.0250 | -0.0936 | -0.1640 | 0.6605 | 0.0704 | -375.6244 | -325.6407 | -1.3252 | -1.4747 |
+ | 0.0267 | 0.4 | 6100 | 0.0245 | -0.1079 | -0.1847 | 0.6640 | 0.0768 | -396.3334 | -339.8831 | -1.1771 | -1.3187 |
+ | 0.0157 | 0.41 | 6200 | 0.0244 | -0.1200 | -0.1970 | 0.6600 | 0.0769 | -408.5906 | -352.0449 | -1.2099 | -1.3534 |
+ | 0.0339 | 0.41 | 6300 | 0.0250 | -0.1141 | -0.1911 | 0.6645 | 0.0770 | -402.7368 | -346.1321 | -1.1887 | -1.3301 |
+ | 0.0239 | 0.42 | 6400 | 0.0256 | -0.1095 | -0.1887 | 0.6545 | 0.0792 | -400.2938 | -341.5355 | -1.1653 | -1.3054 |
+ | 0.0609 | 0.43 | 6500 | 0.0258 | -0.1790 | -0.2637 | 0.6640 | 0.0847 | -475.3234 | -411.0543 | -0.7519 | -0.8671 |
+ | 0.0274 | 0.43 | 6600 | 0.0252 | -0.1233 | -0.2002 | 0.6685 | 0.0769 | -411.8316 | -355.3340 | -1.1117 | -1.2477 |
+ | 0.0308 | 0.44 | 6700 | 0.0260 | -0.2033 | -0.2927 | 0.6580 | 0.0894 | -504.2830 | -435.3035 | -0.8339 | -0.9571 |
+ | 0.0442 | 0.44 | 6800 | 0.0252 | -0.1567 | -0.2327 | 0.6715 | 0.0760 | -444.3407 | -388.7112 | -0.9082 | -1.0316 |
+ | 0.0454 | 0.45 | 6900 | 0.0244 | -0.1860 | -0.2627 | 0.6660 | 0.0767 | -474.3181 | -417.9738 | -0.8091 | -0.9271 |
+ | 0.0229 | 0.46 | 7000 | 0.0241 | -0.1897 | -0.2739 | 0.6705 | 0.0843 | -485.5567 | -421.6742 | -0.7967 | -0.9160 |
+ | 0.0213 | 0.46 | 7100 | 0.0239 | -0.2099 | -0.2963 | 0.6675 | 0.0864 | -507.9073 | -441.9356 | -0.6326 | -0.7425 |
+ | 0.0351 | 0.47 | 7200 | 0.0241 | -0.1826 | -0.2598 | 0.6685 | 0.0772 | -471.4492 | -414.6008 | -0.7077 | -0.8202 |
+ | 0.0198 | 0.48 | 7300 | 0.0237 | -0.2418 | -0.3216 | 0.6695 | 0.0799 | -533.2533 | -473.7774 | -0.6382 | -0.7481 |
+ | 0.0267 | 0.48 | 7400 | 0.0238 | -0.2263 | -0.3121 | 0.6635 | 0.0857 | -523.6796 | -458.3290 | -0.8072 | -0.9286 |
+ | 0.0183 | 0.49 | 7500 | 0.0240 | -0.2262 | -0.3151 | 0.6685 | 0.0889 | -526.6686 | -458.1802 | -0.7953 | -0.9168 |
+ | 0.0384 | 0.5 | 7600 | 0.0244 | -0.2211 | -0.3110 | 0.6620 | 0.0900 | -522.6359 | -453.0678 | -0.8678 | -0.9928 |
+ | 0.0107 | 0.5 | 7700 | 0.0243 | -0.1361 | -0.2179 | 0.6615 | 0.0818 | -429.5078 | -368.1310 | -1.1731 | -1.3135 |
+ | 0.026 | 0.51 | 7800 | 0.0248 | -0.2264 | -0.3139 | 0.6630 | 0.0875 | -525.5045 | -458.3771 | -0.8686 | -0.9939 |
+ | 0.0268 | 0.52 | 7900 | 0.0235 | -0.2119 | -0.3016 | 0.6720 | 0.0897 | -513.2527 | -443.9242 | -1.0222 | -1.1573 |
+ | 0.0368 | 0.52 | 8000 | 0.0234 | -0.1716 | -0.2553 | 0.6675 | 0.0837 | -466.9293 | -403.5861 | -1.0878 | -1.2254 |
+ | 0.0293 | 0.53 | 8100 | 0.0230 | -0.2229 | -0.3118 | 0.6695 | 0.0889 | -523.4254 | -454.8972 | -0.8559 | -0.9809 |
+ | 0.0127 | 0.54 | 8200 | 0.0234 | -0.1810 | -0.2616 | 0.6660 | 0.0807 | -473.2369 | -412.9599 | -1.0361 | -1.1700 |
+ | 0.0169 | 0.54 | 8300 | 0.0241 | -0.1442 | -0.2229 | 0.6690 | 0.0787 | -434.5301 | -376.2298 | -1.1765 | -1.3181 |
+ | 0.0177 | 0.55 | 8400 | 0.0249 | -0.1232 | -0.1920 | 0.6685 | 0.0687 | -403.5682 | -355.2328 | -1.1804 | -1.3186 |
+ | 0.0277 | 0.56 | 8500 | 0.0232 | -0.2036 | -0.2918 | 0.6715 | 0.0882 | -503.4426 | -435.6166 | -0.9559 | -1.0856 |
+ | 0.0187 | 0.56 | 8600 | 0.0230 | -0.1969 | -0.2868 | 0.6700 | 0.0898 | -498.3626 | -428.9141 | -0.9720 | -1.1033 |
+ | 0.0464 | 0.57 | 8700 | 0.0232 | -0.2151 | -0.2976 | 0.6720 | 0.0826 | -509.2527 | -447.0790 | -0.8658 | -0.9893 |
+ | 0.0296 | 0.58 | 8800 | 0.0231 | -0.1914 | -0.2749 | 0.6730 | 0.0835 | -486.5063 | -423.3791 | -0.9562 | -1.0852 |
+ | 0.0416 | 0.58 | 8900 | 0.0230 | -0.2546 | -0.3499 | 0.6720 | 0.0953 | -561.4706 | -486.5627 | -0.8593 | -0.9866 |
+ | 0.0374 | 0.59 | 9000 | 0.0229 | -0.1957 | -0.2784 | 0.6695 | 0.0827 | -490.0193 | -427.6933 | -0.9676 | -1.0981 |
+ | 0.026 | 0.6 | 9100 | 0.0231 | -0.1901 | -0.2688 | 0.6720 | 0.0787 | -480.4329 | -422.1302 | -1.0459 | -1.1806 |
+ | 0.0247 | 0.6 | 9200 | 0.0236 | -0.1171 | -0.1918 | 0.6705 | 0.0747 | -403.3864 | -349.0942 | -1.1933 | -1.3378 |
+ | 0.0193 | 0.61 | 9300 | 0.0231 | -0.2085 | -0.2946 | 0.6705 | 0.0862 | -506.2588 | -440.4871 | -0.9579 | -1.0926 |
+ | 0.028 | 0.62 | 9400 | 0.0232 | -0.1847 | -0.2630 | 0.6750 | 0.0783 | -474.6612 | -416.7447 | -0.9186 | -1.0483 |
+ | 0.0119 | 0.62 | 9500 | 0.0235 | -0.2603 | -0.3495 | 0.6660 | 0.0892 | -561.1232 | -492.2703 | -0.6150 | -0.7300 |
+ | 0.0178 | 0.63 | 9600 | 0.0232 | -0.2461 | -0.3329 | 0.6655 | 0.0868 | -544.4890 | -478.0711 | -0.6486 | -0.7644 |
+ | 0.0355 | 0.63 | 9700 | 0.0232 | -0.2619 | -0.3441 | 0.6650 | 0.0822 | -555.6818 | -493.8837 | -0.7045 | -0.8232 |
+ | 0.0238 | 0.64 | 9800 | 0.0234 | -0.2640 | -0.3436 | 0.6690 | 0.0797 | -555.2313 | -495.9717 | -0.7577 | -0.8786 |
+ | 0.0315 | 0.65 | 9900 | 0.0231 | -0.2402 | -0.3324 | 0.6670 | 0.0922 | -543.9803 | -472.1986 | -0.8464 | -0.9754 |
+ | 0.0267 | 0.65 | 10000 | 0.0233 | -0.2333 | -0.3282 | 0.6645 | 0.0949 | -539.8396 | -465.3473 | -0.8768 | -1.0084 |
+ | 0.018 | 0.66 | 10100 | 0.0235 | -0.1871 | -0.2697 | 0.6665 | 0.0826 | -481.2975 | -419.0774 | -0.9507 | -1.0827 |
+ | 0.0183 | 0.67 | 10200 | 0.0233 | -0.2143 | -0.3107 | 0.6660 | 0.0964 | -522.2762 | -446.3001 | -1.0028 | -1.1422 |
+ | 0.0162 | 0.67 | 10300 | 0.0229 | -0.1964 | -0.2831 | 0.6675 | 0.0867 | -494.7217 | -428.4237 | -0.9919 | -1.1283 |
+ | 0.0134 | 0.68 | 10400 | 0.0231 | -0.2075 | -0.2984 | 0.6660 | 0.0909 | -510.0122 | -439.4990 | -0.9949 | -1.1326 |
+ | 0.0195 | 0.69 | 10500 | 0.0230 | -0.2028 | -0.2909 | 0.6665 | 0.0881 | -502.5017 | -434.7631 | -0.9652 | -1.1005 |
+ | 0.0151 | 0.69 | 10600 | 0.0232 | -0.2275 | -0.3201 | 0.6685 | 0.0927 | -531.7596 | -459.4988 | -0.8827 | -1.0146 |
+ | 0.0207 | 0.7 | 10700 | 0.0229 | -0.2101 | -0.2965 | 0.6745 | 0.0863 | -508.0856 | -442.1295 | -0.8176 | -0.9439 |
+ | 0.0343 | 0.71 | 10800 | 0.0229 | -0.1772 | -0.2624 | 0.6725 | 0.0852 | -474.0302 | -409.1922 | -0.9335 | -1.0660 |
+ | 0.0277 | 0.71 | 10900 | 0.0232 | -0.1832 | -0.2641 | 0.6715 | 0.0809 | -475.7294 | -415.1988 | -0.8820 | -1.0102 |
+ | 0.0468 | 0.72 | 11000 | 0.0232 | -0.1684 | -0.2502 | 0.6710 | 0.0818 | -461.7660 | -400.4062 | -0.9471 | -1.0790 |
+ | 0.0205 | 0.73 | 11100 | 0.0231 | -0.1485 | -0.2324 | 0.6715 | 0.0838 | -443.9662 | -380.5112 | -1.0276 | -1.1645 |
+ | 0.0208 | 0.73 | 11200 | 0.0232 | -0.1421 | -0.2241 | 0.6665 | 0.0820 | -435.7383 | -374.1264 | -1.0866 | -1.2266 |
+ | 0.0203 | 0.74 | 11300 | 0.0228 | -0.1865 | -0.2734 | 0.6695 | 0.0869 | -485.0168 | -418.5069 | -0.9368 | -1.0693 |
+ | 0.0322 | 0.75 | 11400 | 0.0232 | -0.1914 | -0.2833 | 0.6705 | 0.0919 | -494.9005 | -423.4102 | -1.0057 | -1.1440 |
+ | 0.0208 | 0.75 | 11500 | 0.0230 | -0.1844 | -0.2674 | 0.6710 | 0.0830 | -479.0218 | -416.4353 | -0.8679 | -0.9952 |
+ | 0.0289 | 0.76 | 11600 | 0.0229 | -0.2138 | -0.3059 | 0.6670 | 0.0921 | -517.5433 | -445.8511 | -0.7842 | -0.9087 |
+ | 0.0196 | 0.77 | 11700 | 0.0229 | -0.2163 | -0.3027 | 0.6690 | 0.0864 | -514.3256 | -448.2766 | -0.6985 | -0.8165 |
+ | 0.0164 | 0.77 | 11800 | 0.0228 | -0.2281 | -0.3127 | 0.6700 | 0.0846 | -524.3269 | -460.1056 | -0.6315 | -0.7455 |
+ | 0.0204 | 0.78 | 11900 | 0.0228 | -0.2507 | -0.3406 | 0.6695 | 0.0899 | -552.1954 | -482.6768 | -0.6060 | -0.7203 |
+ | 0.0332 | 0.79 | 12000 | 0.0228 | -0.2229 | -0.3094 | 0.6685 | 0.0865 | -521.0510 | -454.8977 | -0.6991 | -0.8177 |
+ | 0.0127 | 0.79 | 12100 | 0.0227 | -0.2028 | -0.2871 | 0.6675 | 0.0843 | -498.7013 | -434.7568 | -0.7612 | -0.8830 |
+ | 0.0325 | 0.8 | 12200 | 0.0228 | -0.1688 | -0.2506 | 0.6710 | 0.0819 | -462.2358 | -400.7754 | -0.8779 | -1.0058 |
+ | 0.0312 | 0.8 | 12300 | 0.0226 | -0.1790 | -0.2638 | 0.6690 | 0.0849 | -475.4503 | -410.9585 | -0.8499 | -0.9771 |
+ | 0.0288 | 0.81 | 12400 | 0.0226 | -0.1852 | -0.2705 | 0.6705 | 0.0853 | -482.1120 | -417.2077 | -0.8575 | -0.9853 |
+ | 0.0124 | 0.82 | 12500 | 0.0227 | -0.1829 | -0.2670 | 0.6700 | 0.0841 | -478.6212 | -414.9066 | -0.8720 | -1.0003 |
+ | 0.0164 | 0.82 | 12600 | 0.0226 | -0.1860 | -0.2705 | 0.6740 | 0.0845 | -482.1584 | -418.0470 | -0.8740 | -1.0031 |
+ | 0.0123 | 0.83 | 12700 | 0.0226 | -0.1777 | -0.2626 | 0.6725 | 0.0850 | -474.2336 | -409.6584 | -0.8919 | -1.0220 |
+ | 0.0172 | 0.84 | 12800 | 0.0226 | -0.1748 | -0.2600 | 0.6720 | 0.0852 | -471.6224 | -406.8354 | -0.8885 | -1.0182 |
+ | 0.0077 | 0.84 | 12900 | 0.0225 | -0.1771 | -0.2640 | 0.6735 | 0.0869 | -475.6176 | -409.0995 | -0.9263 | -1.0589 |
+ | 0.0102 | 0.85 | 13000 | 0.0225 | -0.1702 | -0.2566 | 0.6725 | 0.0864 | -468.2498 | -402.1976 | -0.9231 | -1.0553 |
+ | 0.0352 | 0.86 | 13100 | 0.0226 | -0.1723 | -0.2576 | 0.6735 | 0.0853 | -469.2229 | -404.3332 | -0.9195 | -1.0515 |
+ | 0.017 | 0.86 | 13200 | 0.0225 | -0.1818 | -0.2697 | 0.6740 | 0.0879 | -481.2682 | -413.8024 | -0.8943 | -1.0253 |
+ | 0.0207 | 0.87 | 13300 | 0.0225 | -0.1720 | -0.2583 | 0.6725 | 0.0863 | -469.9547 | -404.0227 | -0.9057 | -1.0369 |
+ | 0.0315 | 0.88 | 13400 | 0.0225 | -0.1693 | -0.2546 | 0.6735 | 0.0853 | -466.2376 | -401.3037 | -0.9093 | -1.0403 |
+ | 0.0148 | 0.88 | 13500 | 0.0225 | -0.1702 | -0.2556 | 0.6715 | 0.0855 | -467.2293 | -402.1566 | -0.9070 | -1.0379 |
+ | 0.0191 | 0.89 | 13600 | 0.0225 | -0.1710 | -0.2578 | 0.6745 | 0.0868 | -469.4186 | -402.9745 | -0.9059 | -1.0370 |
+ | 0.0221 | 0.9 | 13700 | 0.0224 | -0.1684 | -0.2544 | 0.6745 | 0.0861 | -466.0537 | -400.3587 | -0.9192 | -1.0510 |
+ | 0.0299 | 0.9 | 13800 | 0.0224 | -0.1708 | -0.2578 | 0.6730 | 0.0871 | -469.4453 | -402.7551 | -0.9125 | -1.0439 |
+ | 0.0219 | 0.91 | 13900 | 0.0224 | -0.1743 | -0.2623 | 0.6730 | 0.0880 | -473.8788 | -406.2876 | -0.9065 | -1.0379 |
+ | 0.024 | 0.92 | 14000 | 0.0224 | -0.1787 | -0.2671 | 0.6755 | 0.0885 | -478.7598 | -410.6616 | -0.8850 | -1.0154 |
+ | 0.0228 | 0.92 | 14100 | 0.0225 | -0.1771 | -0.2650 | 0.6740 | 0.0879 | -476.6039 | -409.0930 | -0.8919 | -1.0223 |
+ | 0.0146 | 0.93 | 14200 | 0.0224 | -0.1803 | -0.2687 | 0.6770 | 0.0884 | -480.3093 | -412.2579 | -0.8844 | -1.0147 |
+ | 0.0164 | 0.94 | 14300 | 0.0225 | -0.1792 | -0.2672 | 0.6755 | 0.0880 | -478.8005 | -411.2285 | -0.8855 | -1.0157 |
+ | 0.0248 | 0.94 | 14400 | 0.0224 | -0.1808 | -0.2691 | 0.6745 | 0.0883 | -480.7047 | -412.7735 | -0.8846 | -1.0148 |
+ | 0.0118 | 0.95 | 14500 | 0.0224 | -0.1814 | -0.2697 | 0.6725 | 0.0884 | -481.3487 | -413.3884 | -0.8831 | -1.0131 |
+ | 0.0346 | 0.96 | 14600 | 0.0224 | -0.1805 | -0.2683 | 0.6750 | 0.0879 | -479.9362 | -412.4734 | -0.8849 | -1.0152 |
+ | 0.0182 | 0.96 | 14700 | 0.0224 | -0.1800 | -0.2678 | 0.6740 | 0.0877 | -479.3696 | -412.0334 | -0.8840 | -1.0140 |
+ | 0.0084 | 0.97 | 14800 | 0.0224 | -0.1805 | -0.2684 | 0.6745 | 0.0878 | -480.0011 | -412.5492 | -0.8846 | -1.0147 |
+ | 0.0249 | 0.97 | 14900 | 0.0224 | -0.1807 | -0.2685 | 0.6765 | 0.0879 | -480.1522 | -412.6696 | -0.8850 | -1.0151 |
+ | 0.0184 | 0.98 | 15000 | 0.0224 | -0.1804 | -0.2682 | 0.6755 | 0.0878 | -479.8432 | -412.4375 | -0.8854 | -1.0154 |
+ | 0.0345 | 0.99 | 15100 | 0.0224 | -0.1801 | -0.2679 | 0.6735 | 0.0877 | -479.4683 | -412.1548 | -0.8840 | -1.0139 |
+ | 0.0244 | 0.99 | 15200 | 0.0224 | -0.1803 | -0.2680 | 0.6750 | 0.0877 | -479.6048 | -412.2724 | -0.8862 | -1.0163 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
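The evaluation metrics at the top of the card are internally consistent: Rewards/margins equals Rewards/chosen minus Rewards/rejected, and the final table step follows from the dataset size and effective batch size. A quick sanity check using only values reported in this card:

```python
# Sanity-check the DPO evaluation metrics reported in this model card.

# Final evaluation values from eval_results.json.
rewards_chosen = -0.18013788759708405
rewards_rejected = -0.26785287261009216

# The reward margin is simply chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.0877, matching the reported Rewards/margins

# Total optimizer steps for one epoch: train_samples / total_train_batch_size
# (61135 samples, effective batch size 4).
steps = 61135 // 4
print(steps)  # 15283, consistent with the table ending near step 15200
```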
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7dd2e7fd3655f3d8002a6f57f83409f1be36ba5d39971e595e96d43dd7eeff73
+ oid sha256:f500cb5d542e5dc96f9db8a8cad9fef5f93a2dea90b823170619f1a455a1a94e
  size 671150064
all_results.json ADDED
@@ -0,0 +1,21 @@
+ {
+     "epoch": 1.0,
+     "eval_logits/chosen": -1.0169163942337036,
+     "eval_logits/rejected": -0.8867645263671875,
+     "eval_logps/chosen": -412.1428527832031,
+     "eval_logps/rejected": -479.4647521972656,
+     "eval_loss": 0.02242046222090721,
+     "eval_rewards/accuracies": 0.6754999756813049,
+     "eval_rewards/chosen": -0.18013788759708405,
+     "eval_rewards/margins": 0.08771497756242752,
+     "eval_rewards/rejected": -0.26785287261009216,
+     "eval_runtime": 713.9076,
+     "eval_samples": 2000,
+     "eval_samples_per_second": 2.801,
+     "eval_steps_per_second": 1.401,
+     "train_loss": 0.028396672180453324,
+     "train_runtime": 172620.3313,
+     "train_samples": 61135,
+     "train_samples_per_second": 0.354,
+     "train_steps_per_second": 0.089
+ }
eval_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+     "epoch": 1.0,
+     "eval_logits/chosen": -1.0169163942337036,
+     "eval_logits/rejected": -0.8867645263671875,
+     "eval_logps/chosen": -412.1428527832031,
+     "eval_logps/rejected": -479.4647521972656,
+     "eval_loss": 0.02242046222090721,
+     "eval_rewards/accuracies": 0.6754999756813049,
+     "eval_rewards/chosen": -0.18013788759708405,
+     "eval_rewards/margins": 0.08771497756242752,
+     "eval_rewards/rejected": -0.26785287261009216,
+     "eval_runtime": 713.9076,
+     "eval_samples": 2000,
+     "eval_samples_per_second": 2.801,
+     "eval_steps_per_second": 1.401
+ }
runs/Apr05_02-26-38_gpu4-119-5/events.out.tfevents.1712244458.gpu4-119-5.3718616.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cd7657ca8c9b5a1595383d30f872c7ce20a59b68d8fa4a543dac6ba7545470e7
- size 1081225
+ oid sha256:93d1ef6a95a037ff00ff6944d95b2242f4c07fe57aff8b04f035e64577044c4d
+ size 1086651
runs/Apr05_02-26-38_gpu4-119-5/events.out.tfevents.1712417793.gpu4-119-5.3718616.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b5759246b5561b9def8767164f3296fc902d31a2785cf05e4544c4acf60fdee
+ size 828
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 1.0,
+     "train_loss": 0.028396672180453324,
+     "train_runtime": 172620.3313,
+     "train_samples": 61135,
+     "train_samples_per_second": 0.354,
+     "train_steps_per_second": 0.089
+ }
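The throughput figures in train_results.json follow directly from the sample count and wall-clock runtime, so they can be cross-checked:

```python
# Verify the throughput numbers reported in train_results.json.
train_samples = 61135
train_runtime = 172620.3313  # seconds

samples_per_second = train_samples / train_runtime
print(round(samples_per_second, 3))  # 0.354, as reported

# Total runtime in hours, for reference.
print(round(train_runtime / 3600, 2))  # 47.95
```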
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff