lole25 committed on
Commit
c899517
1 Parent(s): c2127a1

Model save

README.md ADDED
@@ -0,0 +1,224 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: mistralai/Mistral-7B-v0.1
+ model-index:
+ - name: zephyr-7b-gpo-update4-i0
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-gpo-update4-i0
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0239
+ - Rewards/chosen: -0.0415
+ - Rewards/rejected: -0.1266
+ - Rewards/accuracies: 0.6600
+ - Rewards/margins: 0.0851
+ - Logps/rejected: -236.9313
+ - Logps/chosen: -240.2975
+ - Logits/rejected: -2.0900
+ - Logits/chosen: -2.2780
+
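The reward metrics above are DPO's implicit rewards: β times the gap between the policy and reference log-probabilities of each response. A minimal sketch of how the columns relate (pure Python; β=0.1 and the reference log-probs are illustrative assumptions — the card does not record either):

```python
import math

def dpo_metrics(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Implicit DPO rewards, margin, and the standard DPO loss for one pair.

    beta=0.1 and the reference log-probs passed below are assumptions for
    illustration; the card does not state the values actually used.
    """
    reward_chosen = beta * (logp_chosen - ref_chosen)        # "Rewards/chosen"
    reward_rejected = beta * (logp_rejected - ref_rejected)  # "Rewards/rejected"
    margin = reward_chosen - reward_rejected                 # "Rewards/margins"
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))        # -log sigmoid(margin)
    return reward_chosen, reward_rejected, margin, loss

# Hypothetical reference log-probs chosen so the rewards land near the
# reported values (-0.04 and -0.13):
rc, rr, margin, loss = dpo_metrics(-240.30, -236.93, -239.90, -235.63)
```

Note that -log σ(0.0851) ≈ 0.65, well above the reported validation loss of 0.0239, which suggests the trainer's objective here is not the vanilla DPO loss; the sketch only shows how the reward columns are derived.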
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
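The listed total_train_batch_size follows from the other values, and the warmup ratio becomes a step count once the dataset size is fixed. A quick sketch of both (the ~15,284-step total is an assumption derived from the 61,135 training samples reported elsewhere in this commit):

```python
import math

# Effective batch size per optimizer step, as listed above:
train_batch_size = 2
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 4

def lr_at(step, total_steps, peak_lr=5e-06, warmup_ratio=0.1):
    """Cosine schedule with linear warmup over the first warmup_ratio of steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)  # linear warmup to peak
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay
```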
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.0706 | 0.01 | 100 | 0.0536 | 0.0012 | 0.0008 | 0.4925 | 0.0004 | -211.4570 | -231.7682 | -2.1603 | -2.3487 |
+ | 0.0614 | 0.01 | 200 | 0.0524 | 0.0029 | -0.0002 | 0.5890 | 0.0031 | -211.6427 | -231.4156 | -2.1610 | -2.3494 |
+ | 0.0495 | 0.02 | 300 | 0.0499 | 0.0190 | 0.0096 | 0.5805 | 0.0093 | -209.6874 | -228.2110 | -2.1645 | -2.3531 |
+ | 0.065 | 0.03 | 400 | 0.0470 | 0.0242 | 0.0074 | 0.5980 | 0.0167 | -210.1239 | -227.1680 | -2.1655 | -2.3542 |
+ | 0.04 | 0.03 | 500 | 0.0416 | -0.0511 | -0.0901 | 0.6125 | 0.0390 | -229.6261 | -242.2272 | -2.1418 | -2.3291 |
+ | 0.0313 | 0.04 | 600 | 0.0413 | -0.0451 | -0.0781 | 0.6160 | 0.0330 | -227.2309 | -241.0243 | -2.1400 | -2.3279 |
+ | 0.0519 | 0.05 | 700 | 0.0408 | 0.0360 | 0.0023 | 0.6155 | 0.0336 | -211.1453 | -224.8123 | -2.1490 | -2.3374 |
+ | 0.034 | 0.05 | 800 | 0.0369 | -0.0217 | -0.0726 | 0.6125 | 0.0510 | -226.1373 | -236.3390 | -2.1339 | -2.3227 |
+ | 0.0343 | 0.06 | 900 | 0.0361 | 0.0040 | -0.0439 | 0.6005 | 0.0478 | -220.3831 | -231.2078 | -2.1160 | -2.3030 |
+ | 0.0482 | 0.07 | 1000 | 0.0360 | -0.0952 | -0.1491 | 0.6100 | 0.0539 | -241.4311 | -251.0361 | -2.1121 | -2.2986 |
+ | 0.0316 | 0.07 | 1100 | 0.0415 | -0.0564 | -0.1248 | 0.6045 | 0.0684 | -236.5681 | -243.2760 | -2.1106 | -2.2966 |
+ | 0.0326 | 0.08 | 1200 | 0.0331 | -0.0594 | -0.1297 | 0.6405 | 0.0702 | -237.5470 | -243.8901 | -2.1027 | -2.2877 |
+ | 0.0313 | 0.09 | 1300 | 0.0313 | -0.0027 | -0.0614 | 0.6320 | 0.0587 | -223.8952 | -232.5400 | -2.1139 | -2.2994 |
+ | 0.0345 | 0.09 | 1400 | 0.0331 | -0.0106 | -0.0663 | 0.6275 | 0.0557 | -224.8707 | -234.1205 | -2.0508 | -2.2308 |
+ | 0.0629 | 0.1 | 1500 | 0.0340 | -0.0537 | -0.1284 | 0.6440 | 0.0747 | -237.2957 | -242.7522 | -2.0009 | -2.1776 |
+ | 0.0313 | 0.1 | 1600 | 0.0311 | -0.0639 | -0.1417 | 0.6310 | 0.0777 | -239.9465 | -244.7944 | -2.0578 | -2.2404 |
+ | 0.0287 | 0.11 | 1700 | 0.0303 | -0.0281 | -0.0938 | 0.6360 | 0.0657 | -230.3665 | -237.6215 | -2.1022 | -2.2867 |
+ | 0.0335 | 0.12 | 1800 | 0.0316 | 0.0007 | -0.0533 | 0.6260 | 0.0540 | -222.2785 | -231.8674 | -2.1158 | -2.3011 |
+ | 0.0209 | 0.12 | 1900 | 0.0333 | -0.0611 | -0.1124 | 0.6400 | 0.0513 | -234.0950 | -244.2219 | -2.1046 | -2.2893 |
+ | 0.0183 | 0.13 | 2000 | 0.0302 | -0.0622 | -0.1366 | 0.6500 | 0.0744 | -238.9349 | -244.4466 | -2.1213 | -2.3085 |
+ | 0.0235 | 0.14 | 2100 | 0.0289 | -0.0383 | -0.1150 | 0.6475 | 0.0767 | -234.6193 | -239.6733 | -2.0933 | -2.2787 |
+ | 0.0401 | 0.14 | 2200 | 0.0284 | -0.0577 | -0.1367 | 0.6370 | 0.0790 | -238.9556 | -243.5481 | -2.0898 | -2.2757 |
+ | 0.0257 | 0.15 | 2300 | 0.0304 | -0.0226 | -0.1119 | 0.6300 | 0.0893 | -233.9949 | -236.5215 | -2.0975 | -2.2834 |
+ | 0.0339 | 0.16 | 2400 | 0.0306 | 0.0075 | -0.0542 | 0.6350 | 0.0617 | -222.4461 | -230.5073 | -2.1318 | -2.3176 |
+ | 0.0132 | 0.16 | 2500 | 0.0312 | -0.0022 | -0.0606 | 0.6350 | 0.0584 | -223.7263 | -232.4370 | -2.1390 | -2.3256 |
+ | 0.0196 | 0.17 | 2600 | 0.0281 | -0.0069 | -0.0808 | 0.6500 | 0.0739 | -227.7710 | -233.3759 | -2.1025 | -2.2871 |
+ | 0.0317 | 0.18 | 2700 | 0.0280 | -0.0329 | -0.1059 | 0.6545 | 0.0730 | -232.7858 | -238.5806 | -2.1090 | -2.2942 |
+ | 0.036 | 0.18 | 2800 | 0.0279 | -0.0178 | -0.0869 | 0.6365 | 0.0691 | -228.9888 | -235.5567 | -2.1050 | -2.2897 |
+ | 0.0353 | 0.19 | 2900 | 0.0279 | -0.0415 | -0.1092 | 0.6445 | 0.0677 | -233.4535 | -240.3115 | -2.1000 | -2.2848 |
+ | 0.0259 | 0.2 | 3000 | 0.0289 | -0.0379 | -0.1213 | 0.6505 | 0.0834 | -235.8732 | -239.5773 | -2.0886 | -2.2741 |
+ | 0.0362 | 0.2 | 3100 | 0.0289 | -0.1100 | -0.1916 | 0.6485 | 0.0816 | -249.9393 | -254.0055 | -2.0925 | -2.2772 |
+ | 0.0319 | 0.21 | 3200 | 0.0283 | -0.0527 | -0.1321 | 0.6385 | 0.0794 | -238.0300 | -242.5391 | -2.0741 | -2.2569 |
+ | 0.0333 | 0.22 | 3300 | 0.0280 | -0.0509 | -0.1397 | 0.6535 | 0.0887 | -239.5463 | -242.1913 | -2.0690 | -2.2521 |
+ | 0.0347 | 0.22 | 3400 | 0.0285 | -0.0420 | -0.1146 | 0.6420 | 0.0726 | -234.5293 | -240.4102 | -2.0931 | -2.2767 |
+ | 0.025 | 0.23 | 3500 | 0.0277 | -0.0120 | -0.0953 | 0.6560 | 0.0833 | -230.6713 | -234.4121 | -2.0685 | -2.2513 |
+ | 0.0305 | 0.24 | 3600 | 0.0276 | -0.0113 | -0.0943 | 0.6520 | 0.0830 | -230.4667 | -234.2561 | -2.0925 | -2.2770 |
+ | 0.0331 | 0.24 | 3700 | 0.0283 | -0.0661 | -0.1472 | 0.6435 | 0.0812 | -241.0565 | -245.2157 | -2.0870 | -2.2712 |
+ | 0.0351 | 0.25 | 3800 | 0.0291 | -0.0335 | -0.1002 | 0.6410 | 0.0667 | -231.6431 | -238.6974 | -2.1198 | -2.3060 |
+ | 0.0164 | 0.26 | 3900 | 0.0280 | -0.0089 | -0.0886 | 0.6340 | 0.0797 | -229.3295 | -233.7921 | -2.0933 | -2.2780 |
+ | 0.0445 | 0.26 | 4000 | 0.0271 | -0.0361 | -0.1230 | 0.6390 | 0.0869 | -236.2058 | -239.2251 | -2.0860 | -2.2705 |
+ | 0.0176 | 0.27 | 4100 | 0.0289 | -0.0496 | -0.1127 | 0.6470 | 0.0630 | -234.1423 | -241.9269 | -2.1377 | -2.3253 |
+ | 0.0244 | 0.27 | 4200 | 0.0293 | -0.0263 | -0.0989 | 0.6425 | 0.0726 | -231.3835 | -237.2554 | -2.1260 | -2.3122 |
+ | 0.0378 | 0.28 | 4300 | 0.0267 | 0.0038 | -0.0727 | 0.6440 | 0.0766 | -226.1550 | -231.2366 | -2.0843 | -2.2686 |
+ | 0.0135 | 0.29 | 4400 | 0.0273 | -0.0216 | -0.1048 | 0.6435 | 0.0832 | -232.5620 | -236.3245 | -2.0998 | -2.2857 |
+ | 0.0143 | 0.29 | 4500 | 0.0268 | -0.0302 | -0.1056 | 0.6375 | 0.0755 | -232.7406 | -238.0378 | -2.0988 | -2.2843 |
+ | 0.0268 | 0.3 | 4600 | 0.0270 | -0.0210 | -0.0904 | 0.6400 | 0.0694 | -229.6838 | -236.1970 | -2.1062 | -2.2917 |
+ | 0.026 | 0.31 | 4700 | 0.0272 | -0.0808 | -0.1659 | 0.6495 | 0.0851 | -244.7986 | -248.1638 | -2.1119 | -2.2987 |
+ | 0.0447 | 0.31 | 4800 | 0.0265 | -0.0634 | -0.1429 | 0.6465 | 0.0795 | -240.1879 | -244.6757 | -2.1131 | -2.2996 |
+ | 0.0311 | 0.32 | 4900 | 0.0269 | -0.0418 | -0.1304 | 0.6470 | 0.0886 | -237.6830 | -240.3570 | -2.0728 | -2.2562 |
+ | 0.0241 | 0.33 | 5000 | 0.0267 | -0.0626 | -0.1478 | 0.6425 | 0.0853 | -241.1806 | -244.5231 | -2.0852 | -2.2706 |
+ | 0.0183 | 0.33 | 5100 | 0.0266 | -0.0589 | -0.1374 | 0.6415 | 0.0785 | -239.0941 | -243.7824 | -2.0980 | -2.2840 |
+ | 0.0196 | 0.34 | 5200 | 0.0281 | -0.1203 | -0.1997 | 0.6440 | 0.0794 | -251.5512 | -256.0692 | -2.1151 | -2.3031 |
+ | 0.0218 | 0.35 | 5300 | 0.0284 | -0.0855 | -0.1675 | 0.6445 | 0.0821 | -245.1141 | -249.0969 | -2.1199 | -2.3078 |
+ | 0.0392 | 0.35 | 5400 | 0.0276 | -0.0211 | -0.0901 | 0.6320 | 0.0690 | -229.6360 | -236.2313 | -2.1202 | -2.3068 |
+ | 0.0095 | 0.36 | 5500 | 0.0278 | -0.0108 | -0.0838 | 0.6365 | 0.0729 | -228.3683 | -234.1733 | -2.1144 | -2.3003 |
+ | 0.0199 | 0.37 | 5600 | 0.0279 | -0.0468 | -0.1295 | 0.6430 | 0.0827 | -237.5136 | -241.3706 | -2.0764 | -2.2607 |
+ | 0.0237 | 0.37 | 5700 | 0.0267 | -0.0323 | -0.1215 | 0.6445 | 0.0891 | -235.9061 | -238.4741 | -2.0452 | -2.2280 |
+ | 0.0323 | 0.38 | 5800 | 0.0269 | -0.0412 | -0.1289 | 0.6460 | 0.0876 | -237.3893 | -240.2530 | -2.0370 | -2.2195 |
+ | 0.0242 | 0.39 | 5900 | 0.0260 | -0.0303 | -0.1115 | 0.6455 | 0.0812 | -233.9047 | -238.0558 | -2.0427 | -2.2254 |
+ | 0.0239 | 0.39 | 6000 | 0.0265 | -0.0064 | -0.0780 | 0.6395 | 0.0716 | -227.2050 | -233.2807 | -2.0840 | -2.2698 |
+ | 0.0246 | 0.4 | 6100 | 0.0266 | -0.0466 | -0.1195 | 0.6475 | 0.0728 | -235.5066 | -241.3312 | -2.0964 | -2.2834 |
+ | 0.0109 | 0.41 | 6200 | 0.0259 | -0.0380 | -0.1166 | 0.6420 | 0.0786 | -234.9308 | -239.6033 | -2.0589 | -2.2443 |
+ | 0.0289 | 0.41 | 6300 | 0.0258 | -0.0286 | -0.1078 | 0.6525 | 0.0791 | -233.1673 | -237.7339 | -2.0557 | -2.2405 |
+ | 0.0287 | 0.42 | 6400 | 0.0267 | -0.0259 | -0.1155 | 0.6430 | 0.0896 | -234.7208 | -237.1919 | -2.0664 | -2.2525 |
+ | 0.0631 | 0.43 | 6500 | 0.0259 | -0.0313 | -0.1091 | 0.6460 | 0.0778 | -233.4391 | -238.2719 | -2.0895 | -2.2759 |
+ | 0.037 | 0.43 | 6600 | 0.0260 | -0.0094 | -0.0871 | 0.6490 | 0.0777 | -229.0337 | -233.8820 | -2.0997 | -2.2868 |
+ | 0.0296 | 0.44 | 6700 | 0.0264 | -0.0446 | -0.1288 | 0.6565 | 0.0842 | -237.3631 | -240.9244 | -2.1026 | -2.2903 |
+ | 0.038 | 0.44 | 6800 | 0.0262 | -0.0694 | -0.1493 | 0.6565 | 0.0799 | -241.4658 | -245.8865 | -2.0871 | -2.2739 |
+ | 0.0458 | 0.45 | 6900 | 0.0261 | -0.0352 | -0.1124 | 0.6525 | 0.0772 | -234.0974 | -239.0529 | -2.0925 | -2.2798 |
+ | 0.0275 | 0.46 | 7000 | 0.0257 | -0.0520 | -0.1401 | 0.6535 | 0.0881 | -239.6416 | -242.4081 | -2.0897 | -2.2774 |
+ | 0.0175 | 0.46 | 7100 | 0.0255 | -0.0397 | -0.1193 | 0.6530 | 0.0795 | -235.4656 | -239.9513 | -2.1058 | -2.2933 |
+ | 0.035 | 0.47 | 7200 | 0.0260 | -0.0543 | -0.1267 | 0.6485 | 0.0724 | -236.9568 | -242.8715 | -2.1193 | -2.3083 |
+ | 0.015 | 0.48 | 7300 | 0.0257 | -0.0871 | -0.1622 | 0.6390 | 0.0751 | -244.0609 | -249.4324 | -2.1123 | -2.3009 |
+ | 0.0231 | 0.48 | 7400 | 0.0255 | -0.0659 | -0.1463 | 0.6490 | 0.0804 | -240.8683 | -245.1848 | -2.1035 | -2.2913 |
+ | 0.0211 | 0.49 | 7500 | 0.0258 | -0.0631 | -0.1462 | 0.6520 | 0.0831 | -240.8420 | -244.6235 | -2.0635 | -2.2485 |
+ | 0.0379 | 0.5 | 7600 | 0.0259 | -0.0748 | -0.1597 | 0.6475 | 0.0849 | -243.5423 | -246.9550 | -2.0566 | -2.2404 |
+ | 0.0117 | 0.5 | 7700 | 0.0257 | -0.0554 | -0.1408 | 0.6620 | 0.0854 | -239.7720 | -243.0760 | -2.0661 | -2.2502 |
+ | 0.0197 | 0.51 | 7800 | 0.0261 | -0.0680 | -0.1537 | 0.6590 | 0.0857 | -242.3484 | -245.6013 | -2.0867 | -2.2723 |
+ | 0.0296 | 0.52 | 7900 | 0.0253 | -0.0680 | -0.1488 | 0.6555 | 0.0808 | -241.3649 | -245.6047 | -2.0900 | -2.2762 |
+ | 0.0385 | 0.52 | 8000 | 0.0251 | -0.0474 | -0.1297 | 0.6500 | 0.0823 | -237.5529 | -241.4889 | -2.0737 | -2.2589 |
+ | 0.0295 | 0.53 | 8100 | 0.0249 | -0.0725 | -0.1568 | 0.6590 | 0.0842 | -242.9643 | -246.5116 | -2.0447 | -2.2293 |
+ | 0.0147 | 0.54 | 8200 | 0.0250 | -0.0814 | -0.1636 | 0.6455 | 0.0822 | -244.3407 | -248.2939 | -2.0459 | -2.2301 |
+ | 0.0166 | 0.54 | 8300 | 0.0254 | -0.0635 | -0.1415 | 0.6535 | 0.0780 | -239.9138 | -244.7009 | -2.0618 | -2.2466 |
+ | 0.0177 | 0.55 | 8400 | 0.0260 | -0.0569 | -0.1258 | 0.6505 | 0.0689 | -236.7758 | -243.3866 | -2.0623 | -2.2464 |
+ | 0.0323 | 0.56 | 8500 | 0.0247 | -0.0606 | -0.1478 | 0.6590 | 0.0872 | -241.1788 | -244.1342 | -2.0510 | -2.2352 |
+ | 0.0178 | 0.56 | 8600 | 0.0245 | -0.0697 | -0.1572 | 0.6610 | 0.0875 | -243.0600 | -245.9448 | -2.0607 | -2.2454 |
+ | 0.0473 | 0.57 | 8700 | 0.0247 | -0.0695 | -0.1535 | 0.6565 | 0.0840 | -242.3023 | -245.9043 | -2.0663 | -2.2518 |
+ | 0.0302 | 0.58 | 8800 | 0.0249 | -0.0482 | -0.1318 | 0.6610 | 0.0837 | -237.9781 | -241.6350 | -2.0593 | -2.2448 |
+ | 0.0391 | 0.58 | 8900 | 0.0248 | -0.0637 | -0.1548 | 0.6620 | 0.0911 | -242.5767 | -244.7529 | -2.0658 | -2.2522 |
+ | 0.0377 | 0.59 | 9000 | 0.0246 | -0.0355 | -0.1189 | 0.6575 | 0.0834 | -235.3853 | -239.0974 | -2.0745 | -2.2613 |
+ | 0.0296 | 0.6 | 9100 | 0.0249 | -0.0387 | -0.1166 | 0.6550 | 0.0779 | -234.9412 | -239.7537 | -2.0871 | -2.2747 |
+ | 0.0241 | 0.6 | 9200 | 0.0252 | -0.0358 | -0.1111 | 0.6575 | 0.0753 | -233.8348 | -239.1661 | -2.1060 | -2.2943 |
+ | 0.019 | 0.61 | 9300 | 0.0250 | -0.0516 | -0.1373 | 0.6580 | 0.0858 | -239.0793 | -242.3174 | -2.1004 | -2.2889 |
+ | 0.0247 | 0.62 | 9400 | 0.0251 | -0.0712 | -0.1504 | 0.6545 | 0.0792 | -241.6835 | -246.2362 | -2.1041 | -2.2926 |
+ | 0.0161 | 0.62 | 9500 | 0.0249 | -0.0518 | -0.1338 | 0.6485 | 0.0820 | -238.3770 | -242.3746 | -2.0949 | -2.2827 |
+ | 0.0198 | 0.63 | 9600 | 0.0250 | -0.0282 | -0.1124 | 0.6500 | 0.0842 | -234.0898 | -237.6352 | -2.0913 | -2.2787 |
+ | 0.0368 | 0.63 | 9700 | 0.0248 | -0.0568 | -0.1405 | 0.6585 | 0.0836 | -239.7049 | -243.3711 | -2.0914 | -2.2787 |
+ | 0.0214 | 0.64 | 9800 | 0.0248 | -0.0559 | -0.1371 | 0.6570 | 0.0811 | -239.0298 | -243.1945 | -2.0971 | -2.2844 |
+ | 0.0331 | 0.65 | 9900 | 0.0246 | -0.0441 | -0.1329 | 0.6600 | 0.0888 | -238.1875 | -240.8263 | -2.0867 | -2.2732 |
+ | 0.0316 | 0.65 | 10000 | 0.0246 | -0.0573 | -0.1474 | 0.6580 | 0.0901 | -241.0922 | -243.4642 | -2.0770 | -2.2634 |
+ | 0.0181 | 0.66 | 10100 | 0.0248 | -0.0757 | -0.1612 | 0.6670 | 0.0855 | -243.8461 | -247.1387 | -2.0801 | -2.2661 |
+ | 0.0159 | 0.67 | 10200 | 0.0245 | -0.0638 | -0.1550 | 0.6610 | 0.0912 | -242.6056 | -244.7626 | -2.0611 | -2.2463 |
+ | 0.018 | 0.67 | 10300 | 0.0244 | -0.0590 | -0.1447 | 0.6615 | 0.0857 | -240.5506 | -243.8084 | -2.0698 | -2.2554 |
+ | 0.0144 | 0.68 | 10400 | 0.0245 | -0.0385 | -0.1258 | 0.6605 | 0.0873 | -236.7707 | -239.7064 | -2.0630 | -2.2489 |
+ | 0.0273 | 0.69 | 10500 | 0.0244 | -0.0431 | -0.1273 | 0.6565 | 0.0842 | -237.0745 | -240.6274 | -2.0678 | -2.2537 |
+ | 0.0194 | 0.69 | 10600 | 0.0243 | -0.0430 | -0.1273 | 0.6635 | 0.0843 | -237.0673 | -240.6028 | -2.0684 | -2.2543 |
+ | 0.0199 | 0.7 | 10700 | 0.0244 | -0.0439 | -0.1259 | 0.6595 | 0.0820 | -236.7907 | -240.7807 | -2.0696 | -2.2556 |
+ | 0.0349 | 0.71 | 10800 | 0.0245 | -0.0394 | -0.1225 | 0.6585 | 0.0832 | -236.1209 | -239.8839 | -2.0673 | -2.2533 |
+ | 0.0294 | 0.71 | 10900 | 0.0246 | -0.0459 | -0.1264 | 0.6615 | 0.0805 | -236.8899 | -241.1904 | -2.0696 | -2.2554 |
+ | 0.0493 | 0.72 | 11000 | 0.0247 | -0.0406 | -0.1196 | 0.6555 | 0.0789 | -235.5289 | -240.1349 | -2.0714 | -2.2571 |
+ | 0.0186 | 0.73 | 11100 | 0.0246 | -0.0362 | -0.1179 | 0.6605 | 0.0817 | -235.1986 | -239.2465 | -2.0741 | -2.2600 |
+ | 0.0233 | 0.73 | 11200 | 0.0247 | -0.0275 | -0.1085 | 0.6585 | 0.0810 | -233.3055 | -237.5009 | -2.0750 | -2.2610 |
+ | 0.0218 | 0.74 | 11300 | 0.0244 | -0.0370 | -0.1200 | 0.6575 | 0.0831 | -235.6197 | -239.4001 | -2.0764 | -2.2629 |
+ | 0.0365 | 0.75 | 11400 | 0.0245 | -0.0355 | -0.1223 | 0.6580 | 0.0868 | -236.0721 | -239.1116 | -2.0719 | -2.2584 |
+ | 0.0199 | 0.75 | 11500 | 0.0246 | -0.0318 | -0.1118 | 0.6590 | 0.0800 | -233.9702 | -238.3574 | -2.0827 | -2.2695 |
+ | 0.0296 | 0.76 | 11600 | 0.0244 | -0.0421 | -0.1299 | 0.6665 | 0.0878 | -237.5938 | -240.4171 | -2.0765 | -2.2633 |
+ | 0.015 | 0.77 | 11700 | 0.0244 | -0.0487 | -0.1316 | 0.6600 | 0.0829 | -237.9386 | -241.7478 | -2.0779 | -2.2644 |
+ | 0.0127 | 0.77 | 11800 | 0.0244 | -0.0598 | -0.1424 | 0.6580 | 0.0826 | -240.0971 | -243.9701 | -2.0787 | -2.2653 |
+ | 0.0199 | 0.78 | 11900 | 0.0243 | -0.0591 | -0.1450 | 0.6605 | 0.0859 | -240.6168 | -243.8326 | -2.0758 | -2.2626 |
+ | 0.0313 | 0.79 | 12000 | 0.0244 | -0.0559 | -0.1424 | 0.6605 | 0.0865 | -240.0914 | -243.1773 | -2.0797 | -2.2669 |
+ | 0.0102 | 0.79 | 12100 | 0.0244 | -0.0513 | -0.1355 | 0.6560 | 0.0842 | -238.7046 | -242.2641 | -2.0830 | -2.2705 |
+ | 0.0325 | 0.8 | 12200 | 0.0243 | -0.0456 | -0.1291 | 0.6600 | 0.0835 | -237.4338 | -241.1291 | -2.0835 | -2.2709 |
+ | 0.028 | 0.8 | 12300 | 0.0243 | -0.0493 | -0.1364 | 0.6585 | 0.0872 | -238.8947 | -241.8556 | -2.0821 | -2.2695 |
+ | 0.0278 | 0.81 | 12400 | 0.0241 | -0.0510 | -0.1343 | 0.6600 | 0.0833 | -238.4753 | -242.2064 | -2.0913 | -2.2793 |
+ | 0.0142 | 0.82 | 12500 | 0.0241 | -0.0540 | -0.1371 | 0.6570 | 0.0831 | -239.0412 | -242.8141 | -2.0913 | -2.2793 |
+ | 0.0177 | 0.82 | 12600 | 0.0242 | -0.0556 | -0.1379 | 0.6580 | 0.0823 | -239.1902 | -243.1229 | -2.0917 | -2.2797 |
+ | 0.0133 | 0.83 | 12700 | 0.0242 | -0.0496 | -0.1314 | 0.6575 | 0.0819 | -237.8956 | -241.9153 | -2.0933 | -2.2814 |
+ | 0.0186 | 0.84 | 12800 | 0.0242 | -0.0451 | -0.1272 | 0.6565 | 0.0822 | -237.0618 | -241.0176 | -2.0936 | -2.2818 |
+ | 0.0117 | 0.84 | 12900 | 0.0241 | -0.0397 | -0.1232 | 0.6580 | 0.0835 | -236.2435 | -239.9395 | -2.0908 | -2.2790 |
+ | 0.0116 | 0.85 | 13000 | 0.0241 | -0.0419 | -0.1272 | 0.6580 | 0.0853 | -237.0613 | -240.3864 | -2.0899 | -2.2781 |
+ | 0.0338 | 0.86 | 13100 | 0.0241 | -0.0404 | -0.1232 | 0.6565 | 0.0828 | -236.2545 | -240.0884 | -2.0941 | -2.2824 |
+ | 0.0206 | 0.86 | 13200 | 0.0240 | -0.0429 | -0.1280 | 0.6590 | 0.0851 | -237.2177 | -240.5875 | -2.0892 | -2.2772 |
+ | 0.018 | 0.87 | 13300 | 0.0240 | -0.0407 | -0.1257 | 0.6600 | 0.0851 | -236.7596 | -240.1422 | -2.0891 | -2.2772 |
+ | 0.0275 | 0.88 | 13400 | 0.0240 | -0.0392 | -0.1234 | 0.6585 | 0.0842 | -236.2926 | -239.8449 | -2.0904 | -2.2786 |
+ | 0.0177 | 0.88 | 13500 | 0.0240 | -0.0369 | -0.1201 | 0.6580 | 0.0832 | -235.6223 | -239.3825 | -2.0911 | -2.2792 |
+ | 0.0225 | 0.89 | 13600 | 0.0240 | -0.0405 | -0.1255 | 0.6580 | 0.0850 | -236.7200 | -240.1148 | -2.0913 | -2.2794 |
+ | 0.0223 | 0.9 | 13700 | 0.0240 | -0.0422 | -0.1268 | 0.6595 | 0.0846 | -236.9746 | -240.4513 | -2.0923 | -2.2803 |
+ | 0.0302 | 0.9 | 13800 | 0.0240 | -0.0416 | -0.1272 | 0.6575 | 0.0857 | -237.0577 | -240.3201 | -2.0900 | -2.2780 |
+ | 0.0213 | 0.91 | 13900 | 0.0239 | -0.0407 | -0.1267 | 0.6605 | 0.0859 | -236.9426 | -240.1542 | -2.0888 | -2.2767 |
+ | 0.0221 | 0.92 | 14000 | 0.0239 | -0.0425 | -0.1287 | 0.6595 | 0.0862 | -237.3506 | -240.4969 | -2.0892 | -2.2772 |
+ | 0.0259 | 0.92 | 14100 | 0.0239 | -0.0411 | -0.1266 | 0.6585 | 0.0855 | -236.9374 | -240.2254 | -2.0892 | -2.2772 |
+ | 0.0156 | 0.93 | 14200 | 0.0239 | -0.0419 | -0.1278 | 0.6615 | 0.0859 | -237.1707 | -240.3793 | -2.0891 | -2.2771 |
+ | 0.0158 | 0.94 | 14300 | 0.0239 | -0.0414 | -0.1269 | 0.6600 | 0.0855 | -237.0012 | -240.2890 | -2.0887 | -2.2765 |
+ | 0.0216 | 0.94 | 14400 | 0.0239 | -0.0413 | -0.1268 | 0.6620 | 0.0856 | -236.9817 | -240.2556 | -2.0895 | -2.2774 |
+ | 0.0126 | 0.95 | 14500 | 0.0239 | -0.0413 | -0.1269 | 0.6605 | 0.0856 | -237.0005 | -240.2699 | -2.0895 | -2.2774 |
+ | 0.0346 | 0.96 | 14600 | 0.0239 | -0.0416 | -0.1269 | 0.6590 | 0.0853 | -236.9897 | -240.3241 | -2.0901 | -2.2781 |
+ | 0.0225 | 0.96 | 14700 | 0.0239 | -0.0415 | -0.1267 | 0.6605 | 0.0852 | -236.9473 | -240.3016 | -2.0895 | -2.2774 |
+ | 0.0099 | 0.97 | 14800 | 0.0239 | -0.0415 | -0.1268 | 0.6595 | 0.0853 | -236.9750 | -240.3092 | -2.0891 | -2.2771 |
+ | 0.0235 | 0.97 | 14900 | 0.0239 | -0.0415 | -0.1268 | 0.6585 | 0.0853 | -236.9760 | -240.2991 | -2.0898 | -2.2777 |
+ | 0.019 | 0.98 | 15000 | 0.0239 | -0.0415 | -0.1267 | 0.6610 | 0.0852 | -236.9527 | -240.3060 | -2.0899 | -2.2778 |
+ | 0.0368 | 0.99 | 15100 | 0.0239 | -0.0415 | -0.1267 | 0.6605 | 0.0852 | -236.9458 | -240.2961 | -2.0904 | -2.2784 |
+ | 0.0267 | 0.99 | 15200 | 0.0239 | -0.0414 | -0.1265 | 0.6580 | 0.0851 | -236.9213 | -240.2912 | -2.0899 | -2.2778 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:864bd21bb3fd8fcc933a9254ecbbca2b3884bb6c1974afa48389226473ba3d8f
+ oid sha256:803436858fdd215bfc5ffeda4bad0800b6ba7dc4a46d48fc883d0244ac87c32d
  size 671150064
all_results.json ADDED
@@ -0,0 +1,21 @@
+ {
+ "epoch": 1.0,
+ "eval_logits/chosen": -2.2779674530029297,
+ "eval_logits/rejected": -2.0900180339813232,
+ "eval_logps/chosen": -240.2975311279297,
+ "eval_logps/rejected": -236.9312744140625,
+ "eval_loss": 0.023918792605400085,
+ "eval_rewards/accuracies": 0.6600000262260437,
+ "eval_rewards/chosen": -0.04146282374858856,
+ "eval_rewards/margins": 0.0851340964436531,
+ "eval_rewards/rejected": -0.12659691274166107,
+ "eval_runtime": 711.555,
+ "eval_samples": 2000,
+ "eval_samples_per_second": 2.811,
+ "eval_steps_per_second": 1.405,
+ "train_loss": 0.028345070170466266,
+ "train_runtime": 172193.3354,
+ "train_samples": 61135,
+ "train_samples_per_second": 0.355,
+ "train_steps_per_second": 0.089
+ }
eval_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 1.0,
+ "eval_logits/chosen": -2.2779674530029297,
+ "eval_logits/rejected": -2.0900180339813232,
+ "eval_logps/chosen": -240.2975311279297,
+ "eval_logps/rejected": -236.9312744140625,
+ "eval_loss": 0.023918792605400085,
+ "eval_rewards/accuracies": 0.6600000262260437,
+ "eval_rewards/chosen": -0.04146282374858856,
+ "eval_rewards/margins": 0.0851340964436531,
+ "eval_rewards/rejected": -0.12659691274166107,
+ "eval_runtime": 711.555,
+ "eval_samples": 2000,
+ "eval_samples_per_second": 2.811,
+ "eval_steps_per_second": 1.405
+ }
runs/Apr05_02-26-58_gpu4-119-5/events.out.tfevents.1712244483.gpu4-119-5.3718988.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a59fc69b63fb18bdeca587a8a9cbf8d956925ed69e863f7253d888d89c18a1d2
- size 1081225
+ oid sha256:e87929c3afc7eda8a459d726b2926f5420d0e3513b4f4125c3ce62768aeccbe7
+ size 1086651
runs/Apr05_02-26-58_gpu4-119-5/events.out.tfevents.1712417388.gpu4-119-5.3718988.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6f951760ebc9ed0742740a288949255d9df34bf60a3007b0b576871541d7da3
+ size 828
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.028345070170466266,
+ "train_runtime": 172193.3354,
+ "train_samples": 61135,
+ "train_samples_per_second": 0.355,
+ "train_steps_per_second": 0.089
+ }
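The throughput figures in train_results.json are mutually consistent; a quick sanity check (the optimizer-step count is inferred from the total train batch size of 4, an assumption here rather than a logged value):

```python
train_samples = 61135
train_runtime = 172193.3354          # seconds, roughly 47.8 hours
total_train_batch_size = 4           # from the README hyperparameters

# Reported as 0.355 and 0.089 in train_results.json:
samples_per_second = train_samples / train_runtime
optimizer_steps = train_samples // total_train_batch_size  # ~15,283 steps
steps_per_second = optimizer_steps / train_runtime
```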
trainer_state.json ADDED
The diff for this file is too large to render.