lole25 committed
Commit 3bca2ae
1 parent: 784960f

Model save
README.md ADDED
@@ -0,0 +1,223 @@
+ ---
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: mistralai/Mistral-7B-v0.1
+ model-index:
+ - name: zephyr-7b-gpo-log1-i0
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-gpo-log1-i0
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6897
+ - Rewards/chosen: 0.0141
+ - Rewards/rejected: -0.0702
+ - Rewards/accuracies: 0.6370
+ - Rewards/margins: 0.0842
+ - Logps/rejected: -218.6293
+ - Logps/chosen: -230.5992
+ - Logits/rejected: -2.1363
+ - Logits/chosen: -2.3248
+
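The Rewards/* columns are TRL's implicit DPO rewards: for each completion, beta times the difference between the policy and reference log-probabilities of that completion. A minimal sketch of the bookkeeping (the beta value here is an assumed placeholder; the beta used for this run is not recorded in this card):

```python
import math

def dpo_pair_metrics(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss and implicit rewards; beta=0.1 is an assumed placeholder."""
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, reward_chosen, reward_rejected, margin

# When the policy has not yet moved away from the reference, the margin is 0
# and the loss is ln 2 ~ 0.6931, which is why training below starts near 0.693.
loss0, *_ = dpo_pair_metrics(-231.7, -211.4, -231.7, -211.4)
print(round(loss0, 4))  # 0.6931
```

Rewards/accuracies is then the fraction of pairs with a positive margin, and Rewards/margins is the mean of chosen minus rejected rewards (0.0141 − (−0.0702) ≈ 0.0842 above, up to rounding).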
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
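The schedule above (cosine with a 0.1 warmup ratio) can be sketched as follows; this is a minimal reimplementation, not the exact transformers scheduler code, and `total_steps` is illustrative (with 61,135 training samples and a total batch size of 4, this run works out to roughly 15,284 optimizer steps):

```python
import math

def lr_at(step, total_steps, base_lr=5e-6, warmup_ratio=0.1):
    """Sketch of lr_scheduler_type=cosine with linear warmup."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)  # linear ramp from 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0
```

The learning rate peaks at 5e-6 after the first 10% of steps and decays to zero by the end of the single epoch.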
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6932 | 0.01 | 100 | 0.6931 | 0.0024 | 0.0017 | 0.4950 | 0.0007 | -211.4439 | -231.7646 | -2.1604 | -2.3488 |
+ | 0.6927 | 0.01 | 200 | 0.6928 | 0.0053 | -0.0004 | 0.5835 | 0.0057 | -211.6526 | -231.4798 | -2.1608 | -2.3492 |
+ | 0.6917 | 0.02 | 300 | 0.6925 | 0.0335 | 0.0177 | 0.5830 | 0.0159 | -209.8460 | -228.6509 | -2.1649 | -2.3535 |
+ | 0.6916 | 0.03 | 400 | 0.6920 | 0.0466 | 0.0223 | 0.6020 | 0.0244 | -209.3866 | -227.3408 | -2.1660 | -2.3548 |
+ | 0.6917 | 0.03 | 500 | 0.6916 | 0.0638 | 0.0219 | 0.6060 | 0.0419 | -209.4261 | -225.6272 | -2.1616 | -2.3499 |
+ | 0.6919 | 0.04 | 600 | 0.6913 | 0.0498 | 0.0026 | 0.5970 | 0.0472 | -211.3561 | -227.0246 | -2.1675 | -2.3568 |
+ | 0.6909 | 0.05 | 700 | 0.6913 | 0.0561 | 0.0106 | 0.6145 | 0.0455 | -210.5544 | -226.3928 | -2.1615 | -2.3501 |
+ | 0.6913 | 0.05 | 800 | 0.6913 | -0.1047 | -0.1559 | 0.5970 | 0.0512 | -227.2016 | -242.4708 | -2.1428 | -2.3307 |
+ | 0.6921 | 0.06 | 900 | 0.6909 | -0.0526 | -0.1012 | 0.6060 | 0.0486 | -221.7336 | -237.2677 | -2.1466 | -2.3343 |
+ | 0.6903 | 0.07 | 1000 | 0.6908 | -0.0008 | -0.0563 | 0.6185 | 0.0555 | -217.2371 | -232.0825 | -2.1575 | -2.3453 |
+ | 0.6922 | 0.07 | 1100 | 0.6911 | -0.0015 | -0.0779 | 0.6275 | 0.0764 | -219.4024 | -232.1565 | -2.1294 | -2.3151 |
+ | 0.6906 | 0.08 | 1200 | 0.6907 | -0.0276 | -0.0979 | 0.6375 | 0.0703 | -221.4021 | -234.7645 | -2.1398 | -2.3272 |
+ | 0.6886 | 0.09 | 1300 | 0.6907 | 0.0146 | -0.0428 | 0.6105 | 0.0574 | -215.8946 | -230.5475 | -2.1613 | -2.3501 |
+ | 0.6887 | 0.09 | 1400 | 0.6909 | 0.0072 | -0.0587 | 0.6130 | 0.0660 | -217.4851 | -231.2815 | -2.1350 | -2.3205 |
+ | 0.6887 | 0.1 | 1500 | 0.6907 | -0.0114 | -0.0845 | 0.6305 | 0.0731 | -220.0597 | -233.1405 | -2.1365 | -2.3217 |
+ | 0.6904 | 0.1 | 1600 | 0.6906 | 0.0443 | -0.0289 | 0.6260 | 0.0732 | -214.5052 | -227.5776 | -2.1414 | -2.3270 |
+ | 0.6893 | 0.11 | 1700 | 0.6904 | 0.0333 | -0.0409 | 0.6215 | 0.0742 | -215.7022 | -228.6733 | -2.1548 | -2.3421 |
+ | 0.6904 | 0.12 | 1800 | 0.6909 | 0.0409 | -0.0143 | 0.6160 | 0.0552 | -213.0369 | -227.9110 | -2.1467 | -2.3331 |
+ | 0.6908 | 0.12 | 1900 | 0.6906 | 0.0455 | -0.0171 | 0.6290 | 0.0626 | -213.3265 | -227.4577 | -2.1587 | -2.3461 |
+ | 0.6907 | 0.13 | 2000 | 0.6904 | -0.0093 | -0.0898 | 0.6400 | 0.0805 | -220.5949 | -232.9343 | -2.1672 | -2.3558 |
+ | 0.6904 | 0.14 | 2100 | 0.6905 | 0.0245 | -0.0431 | 0.6380 | 0.0676 | -215.9218 | -229.5578 | -2.1837 | -2.3738 |
+ | 0.6916 | 0.14 | 2200 | 0.6904 | -0.0211 | -0.1023 | 0.6260 | 0.0812 | -221.8438 | -234.1163 | -2.1669 | -2.3566 |
+ | 0.6913 | 0.15 | 2300 | 0.6907 | -0.0303 | -0.1156 | 0.6170 | 0.0852 | -223.1697 | -235.0393 | -2.1698 | -2.3594 |
+ | 0.6899 | 0.16 | 2400 | 0.6904 | 0.0312 | -0.0385 | 0.6225 | 0.0697 | -215.4613 | -228.8855 | -2.1472 | -2.3345 |
+ | 0.6924 | 0.16 | 2500 | 0.6905 | 0.0577 | -0.0074 | 0.6250 | 0.0651 | -212.3521 | -226.2342 | -2.1658 | -2.3546 |
+ | 0.6893 | 0.17 | 2600 | 0.6903 | 0.0520 | -0.0205 | 0.6320 | 0.0725 | -213.6627 | -226.8027 | -2.1570 | -2.3453 |
+ | 0.6901 | 0.18 | 2700 | 0.6906 | 0.0038 | -0.0622 | 0.6325 | 0.0660 | -217.8366 | -231.6274 | -2.1382 | -2.3249 |
+ | 0.6909 | 0.18 | 2800 | 0.6903 | 0.0333 | -0.0363 | 0.6315 | 0.0696 | -215.2451 | -228.6795 | -2.1165 | -2.3020 |
+ | 0.6893 | 0.19 | 2900 | 0.6902 | 0.0110 | -0.0612 | 0.6380 | 0.0722 | -217.7327 | -230.9010 | -2.1110 | -2.2960 |
+ | 0.6925 | 0.2 | 3000 | 0.6903 | 0.0154 | -0.0656 | 0.6245 | 0.0811 | -218.1745 | -230.4610 | -2.1312 | -2.3182 |
+ | 0.692 | 0.2 | 3100 | 0.6903 | -0.0346 | -0.1194 | 0.6440 | 0.0849 | -223.5567 | -235.4630 | -2.1298 | -2.3160 |
+ | 0.687 | 0.21 | 3200 | 0.6903 | -0.0146 | -0.0904 | 0.6210 | 0.0757 | -220.6501 | -233.4682 | -2.1344 | -2.3212 |
+ | 0.6908 | 0.22 | 3300 | 0.6902 | -0.0061 | -0.0903 | 0.6420 | 0.0842 | -220.6434 | -232.6119 | -2.1233 | -2.3094 |
+ | 0.6908 | 0.22 | 3400 | 0.6904 | -0.0103 | -0.0884 | 0.6345 | 0.0781 | -220.4491 | -233.0300 | -2.1210 | -2.3068 |
+ | 0.6901 | 0.23 | 3500 | 0.6903 | 0.0193 | -0.0626 | 0.6355 | 0.0819 | -217.8700 | -230.0756 | -2.1193 | -2.3047 |
+ | 0.6913 | 0.24 | 3600 | 0.6902 | 0.0148 | -0.0690 | 0.6360 | 0.0838 | -218.5164 | -230.5288 | -2.1189 | -2.3041 |
+ | 0.694 | 0.24 | 3700 | 0.6904 | -0.0287 | -0.1025 | 0.6390 | 0.0738 | -221.8667 | -234.8788 | -2.0983 | -2.2820 |
+ | 0.6891 | 0.25 | 3800 | 0.6902 | 0.0450 | -0.0237 | 0.6320 | 0.0687 | -213.9806 | -227.5013 | -2.0923 | -2.2758 |
+ | 0.6877 | 0.26 | 3900 | 0.6902 | 0.0220 | -0.0570 | 0.6245 | 0.0791 | -217.3152 | -229.8009 | -2.1089 | -2.2936 |
+ | 0.6884 | 0.26 | 4000 | 0.6901 | -0.0013 | -0.0808 | 0.6360 | 0.0795 | -219.6905 | -232.1315 | -2.1064 | -2.2913 |
+ | 0.693 | 0.27 | 4100 | 0.6904 | -0.0133 | -0.0759 | 0.6280 | 0.0626 | -219.1985 | -233.3333 | -2.1177 | -2.3035 |
+ | 0.691 | 0.27 | 4200 | 0.6904 | -0.0025 | -0.0715 | 0.6360 | 0.0690 | -218.7613 | -232.2541 | -2.1112 | -2.2963 |
+ | 0.6904 | 0.28 | 4300 | 0.6901 | -0.0338 | -0.1195 | 0.6345 | 0.0858 | -223.5635 | -235.3810 | -2.1015 | -2.2866 |
+ | 0.6903 | 0.29 | 4400 | 0.6902 | -0.0454 | -0.1194 | 0.6275 | 0.0740 | -223.5494 | -236.5452 | -2.1077 | -2.2929 |
+ | 0.6864 | 0.29 | 4500 | 0.6901 | -0.0231 | -0.1063 | 0.6325 | 0.0833 | -222.2449 | -234.3118 | -2.1211 | -2.3074 |
+ | 0.6904 | 0.3 | 4600 | 0.6902 | 0.0062 | -0.0640 | 0.6310 | 0.0702 | -218.0117 | -231.3809 | -2.1215 | -2.3078 |
+ | 0.6854 | 0.31 | 4700 | 0.6903 | -0.0355 | -0.1276 | 0.6355 | 0.0921 | -224.3721 | -235.5581 | -2.1311 | -2.3193 |
+ | 0.6918 | 0.31 | 4800 | 0.6902 | -0.0179 | -0.0916 | 0.6385 | 0.0737 | -220.7675 | -233.7953 | -2.1200 | -2.3064 |
+ | 0.6886 | 0.32 | 4900 | 0.6902 | -0.0208 | -0.1097 | 0.6425 | 0.0889 | -222.5813 | -234.0859 | -2.0991 | -2.2843 |
+ | 0.6923 | 0.33 | 5000 | 0.6901 | -0.0066 | -0.0881 | 0.6270 | 0.0815 | -220.4222 | -232.6694 | -2.1010 | -2.2864 |
+ | 0.6914 | 0.33 | 5100 | 0.6902 | -0.0049 | -0.0898 | 0.6365 | 0.0849 | -220.5913 | -232.4988 | -2.1187 | -2.3049 |
+ | 0.6895 | 0.34 | 5200 | 0.6902 | -0.0224 | -0.0983 | 0.6295 | 0.0759 | -221.4422 | -234.2488 | -2.1360 | -2.3237 |
+ | 0.6928 | 0.35 | 5300 | 0.6903 | -0.0338 | -0.1157 | 0.6300 | 0.0819 | -223.1770 | -235.3836 | -2.1243 | -2.3110 |
+ | 0.689 | 0.35 | 5400 | 0.6902 | 0.0233 | -0.0513 | 0.6335 | 0.0746 | -216.7387 | -229.6749 | -2.1113 | -2.2966 |
+ | 0.6884 | 0.36 | 5500 | 0.6904 | -0.0049 | -0.0776 | 0.6230 | 0.0727 | -219.3675 | -232.4934 | -2.1054 | -2.2905 |
+ | 0.6901 | 0.37 | 5600 | 0.6903 | -0.0024 | -0.0762 | 0.6340 | 0.0738 | -219.2327 | -232.2428 | -2.1021 | -2.2871 |
+ | 0.6906 | 0.37 | 5700 | 0.6901 | 0.0148 | -0.0702 | 0.6345 | 0.0849 | -218.6294 | -230.5282 | -2.0973 | -2.2823 |
+ | 0.69 | 0.38 | 5800 | 0.6902 | -0.0196 | -0.1110 | 0.6365 | 0.0914 | -222.7126 | -233.9667 | -2.1048 | -2.2907 |
+ | 0.6907 | 0.39 | 5900 | 0.6901 | 0.0021 | -0.0814 | 0.6385 | 0.0835 | -219.7548 | -231.7942 | -2.0946 | -2.2797 |
+ | 0.6901 | 0.39 | 6000 | 0.6901 | 0.0056 | -0.0656 | 0.6295 | 0.0713 | -218.1741 | -231.4416 | -2.1236 | -2.3110 |
+ | 0.6889 | 0.4 | 6100 | 0.6901 | 0.0339 | -0.0376 | 0.6215 | 0.0716 | -215.3745 | -228.6116 | -2.1316 | -2.3196 |
+ | 0.691 | 0.41 | 6200 | 0.6900 | 0.0231 | -0.0575 | 0.6285 | 0.0806 | -217.3578 | -229.6931 | -2.1264 | -2.3146 |
+ | 0.6871 | 0.41 | 6300 | 0.6900 | 0.0432 | -0.0379 | 0.6370 | 0.0810 | -215.3970 | -227.6890 | -2.1200 | -2.3069 |
+ | 0.6892 | 0.42 | 6400 | 0.6901 | 0.0295 | -0.0619 | 0.6310 | 0.0914 | -217.7995 | -229.0562 | -2.1320 | -2.3205 |
+ | 0.6918 | 0.43 | 6500 | 0.6901 | 0.0240 | -0.0559 | 0.6370 | 0.0799 | -217.2022 | -229.6073 | -2.1407 | -2.3293 |
+ | 0.6899 | 0.43 | 6600 | 0.6901 | 0.0346 | -0.0427 | 0.6355 | 0.0773 | -215.8845 | -228.5490 | -2.1480 | -2.3373 |
+ | 0.6914 | 0.44 | 6700 | 0.6901 | 0.0006 | -0.0896 | 0.6385 | 0.0902 | -220.5701 | -231.9431 | -2.1399 | -2.3289 |
+ | 0.6921 | 0.44 | 6800 | 0.6900 | -0.0122 | -0.0949 | 0.6345 | 0.0826 | -221.0977 | -233.2272 | -2.1373 | -2.3262 |
+ | 0.6881 | 0.45 | 6900 | 0.6900 | 0.0001 | -0.0807 | 0.6310 | 0.0808 | -219.6810 | -231.9954 | -2.1336 | -2.3221 |
+ | 0.688 | 0.46 | 7000 | 0.6900 | -0.0035 | -0.0895 | 0.6255 | 0.0860 | -220.5654 | -232.3555 | -2.1330 | -2.3214 |
+ | 0.6893 | 0.46 | 7100 | 0.6900 | 0.0038 | -0.0786 | 0.6310 | 0.0824 | -219.4742 | -231.6270 | -2.1255 | -2.3129 |
+ | 0.6888 | 0.47 | 7200 | 0.6900 | 0.0146 | -0.0599 | 0.6220 | 0.0745 | -217.6021 | -230.5473 | -2.1376 | -2.3262 |
+ | 0.6907 | 0.48 | 7300 | 0.6899 | -0.0074 | -0.0859 | 0.6290 | 0.0785 | -220.2062 | -232.7456 | -2.1270 | -2.3148 |
+ | 0.6931 | 0.48 | 7400 | 0.6900 | 0.0088 | -0.0681 | 0.6285 | 0.0770 | -218.4249 | -231.1209 | -2.1238 | -2.3113 |
+ | 0.6895 | 0.49 | 7500 | 0.6899 | 0.0001 | -0.0788 | 0.6280 | 0.0789 | -219.4958 | -231.9997 | -2.1007 | -2.2861 |
+ | 0.6874 | 0.5 | 7600 | 0.6900 | -0.0044 | -0.0909 | 0.6300 | 0.0865 | -220.7033 | -232.4485 | -2.1033 | -2.2888 |
+ | 0.6898 | 0.5 | 7700 | 0.6899 | 0.0018 | -0.0817 | 0.6355 | 0.0835 | -219.7780 | -231.8252 | -2.0977 | -2.2827 |
+ | 0.6885 | 0.51 | 7800 | 0.6900 | -0.0331 | -0.1186 | 0.6485 | 0.0855 | -223.4754 | -235.3170 | -2.0865 | -2.2713 |
+ | 0.6905 | 0.52 | 7900 | 0.6899 | -0.0476 | -0.1257 | 0.6425 | 0.0781 | -224.1827 | -236.7635 | -2.0852 | -2.2699 |
+ | 0.6911 | 0.52 | 8000 | 0.6899 | -0.0329 | -0.1140 | 0.6345 | 0.0811 | -223.0114 | -235.2987 | -2.0814 | -2.2658 |
+ | 0.6915 | 0.53 | 8100 | 0.6899 | -0.0158 | -0.0964 | 0.6365 | 0.0807 | -221.2535 | -233.5811 | -2.0877 | -2.2729 |
+ | 0.6907 | 0.54 | 8200 | 0.6899 | -0.0250 | -0.1063 | 0.6355 | 0.0814 | -222.2466 | -234.5026 | -2.0843 | -2.2691 |
+ | 0.6893 | 0.54 | 8300 | 0.6900 | -0.0020 | -0.0780 | 0.6345 | 0.0760 | -219.4079 | -232.2015 | -2.0923 | -2.2778 |
+ | 0.6904 | 0.55 | 8400 | 0.6900 | 0.0123 | -0.0553 | 0.6295 | 0.0676 | -217.1386 | -230.7717 | -2.0953 | -2.2805 |
+ | 0.6885 | 0.56 | 8500 | 0.6898 | 0.0006 | -0.0852 | 0.6455 | 0.0858 | -220.1317 | -231.9455 | -2.0963 | -2.2819 |
+ | 0.6889 | 0.56 | 8600 | 0.6898 | -0.0030 | -0.0879 | 0.6410 | 0.0849 | -220.4034 | -232.3074 | -2.1033 | -2.2895 |
+ | 0.6895 | 0.57 | 8700 | 0.6898 | 0.0116 | -0.0737 | 0.6430 | 0.0853 | -218.9868 | -230.8494 | -2.1105 | -2.2970 |
+ | 0.6913 | 0.58 | 8800 | 0.6898 | 0.0296 | -0.0519 | 0.6465 | 0.0816 | -216.8063 | -229.0427 | -2.1172 | -2.3044 |
+ | 0.6906 | 0.58 | 8900 | 0.6898 | 0.0039 | -0.0875 | 0.6485 | 0.0914 | -220.3614 | -231.6156 | -2.1173 | -2.3050 |
+ | 0.6888 | 0.59 | 9000 | 0.6898 | 0.0111 | -0.0739 | 0.6400 | 0.0851 | -219.0050 | -230.8923 | -2.1196 | -2.3073 |
+ | 0.6905 | 0.6 | 9100 | 0.6899 | 0.0201 | -0.0529 | 0.6325 | 0.0730 | -216.9018 | -229.9912 | -2.1251 | -2.3129 |
+ | 0.6887 | 0.6 | 9200 | 0.6898 | 0.0207 | -0.0583 | 0.6355 | 0.0790 | -217.4442 | -229.9347 | -2.1397 | -2.3283 |
+ | 0.6899 | 0.61 | 9300 | 0.6898 | 0.0062 | -0.0796 | 0.6375 | 0.0858 | -219.5693 | -231.3830 | -2.1441 | -2.3333 |
+ | 0.6884 | 0.62 | 9400 | 0.6899 | -0.0285 | -0.1089 | 0.6335 | 0.0804 | -222.5007 | -234.8580 | -2.1432 | -2.3321 |
+ | 0.6871 | 0.62 | 9500 | 0.6898 | -0.0095 | -0.0917 | 0.6365 | 0.0822 | -220.7840 | -232.9599 | -2.1435 | -2.3324 |
+ | 0.6905 | 0.63 | 9600 | 0.6899 | 0.0203 | -0.0661 | 0.6385 | 0.0864 | -218.2251 | -229.9762 | -2.1520 | -2.3417 |
+ | 0.6895 | 0.63 | 9700 | 0.6898 | 0.0048 | -0.0783 | 0.6440 | 0.0831 | -219.4395 | -231.5201 | -2.1527 | -2.3423 |
+ | 0.6915 | 0.64 | 9800 | 0.6898 | -0.0028 | -0.0828 | 0.6420 | 0.0800 | -219.8873 | -232.2814 | -2.1416 | -2.3302 |
+ | 0.6894 | 0.65 | 9900 | 0.6898 | -0.0006 | -0.0874 | 0.6435 | 0.0867 | -220.3488 | -232.0690 | -2.1391 | -2.3274 |
+ | 0.6897 | 0.65 | 10000 | 0.6899 | -0.0191 | -0.1066 | 0.6475 | 0.0875 | -222.2716 | -233.9115 | -2.1345 | -2.3227 |
+ | 0.6859 | 0.66 | 10100 | 0.6899 | -0.0225 | -0.1068 | 0.6475 | 0.0843 | -222.2938 | -234.2563 | -2.1291 | -2.3167 |
+ | 0.6904 | 0.67 | 10200 | 0.6898 | 0.0002 | -0.0901 | 0.6475 | 0.0903 | -220.6184 | -231.9806 | -2.1274 | -2.3151 |
+ | 0.6876 | 0.67 | 10300 | 0.6898 | 0.0014 | -0.0829 | 0.6435 | 0.0843 | -219.8981 | -231.8635 | -2.1301 | -2.3181 |
+ | 0.6888 | 0.68 | 10400 | 0.6898 | 0.0178 | -0.0690 | 0.6385 | 0.0868 | -218.5098 | -230.2225 | -2.1290 | -2.3170 |
+ | 0.6893 | 0.69 | 10500 | 0.6898 | 0.0209 | -0.0629 | 0.6395 | 0.0838 | -217.9021 | -229.9178 | -2.1322 | -2.3205 |
+ | 0.6893 | 0.69 | 10600 | 0.6898 | 0.0157 | -0.0686 | 0.6430 | 0.0844 | -218.4735 | -230.4310 | -2.1292 | -2.3171 |
+ | 0.6907 | 0.7 | 10700 | 0.6898 | 0.0165 | -0.0682 | 0.6430 | 0.0847 | -218.4280 | -230.3552 | -2.1293 | -2.3170 |
+ | 0.6877 | 0.71 | 10800 | 0.6898 | 0.0264 | -0.0554 | 0.6435 | 0.0818 | -217.1490 | -229.3606 | -2.1293 | -2.3171 |
+ | 0.6924 | 0.71 | 10900 | 0.6898 | 0.0120 | -0.0670 | 0.6385 | 0.0790 | -218.3147 | -230.8059 | -2.1238 | -2.3111 |
+ | 0.691 | 0.72 | 11000 | 0.6898 | 0.0266 | -0.0537 | 0.6395 | 0.0803 | -216.9807 | -229.3445 | -2.1251 | -2.3125 |
+ | 0.6903 | 0.73 | 11100 | 0.6898 | 0.0312 | -0.0491 | 0.6360 | 0.0803 | -216.5214 | -228.8819 | -2.1258 | -2.3132 |
+ | 0.6918 | 0.73 | 11200 | 0.6898 | 0.0305 | -0.0499 | 0.6375 | 0.0804 | -216.6021 | -228.9509 | -2.1260 | -2.3134 |
+ | 0.6879 | 0.74 | 11300 | 0.6898 | 0.0205 | -0.0612 | 0.6380 | 0.0818 | -217.7365 | -229.9544 | -2.1278 | -2.3155 |
+ | 0.6896 | 0.75 | 11400 | 0.6898 | 0.0170 | -0.0694 | 0.6355 | 0.0864 | -218.5536 | -230.3058 | -2.1292 | -2.3172 |
+ | 0.6904 | 0.75 | 11500 | 0.6898 | 0.0200 | -0.0610 | 0.6295 | 0.0811 | -217.7165 | -230.0003 | -2.1303 | -2.3183 |
+ | 0.6891 | 0.76 | 11600 | 0.6898 | 0.0093 | -0.0783 | 0.6370 | 0.0877 | -219.4468 | -231.0702 | -2.1269 | -2.3147 |
+ | 0.6883 | 0.77 | 11700 | 0.6898 | 0.0024 | -0.0805 | 0.6355 | 0.0828 | -219.6586 | -231.7671 | -2.1296 | -2.3175 |
+ | 0.69 | 0.77 | 11800 | 0.6898 | -0.0053 | -0.0871 | 0.6410 | 0.0818 | -220.3198 | -232.5302 | -2.1311 | -2.3192 |
+ | 0.6871 | 0.78 | 11900 | 0.6898 | -0.0076 | -0.0914 | 0.6410 | 0.0838 | -220.7492 | -232.7632 | -2.1300 | -2.3180 |
+ | 0.6887 | 0.79 | 12000 | 0.6898 | -0.0020 | -0.0869 | 0.6420 | 0.0849 | -220.3020 | -232.2003 | -2.1329 | -2.3212 |
+ | 0.6881 | 0.79 | 12100 | 0.6898 | 0.0007 | -0.0815 | 0.6385 | 0.0822 | -219.7614 | -231.9368 | -2.1346 | -2.3230 |
+ | 0.6905 | 0.8 | 12200 | 0.6898 | 0.0116 | -0.0698 | 0.6340 | 0.0814 | -218.5900 | -230.8437 | -2.1335 | -2.3217 |
+ | 0.6915 | 0.8 | 12300 | 0.6898 | 0.0068 | -0.0793 | 0.6365 | 0.0861 | -219.5374 | -231.3238 | -2.1342 | -2.3226 |
+ | 0.6927 | 0.81 | 12400 | 0.6898 | 0.0117 | -0.0703 | 0.6350 | 0.0820 | -218.6442 | -230.8355 | -2.1361 | -2.3246 |
+ | 0.6897 | 0.82 | 12500 | 0.6898 | 0.0095 | -0.0713 | 0.6325 | 0.0807 | -218.7409 | -231.0591 | -2.1371 | -2.3257 |
+ | 0.6905 | 0.82 | 12600 | 0.6898 | 0.0061 | -0.0744 | 0.6365 | 0.0805 | -219.0518 | -231.3977 | -2.1376 | -2.3263 |
+ | 0.6905 | 0.83 | 12700 | 0.6898 | 0.0062 | -0.0754 | 0.6335 | 0.0815 | -219.1471 | -231.3857 | -2.1376 | -2.3263 |
+ | 0.6907 | 0.84 | 12800 | 0.6898 | 0.0129 | -0.0688 | 0.6360 | 0.0817 | -218.4943 | -230.7170 | -2.1390 | -2.3279 |
+ | 0.6911 | 0.84 | 12900 | 0.6897 | 0.0182 | -0.0653 | 0.6335 | 0.0835 | -218.1457 | -230.1887 | -2.1372 | -2.3259 |
+ | 0.6886 | 0.85 | 13000 | 0.6897 | 0.0149 | -0.0707 | 0.6365 | 0.0856 | -218.6831 | -230.5150 | -2.1390 | -2.3278 |
+ | 0.6914 | 0.86 | 13100 | 0.6897 | 0.0135 | -0.0701 | 0.6355 | 0.0836 | -218.6235 | -230.6533 | -2.1373 | -2.3260 |
+ | 0.6887 | 0.86 | 13200 | 0.6897 | 0.0112 | -0.0734 | 0.6370 | 0.0846 | -218.9507 | -230.8813 | -2.1367 | -2.3253 |
+ | 0.6891 | 0.87 | 13300 | 0.6897 | 0.0125 | -0.0733 | 0.6405 | 0.0858 | -218.9421 | -230.7573 | -2.1360 | -2.3246 |
+ | 0.6913 | 0.88 | 13400 | 0.6897 | 0.0152 | -0.0698 | 0.6305 | 0.0850 | -218.5887 | -230.4858 | -2.1379 | -2.3267 |
+ | 0.6912 | 0.88 | 13500 | 0.6897 | 0.0194 | -0.0641 | 0.6360 | 0.0836 | -218.0252 | -230.0619 | -2.1378 | -2.3265 |
+ | 0.6905 | 0.89 | 13600 | 0.6897 | 0.0163 | -0.0690 | 0.6380 | 0.0853 | -218.5100 | -230.3711 | -2.1382 | -2.3269 |
+ | 0.6913 | 0.9 | 13700 | 0.6897 | 0.0172 | -0.0673 | 0.6360 | 0.0846 | -218.3449 | -230.2803 | -2.1379 | -2.3266 |
+ | 0.69 | 0.9 | 13800 | 0.6897 | 0.0175 | -0.0677 | 0.6390 | 0.0851 | -218.3797 | -230.2597 | -2.1379 | -2.3266 |
+ | 0.6902 | 0.91 | 13900 | 0.6897 | 0.0181 | -0.0668 | 0.6400 | 0.0849 | -218.2959 | -230.1951 | -2.1371 | -2.3257 |
+ | 0.6883 | 0.92 | 14000 | 0.6897 | 0.0142 | -0.0709 | 0.6380 | 0.0851 | -218.7007 | -230.5817 | -2.1376 | -2.3262 |
+ | 0.6898 | 0.92 | 14100 | 0.6897 | 0.0158 | -0.0685 | 0.6375 | 0.0844 | -218.4662 | -230.4218 | -2.1366 | -2.3252 |
+ | 0.6894 | 0.93 | 14200 | 0.6897 | 0.0149 | -0.0698 | 0.6375 | 0.0847 | -218.5941 | -230.5171 | -2.1369 | -2.3255 |
+ | 0.6912 | 0.94 | 14300 | 0.6897 | 0.0145 | -0.0702 | 0.6400 | 0.0847 | -218.6314 | -230.5508 | -2.1365 | -2.3251 |
+ | 0.6893 | 0.94 | 14400 | 0.6897 | 0.0139 | -0.0710 | 0.6410 | 0.0848 | -218.7085 | -230.6183 | -2.1361 | -2.3247 |
+ | 0.6914 | 0.95 | 14500 | 0.6897 | 0.0139 | -0.0710 | 0.6370 | 0.0848 | -218.7070 | -230.6179 | -2.1364 | -2.3250 |
+ | 0.6897 | 0.96 | 14600 | 0.6897 | 0.0138 | -0.0707 | 0.6355 | 0.0844 | -218.6777 | -230.6268 | -2.1363 | -2.3249 |
+ | 0.691 | 0.96 | 14700 | 0.6897 | 0.0138 | -0.0705 | 0.6365 | 0.0843 | -218.6600 | -230.6252 | -2.1362 | -2.3248 |
+ | 0.6897 | 0.97 | 14800 | 0.6897 | 0.0139 | -0.0705 | 0.6340 | 0.0844 | -218.6653 | -230.6136 | -2.1364 | -2.3250 |
+ | 0.6892 | 0.97 | 14900 | 0.6897 | 0.0138 | -0.0703 | 0.6380 | 0.0841 | -218.6449 | -230.6241 | -2.1365 | -2.3250 |
+ | 0.6925 | 0.98 | 15000 | 0.6897 | 0.0142 | -0.0701 | 0.6385 | 0.0843 | -218.6228 | -230.5896 | -2.1369 | -2.3255 |
+ | 0.6882 | 0.99 | 15100 | 0.6897 | 0.0141 | -0.0701 | 0.6390 | 0.0843 | -218.6257 | -230.5937 | -2.1369 | -2.3255 |
+ | 0.6896 | 0.99 | 15200 | 0.6897 | 0.0141 | -0.0701 | 0.6365 | 0.0842 | -218.6245 | -230.5999 | -2.1366 | -2.3251 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7998b9317f9337e515cb64aaeda5b23d5a11069e600284b9c6cb0baf34c69d9e
+ oid sha256:1246a4decfb4c9cd08b0e661cb22d7568866bdcefd05e2008ac6145423db1699
  size 671150064
all_results.json ADDED
@@ -0,0 +1,21 @@
+ {
+ "epoch": 1.0,
+ "eval_logits/chosen": -2.3248252868652344,
+ "eval_logits/rejected": -2.1362955570220947,
+ "eval_logps/chosen": -230.59922790527344,
+ "eval_logps/rejected": -218.62928771972656,
+ "eval_loss": 0.6897218227386475,
+ "eval_rewards/accuracies": 0.6370000243186951,
+ "eval_rewards/chosen": 0.014057200402021408,
+ "eval_rewards/margins": 0.08423101156949997,
+ "eval_rewards/rejected": -0.07017382234334946,
+ "eval_runtime": 710.6006,
+ "eval_samples": 2000,
+ "eval_samples_per_second": 2.815,
+ "eval_steps_per_second": 1.407,
+ "train_loss": 0.6900739747015976,
+ "train_runtime": 171639.7836,
+ "train_samples": 61135,
+ "train_samples_per_second": 0.356,
+ "train_steps_per_second": 0.089
+ }
eval_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 1.0,
+ "eval_logits/chosen": -2.3248252868652344,
+ "eval_logits/rejected": -2.1362955570220947,
+ "eval_logps/chosen": -230.59922790527344,
+ "eval_logps/rejected": -218.62928771972656,
+ "eval_loss": 0.6897218227386475,
+ "eval_rewards/accuracies": 0.6370000243186951,
+ "eval_rewards/chosen": 0.014057200402021408,
+ "eval_rewards/margins": 0.08423101156949997,
+ "eval_rewards/rejected": -0.07017382234334946,
+ "eval_runtime": 710.6006,
+ "eval_samples": 2000,
+ "eval_samples_per_second": 2.815,
+ "eval_steps_per_second": 1.407
+ }
runs/Apr22_10-32-05_gpu4-119-5/events.out.tfevents.1713745985.gpu4-119-5.2905615.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5f7adfb1c36208150913f7f9da2c446b592fbf51a823d09aae81a4ca6d8d31db
- size 1081203
+ oid sha256:21fe23abe7f75e2269cbac9671999798cd5df700e0fd3475b77632ab9e58c262
+ size 1086629
runs/Apr22_10-32-05_gpu4-119-5/events.out.tfevents.1713918336.gpu4-119-5.2905615.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:55ee921cc068ed932989a942ee8d7272927bfb646b01ec868e7667902165077c
+ size 828
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.6900739747015976,
+ "train_runtime": 171639.7836,
+ "train_samples": 61135,
+ "train_samples_per_second": 0.356,
+ "train_steps_per_second": 0.089
+ }
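These aggregate numbers are internally consistent; a quick sanity check using the total train batch size of 4 from the hyperparameters above:

```python
train_samples = 61135
train_runtime = 171639.7836      # seconds
total_train_batch_size = 4

samples_per_second = train_samples / train_runtime
optimizer_steps = train_samples / total_train_batch_size   # roughly 15,284 steps
steps_per_second = optimizer_steps / train_runtime

print(round(samples_per_second, 3))  # 0.356, matching train_samples_per_second
print(round(steps_per_second, 3))    # 0.089, matching train_steps_per_second
```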
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff