wirthdrew1 committed on
Commit
ab48eda
1 Parent(s): 7ad2c55

Model save

README.md ADDED
@@ -0,0 +1,148 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: mistralai/Mistral-7B-v0.1
+ model-index:
+ - name: zephyr-7b-dpo-qlora
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-dpo-qlora
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained with DPO (the training dataset name was not recorded by the Trainer).
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5036
+ - Rewards/chosen: -2.0892
+ - Rewards/rejected: -3.1197
+ - Rewards/accuracies: 0.7295
+ - Rewards/margins: 1.0304
+ - Logps/rejected: -560.7722
+ - Logps/chosen: -477.4810
+ - Logits/rejected: 2.3638
+ - Logits/chosen: 1.7891
+
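The Rewards/* figures follow TRL's DPO convention: each reward is β times the policy-vs-reference log-probability gap for that completion, and the margin is chosen minus rejected (note −2.0892 − (−3.1197) ≈ 1.0304 above). A minimal sketch of how these metrics relate, assuming TRL's default β = 0.1 since the card does not record the value:

```python
import math

def dpo_metrics(policy_chosen_logp, ref_chosen_logp,
                policy_rejected_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO metrics as TRL's DPOTrainer logs them.
    beta=0.1 is an assumed default; this card does not record it."""
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # DPO loss is -log sigmoid(margin); accuracy counts margin > 0
    loss = math.log1p(math.exp(-margin))
    return {"rewards/chosen": reward_chosen,
            "rewards/rejected": reward_rejected,
            "rewards/margins": margin,
            "loss": loss}
```

A positive margin drives the loss below log 2 ≈ 0.693 (the value at initialization, visible in the first table rows below), which matches the reported eval loss of 0.5036 at a margin of about 1.03.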
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
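The cosine schedule with 10% linear warmup behaves as sketched below, mirroring `transformers`' `get_cosine_schedule_with_warmup`. The total step count of 7642 is an estimate (ceil of 61135 training samples / effective batch size 8, one epoch); the card does not state it directly:

```python
import math

def lr_at(step, total_steps=7642, base_lr=5e-6, warmup_ratio=0.1):
    """Cosine decay with linear warmup, as in transformers'
    get_cosine_schedule_with_warmup. total_steps is an assumed
    estimate: ceil(61135 samples / effective batch 8), one epoch."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # linear ramp from 0 to base_lr over the first 10% of steps
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # half-cosine from base_lr down to 0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the learning rate peaks at 5e-06 around step 764 and decays to zero by the end of the epoch.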
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6931 | 0.01 | 100 | 0.6930 | 0.0005 | 0.0002 | 0.5135 | 0.0003 | -248.7855 | -268.5095 | -2.1335 | -2.2163 |
+ | 0.6926 | 0.03 | 200 | 0.6924 | 0.0023 | 0.0008 | 0.5885 | 0.0014 | -248.7224 | -268.3331 | -2.1330 | -2.2157 |
+ | 0.6904 | 0.04 | 300 | 0.6901 | 0.0125 | 0.0064 | 0.6475 | 0.0062 | -248.1708 | -267.3080 | -2.1373 | -2.2194 |
+ | 0.6868 | 0.05 | 400 | 0.6830 | 0.0380 | 0.0168 | 0.6610 | 0.0211 | -247.1243 | -264.7627 | -2.1356 | -2.2179 |
+ | 0.6781 | 0.07 | 500 | 0.6679 | 0.0202 | -0.0356 | 0.6785 | 0.0558 | -252.3708 | -266.5388 | -2.0748 | -2.1590 |
+ | 0.6565 | 0.08 | 600 | 0.6403 | -0.1036 | -0.2364 | 0.6805 | 0.1327 | -272.4421 | -278.9226 | -1.9763 | -2.0685 |
+ | 0.6411 | 0.09 | 700 | 0.6254 | -0.1531 | -0.3350 | 0.6820 | 0.1819 | -282.3092 | -283.8720 | -1.9197 | -2.0181 |
+ | 0.6177 | 0.1 | 800 | 0.6134 | -0.3846 | -0.6431 | 0.6765 | 0.2585 | -313.1128 | -307.0186 | -1.8202 | -1.9304 |
+ | 0.6333 | 0.12 | 900 | 0.6082 | -0.4006 | -0.6835 | 0.6740 | 0.2829 | -317.1526 | -308.6199 | -1.8566 | -1.9660 |
+ | 0.5776 | 0.13 | 1000 | 0.6066 | -0.6650 | -1.0307 | 0.6735 | 0.3657 | -351.8794 | -335.0627 | -1.8956 | -2.0038 |
+ | 0.6093 | 0.14 | 1100 | 0.6075 | -0.5592 | -0.9272 | 0.6740 | 0.3679 | -341.5230 | -324.4846 | -1.9019 | -2.0022 |
+ | 0.5607 | 0.16 | 1200 | 0.5970 | -0.8428 | -1.2654 | 0.6800 | 0.4226 | -375.3466 | -352.8372 | -1.8081 | -1.9182 |
+ | 0.5627 | 0.17 | 1300 | 0.5935 | -1.4339 | -1.8498 | 0.6850 | 0.4160 | -433.7877 | -411.9446 | -1.1519 | -1.3203 |
+ | 0.5853 | 0.18 | 1400 | 0.5842 | -1.2099 | -1.6843 | 0.6950 | 0.4743 | -417.2325 | -389.5525 | -0.8708 | -1.0520 |
+ | 0.5622 | 0.2 | 1500 | 0.5712 | -1.5071 | -2.0510 | 0.6990 | 0.5439 | -453.9020 | -419.2693 | -0.4323 | -0.6561 |
+ | 0.4815 | 0.21 | 1600 | 0.5663 | -1.5246 | -2.1580 | 0.7035 | 0.6333 | -464.6043 | -421.0228 | -0.3415 | -0.5810 |
+ | 0.4698 | 0.22 | 1700 | 0.5697 | -1.8165 | -2.4986 | 0.6990 | 0.6821 | -498.6652 | -450.2103 | 0.5641 | 0.2594 |
+ | 0.5213 | 0.24 | 1800 | 0.5670 | -1.4236 | -2.1011 | 0.7055 | 0.6776 | -458.9214 | -410.9152 | 0.6173 | 0.2952 |
+ | 0.5295 | 0.25 | 1900 | 0.5606 | -1.9797 | -2.6952 | 0.6945 | 0.7155 | -518.3280 | -466.5294 | 0.8941 | 0.5819 |
+ | 0.6074 | 0.26 | 2000 | 0.5525 | -1.1848 | -1.7881 | 0.7165 | 0.6033 | -427.6170 | -387.0396 | 0.3449 | 0.0271 |
+ | 0.568 | 0.27 | 2100 | 0.5388 | -1.5667 | -2.2488 | 0.7220 | 0.6822 | -473.6912 | -425.2263 | 1.3497 | 0.9786 |
+ | 0.5643 | 0.29 | 2200 | 0.5539 | -1.8112 | -2.6184 | 0.7145 | 0.8072 | -510.6461 | -449.6774 | 1.9603 | 1.5565 |
+ | 0.5226 | 0.3 | 2300 | 0.5354 | -1.6020 | -2.3588 | 0.7245 | 0.7568 | -484.6839 | -428.7553 | 1.3673 | 0.9661 |
+ | 0.4144 | 0.31 | 2400 | 0.5338 | -2.0110 | -2.8276 | 0.7245 | 0.8167 | -531.5681 | -469.6557 | 1.6609 | 1.2542 |
+ | 0.5233 | 0.33 | 2500 | 0.5387 | -1.9001 | -2.7290 | 0.7245 | 0.8289 | -521.7109 | -458.5734 | 1.7390 | 1.3093 |
+ | 0.5425 | 0.34 | 2600 | 0.5430 | -2.4619 | -3.3366 | 0.7225 | 0.8747 | -582.4704 | -514.7514 | 2.4431 | 1.9262 |
+ | 0.4719 | 0.35 | 2700 | 0.5309 | -1.9122 | -2.7118 | 0.7285 | 0.7996 | -519.9872 | -459.7816 | 2.0586 | 1.6066 |
+ | 0.5543 | 0.37 | 2800 | 0.5268 | -1.7066 | -2.4623 | 0.7225 | 0.7557 | -495.0332 | -439.2162 | 1.5924 | 1.1721 |
+ | 0.5409 | 0.38 | 2900 | 0.5400 | -2.1879 | -3.1551 | 0.7175 | 0.9673 | -564.3220 | -487.3477 | 2.0890 | 1.6062 |
+ | 0.4956 | 0.39 | 3000 | 0.5285 | -1.8388 | -2.7165 | 0.7285 | 0.8777 | -520.4593 | -452.4431 | 1.6464 | 1.1679 |
+ | 0.4572 | 0.41 | 3100 | 0.5198 | -1.6639 | -2.4269 | 0.7265 | 0.7630 | -491.4958 | -434.9505 | 1.7627 | 1.2994 |
+ | 0.4962 | 0.42 | 3200 | 0.5181 | -1.6914 | -2.5214 | 0.7265 | 0.8300 | -500.9511 | -437.6994 | 1.6452 | 1.1780 |
+ | 0.6098 | 0.43 | 3300 | 0.5188 | -1.6044 | -2.4380 | 0.7310 | 0.8336 | -492.6022 | -428.9995 | 1.5141 | 1.0617 |
+ | 0.5349 | 0.44 | 3400 | 0.5210 | -1.4720 | -2.3090 | 0.7285 | 0.8370 | -479.7061 | -415.7578 | 1.4965 | 1.0371 |
+ | 0.4773 | 0.46 | 3500 | 0.5206 | -1.4425 | -2.2285 | 0.7280 | 0.7861 | -471.6597 | -412.8062 | 1.8090 | 1.3264 |
+ | 0.5312 | 0.47 | 3600 | 0.5196 | -1.8128 | -2.6719 | 0.7320 | 0.8591 | -515.9943 | -449.8387 | 2.5339 | 2.0191 |
+ | 0.5879 | 0.48 | 3700 | 0.5128 | -1.9225 | -2.7975 | 0.7355 | 0.8750 | -528.5556 | -460.8123 | 2.9390 | 2.3934 |
+ | 0.5202 | 0.5 | 3800 | 0.5155 | -1.8291 | -2.7153 | 0.7330 | 0.8863 | -520.3419 | -451.4667 | 2.2728 | 1.7445 |
+ | 0.5116 | 0.51 | 3900 | 0.5188 | -2.0732 | -3.0427 | 0.7285 | 0.9696 | -553.0799 | -475.8752 | 2.2721 | 1.7291 |
+ | 0.5521 | 0.52 | 4000 | 0.5161 | -2.3283 | -3.3054 | 0.7255 | 0.9771 | -579.3469 | -501.3872 | 2.2577 | 1.7449 |
+ | 0.5107 | 0.54 | 4100 | 0.5197 | -1.8192 | -2.7348 | 0.7215 | 0.9156 | -522.2897 | -450.4803 | 1.7678 | 1.2222 |
+ | 0.4773 | 0.55 | 4200 | 0.5163 | -2.1894 | -3.1554 | 0.7265 | 0.9660 | -564.3451 | -487.4992 | 1.8497 | 1.3121 |
+ | 0.4315 | 0.56 | 4300 | 0.5097 | -2.0873 | -3.0416 | 0.7340 | 0.9544 | -552.9705 | -477.2872 | 2.2039 | 1.6783 |
+ | 0.5176 | 0.58 | 4400 | 0.5097 | -2.2486 | -3.2409 | 0.7290 | 0.9924 | -572.8979 | -493.4146 | 2.0782 | 1.5387 |
+ | 0.4487 | 0.59 | 4500 | 0.5132 | -2.0257 | -3.0144 | 0.7245 | 0.9887 | -550.2475 | -471.1282 | 2.0676 | 1.4968 |
+ | 0.478 | 0.6 | 4600 | 0.5082 | -2.0565 | -3.0343 | 0.7270 | 0.9778 | -552.2376 | -474.2084 | 2.1065 | 1.5402 |
+ | 0.5351 | 0.62 | 4700 | 0.5038 | -1.9625 | -2.8993 | 0.7285 | 0.9368 | -538.7390 | -464.8120 | 2.0488 | 1.5017 |
+ | 0.4942 | 0.63 | 4800 | 0.5058 | -2.2570 | -3.2479 | 0.7305 | 0.9909 | -573.5954 | -494.2575 | 2.5210 | 1.9471 |
+ | 0.4918 | 0.64 | 4900 | 0.5129 | -2.4781 | -3.5322 | 0.7350 | 1.0541 | -602.0275 | -516.3653 | 2.8295 | 2.2468 |
+ | 0.4693 | 0.65 | 5000 | 0.5131 | -2.2974 | -3.3589 | 0.7315 | 1.0615 | -584.6987 | -498.2968 | 2.6931 | 2.1137 |
+ | 0.5796 | 0.67 | 5100 | 0.5084 | -2.1485 | -3.1709 | 0.7300 | 1.0224 | -565.8975 | -483.4113 | 2.4925 | 1.9365 |
+ | 0.5137 | 0.68 | 5200 | 0.5012 | -2.0083 | -2.9370 | 0.7365 | 0.9287 | -542.5073 | -469.3903 | 2.0969 | 1.5738 |
+ | 0.4484 | 0.69 | 5300 | 0.5022 | -2.1149 | -3.0765 | 0.7345 | 0.9616 | -556.4618 | -480.0531 | 2.2539 | 1.7154 |
+ | 0.4608 | 0.71 | 5400 | 0.5035 | -2.1639 | -3.1586 | 0.7380 | 0.9947 | -564.6663 | -484.9485 | 2.2224 | 1.6704 |
+ | 0.5746 | 0.72 | 5500 | 0.5045 | -2.3599 | -3.4023 | 0.7320 | 1.0424 | -589.0370 | -504.5520 | 2.2134 | 1.6562 |
+ | 0.5768 | 0.73 | 5600 | 0.5011 | -2.0662 | -3.0430 | 0.7375 | 0.9767 | -553.1031 | -475.1830 | 1.8199 | 1.2667 |
+ | 0.4359 | 0.75 | 5700 | 0.5032 | -2.0933 | -3.1100 | 0.7350 | 1.0166 | -559.8049 | -477.8932 | 1.9073 | 1.3503 |
+ | 0.4812 | 0.76 | 5800 | 0.5056 | -2.2931 | -3.3640 | 0.7320 | 1.0709 | -585.2068 | -497.8671 | 2.1234 | 1.5508 |
+ | 0.5048 | 0.77 | 5900 | 0.5036 | -1.9424 | -2.9286 | 0.7335 | 0.9862 | -541.6672 | -462.8024 | 1.7970 | 1.2367 |
+ | 0.4505 | 0.79 | 6000 | 0.5053 | -1.9881 | -2.9896 | 0.7330 | 1.0015 | -547.7703 | -467.3695 | 1.9582 | 1.3843 |
+ | 0.5197 | 0.8 | 6100 | 0.5071 | -2.0238 | -3.0391 | 0.7315 | 1.0152 | -552.7153 | -470.9445 | 2.0118 | 1.4341 |
+ | 0.6046 | 0.81 | 6200 | 0.5064 | -2.0803 | -3.1116 | 0.7310 | 1.0313 | -559.9708 | -476.5939 | 2.1151 | 1.5328 |
+ | 0.4669 | 0.82 | 6300 | 0.5072 | -2.1010 | -3.1541 | 0.7310 | 1.0531 | -564.2192 | -478.6570 | 2.2264 | 1.6394 |
+ | 0.5631 | 0.84 | 6400 | 0.5055 | -2.0938 | -3.1385 | 0.7305 | 1.0447 | -562.6528 | -477.9385 | 2.3072 | 1.7230 |
+ | 0.433 | 0.85 | 6500 | 0.5044 | -2.0630 | -3.0936 | 0.7290 | 1.0306 | -558.1638 | -474.8586 | 2.2760 | 1.6963 |
+ | 0.4908 | 0.86 | 6600 | 0.5043 | -2.0569 | -3.0863 | 0.7295 | 1.0294 | -557.4365 | -474.2540 | 2.3343 | 1.7557 |
+ | 0.522 | 0.88 | 6700 | 0.5039 | -2.0755 | -3.1060 | 0.7300 | 1.0304 | -559.4037 | -476.1125 | 2.3469 | 1.7706 |
+ | 0.4953 | 0.89 | 6800 | 0.5039 | -2.0918 | -3.1235 | 0.7290 | 1.0317 | -561.1605 | -477.7388 | 2.3881 | 1.8129 |
+ | 0.5683 | 0.9 | 6900 | 0.5036 | -2.0899 | -3.1203 | 0.7300 | 1.0304 | -560.8373 | -477.5472 | 2.3649 | 1.7897 |
+ | 0.5399 | 0.92 | 7000 | 0.5037 | -2.0831 | -3.1119 | 0.7295 | 1.0288 | -560.0004 | -476.8721 | 2.3590 | 1.7832 |
+ | 0.4628 | 0.93 | 7100 | 0.5035 | -2.0882 | -3.1188 | 0.7300 | 1.0307 | -560.6896 | -477.3761 | 2.3659 | 1.7910 |
+ | 0.5273 | 0.94 | 7200 | 0.5036 | -2.0897 | -3.1202 | 0.7295 | 1.0305 | -560.8275 | -477.5317 | 2.3594 | 1.7853 |
+ | 0.4445 | 0.96 | 7300 | 0.5035 | -2.0889 | -3.1197 | 0.7305 | 1.0308 | -560.7729 | -477.4447 | 2.3614 | 1.7871 |
+ | 0.4839 | 0.97 | 7400 | 0.5035 | -2.0894 | -3.1199 | 0.7310 | 1.0304 | -560.7961 | -477.5042 | 2.3646 | 1.7896 |
+ | 0.4425 | 0.98 | 7500 | 0.5036 | -2.0892 | -3.1197 | 0.7295 | 1.0304 | -560.7722 | -477.4810 | 2.3638 | 1.7891 |
+ | 0.5195 | 0.99 | 7600 | 0.5036 | -2.0892 | -3.1197 | 0.7295 | 1.0304 | -560.7722 | -477.4810 | 2.3638 | 1.7891 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.0
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:14cb7e6958abee7d86f00491f4864cb5ab1f855466d48bbe2fb5240611391ecf
+ oid sha256:32de060f550b49bef91ad6a0d101e86ddda426c28af4a0ad8c553cf1a3cdc6da
  size 83945744
all_results.json ADDED
@@ -0,0 +1,21 @@
+ {
+ "epoch": 1.0,
+ "eval_logits/chosen": 1.789110541343689,
+ "eval_logits/rejected": 2.363781213760376,
+ "eval_logps/chosen": -477.48095703125,
+ "eval_logps/rejected": -560.772216796875,
+ "eval_loss": 0.5036382079124451,
+ "eval_rewards/accuracies": 0.7294999957084656,
+ "eval_rewards/chosen": -2.0892136096954346,
+ "eval_rewards/margins": 1.0304385423660278,
+ "eval_rewards/rejected": -3.119652509689331,
+ "eval_runtime": 1346.3395,
+ "eval_samples": 2000,
+ "eval_samples_per_second": 1.486,
+ "eval_steps_per_second": 0.186,
+ "train_loss": 0.4069632120280327,
+ "train_runtime": 138556.5872,
+ "train_samples": 61135,
+ "train_samples_per_second": 0.441,
+ "train_steps_per_second": 0.055
+ }
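As a sanity check (not part of the original artifacts), the reported throughput fields are derivable from the sample counts and runtimes recorded above:

```python
# Reported sample counts and runtimes from all_results.json
train_samples, train_runtime = 61135, 138556.5872  # runtime in seconds
eval_samples, eval_runtime = 2000, 1346.3395       # runtime in seconds

# Samples per second, rounded the way the Trainer reports them
train_sps = round(train_samples / train_runtime, 3)
eval_sps = round(eval_samples / eval_runtime, 3)
```

These reproduce the logged `train_samples_per_second` of 0.441 and `eval_samples_per_second` of 1.486.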
eval_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 1.0,
+ "eval_logits/chosen": 1.789110541343689,
+ "eval_logits/rejected": 2.363781213760376,
+ "eval_logps/chosen": -477.48095703125,
+ "eval_logps/rejected": -560.772216796875,
+ "eval_loss": 0.5036382079124451,
+ "eval_rewards/accuracies": 0.7294999957084656,
+ "eval_rewards/chosen": -2.0892136096954346,
+ "eval_rewards/margins": 1.0304385423660278,
+ "eval_rewards/rejected": -3.119652509689331,
+ "eval_runtime": 1346.3395,
+ "eval_samples": 2000,
+ "eval_samples_per_second": 1.486,
+ "eval_steps_per_second": 0.186
+ }
runs/Jan19_20-14-26_wirandre-work-gpu-1/events.out.tfevents.1705695351.wirandre-work-gpu-1.3520.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f7b2d05045e9952d7e775130ec57b0cc2b0590e4bd851af2bf3272d342042a90
- size 429343
+ oid sha256:d5a7e76ca762c872af47cd753dc47046eb559554acb1b78c417b5baec539a83e
+ size 432233
runs/Jan19_20-14-26_wirandre-work-gpu-1/events.out.tfevents.1705835254.wirandre-work-gpu-1.3520.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:067abbca36e58cbc2334b75c729519692a53a932a48de066edc5ee4e076319be
+ size 828
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.4069632120280327,
+ "train_runtime": 138556.5872,
+ "train_samples": 61135,
+ "train_samples_per_second": 0.441,
+ "train_steps_per_second": 0.055
+ }
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff