BraylonDash committed on
Commit 70a5420
1 Parent(s): b8a41a5

Model save
README.md ADDED
---
license: mit
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: microsoft/phi-2
model-index:
- name: phi-2-ipo-renew1
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# phi-2-ipo-renew1

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 2028.0933
- Rewards/chosen: -0.1243
- Rewards/rejected: -0.2158
- Rewards/accuracies: 0.6900
- Rewards/margins: 0.0915
- Logps/rejected: -255.1287
- Logps/chosen: -269.0499
- Logits/rejected: 0.5909
- Logits/chosen: 0.5352

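The Rewards/* figures follow TRL's DPO-style bookkeeping, and the unusually large loss values are characteristic of the IPO objective, which penalizes the squared distance of the log-ratio margin from 1/(2β). A minimal sketch of the assumed formulation; β = 0.01 is a guess inferred from the ~2500 starting loss in the training table, not a value stated in this card:

```python
# Sketch of TRL's IPO loss and DPO-style reward logging (assumed formulation;
# beta = 0.01 is a guess, not taken from this model card).

def ipo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """IPO: squared distance of the log-ratio margin from 1/(2*beta)."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return (margin - 1.0 / (2.0 * beta)) ** 2

def reward(policy_logp, ref_logp, beta):
    """Rewards/chosen and Rewards/rejected are beta-scaled log-ratios."""
    return beta * (policy_logp - ref_logp)

# At initialization the policy equals the reference, so the margin is 0 and
# the loss is (1/(2*beta))**2 -- with beta = 0.01 that is exactly 2500,
# matching the ~2500 losses in the first rows of the training table.
print(ipo_loss(0.0, 0.0, 0.0, 0.0, beta=0.01))  # 2500.0
```

Under this reading, Rewards/margins is simply Rewards/chosen minus Rewards/rejected, and Rewards/accuracies is the fraction of evaluation pairs with a positive margin.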
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

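Two of the values above can be reconstructed from the others. A sketch assuming a single optimizer process (4 × 4 × 1 matches total_train_batch_size = 16) and a transformers-style linear warmup followed by cosine decay; the total step count used below is illustrative:

```python
import math

# Reconstruction of the schedule implied by the hyperparameters above.
# Assumptions (not stated in the card): one optimizer process, and
# linear-warmup-then-cosine decay as in transformers'
# get_cosine_schedule_with_warmup.

per_device_batch = 4
grad_accum = 4
num_processes = 1  # assumed: 4 * 4 * 1 = total_train_batch_size = 16
total_train_batch_size = per_device_batch * grad_accum * num_processes

peak_lr = 5e-6
warmup_ratio = 0.1

def lr_at(step, total_steps):
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay

print(total_train_batch_size)  # 16
```

The learning rate rises linearly over the first 10% of steps, peaks at 5e-06, and decays along a half-cosine to zero by the final step.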
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2496.843 | 0.05 | 100 | 2502.2668 | -0.0003 | -0.0002 | 0.5005 | -0.0002 | -233.5649 | -256.6506 | 0.8888 | 0.8318 |
| 2499.2807 | 0.1 | 200 | 2494.8354 | 0.0001 | -0.0005 | 0.5190 | 0.0006 | -233.5995 | -256.6106 | 0.8882 | 0.8310 |
| 2477.7609 | 0.16 | 300 | 2481.5015 | -0.0011 | -0.0031 | 0.5595 | 0.0019 | -233.8548 | -256.7285 | 0.8892 | 0.8319 |
| 2428.4195 | 0.21 | 400 | 2419.1045 | -0.0068 | -0.0156 | 0.6495 | 0.0089 | -235.1127 | -257.2951 | 0.8983 | 0.8404 |
| 2296.8842 | 0.26 | 500 | 2349.4358 | -0.0240 | -0.0419 | 0.6565 | 0.0179 | -237.7379 | -259.0124 | 0.8806 | 0.8214 |
| 2254.5846 | 0.31 | 600 | 2273.4993 | -0.0525 | -0.0829 | 0.6570 | 0.0304 | -241.8383 | -261.8659 | 0.8478 | 0.7868 |
| 2330.7787 | 0.37 | 700 | 2224.3350 | -0.0819 | -0.1221 | 0.6630 | 0.0402 | -245.7631 | -264.8093 | 0.8128 | 0.7517 |
| 2223.6863 | 0.42 | 800 | 2196.0991 | -0.1009 | -0.1487 | 0.6675 | 0.0478 | -248.4222 | -266.7057 | 0.7611 | 0.6992 |
| 2066.7418 | 0.47 | 900 | 2166.0732 | -0.1112 | -0.1658 | 0.6700 | 0.0546 | -250.1319 | -267.7397 | 0.7518 | 0.6917 |
| 2119.2691 | 0.52 | 1000 | 2138.9312 | -0.1215 | -0.1821 | 0.6715 | 0.0606 | -251.7610 | -268.7693 | 0.7213 | 0.6619 |
| 2191.7109 | 0.58 | 1100 | 2121.8115 | -0.1257 | -0.1906 | 0.6695 | 0.0648 | -252.6059 | -269.1910 | 0.7176 | 0.6584 |
| 2308.1883 | 0.63 | 1200 | 2110.3069 | -0.1409 | -0.2123 | 0.6665 | 0.0715 | -254.7812 | -270.7044 | 0.6920 | 0.6330 |
| 1996.7178 | 0.68 | 1300 | 2095.3130 | -0.1314 | -0.2042 | 0.6755 | 0.0728 | -253.9726 | -269.7621 | 0.6722 | 0.6141 |
| 2038.3844 | 0.73 | 1400 | 2085.0852 | -0.1383 | -0.2140 | 0.6800 | 0.0756 | -254.9441 | -270.4488 | 0.6513 | 0.5933 |
| 2094.2182 | 0.79 | 1500 | 2076.3042 | -0.1390 | -0.2166 | 0.6790 | 0.0777 | -255.2133 | -270.5129 | 0.6474 | 0.5898 |
| 2171.3457 | 0.84 | 1600 | 2069.3757 | -0.1374 | -0.2166 | 0.6810 | 0.0792 | -255.2130 | -270.3595 | 0.6392 | 0.5818 |
| 2189.3863 | 0.89 | 1700 | 2062.1995 | -0.1386 | -0.2192 | 0.6780 | 0.0806 | -255.4675 | -270.4739 | 0.6291 | 0.5723 |
| 2292.8938 | 0.94 | 1800 | 2053.1299 | -0.1196 | -0.2005 | 0.6830 | 0.0809 | -253.6025 | -268.5789 | 0.6275 | 0.5703 |
| 2085.5805 | 0.99 | 1900 | 2052.3237 | -0.1086 | -0.1906 | 0.6900 | 0.0821 | -252.6131 | -267.4730 | 0.6319 | 0.5747 |
| 1847.759 | 1.05 | 2000 | 2050.4177 | -0.1118 | -0.1953 | 0.6850 | 0.0836 | -253.0827 | -267.7950 | 0.6333 | 0.5763 |
| 2024.9559 | 1.1 | 2100 | 2046.7593 | -0.1219 | -0.2083 | 0.6900 | 0.0864 | -254.3799 | -268.8073 | 0.6157 | 0.5590 |
| 2038.6354 | 1.15 | 2200 | 2043.5728 | -0.1205 | -0.2072 | 0.6880 | 0.0867 | -254.2731 | -268.6722 | 0.6083 | 0.5518 |
| 2022.9617 | 1.2 | 2300 | 2035.5857 | -0.1173 | -0.2041 | 0.6895 | 0.0868 | -253.9597 | -268.3491 | 0.6101 | 0.5535 |
| 1871.641 | 1.26 | 2400 | 2036.3373 | -0.1190 | -0.2073 | 0.6895 | 0.0884 | -254.2831 | -268.5161 | 0.6046 | 0.5482 |
| 1907.3463 | 1.31 | 2500 | 2034.7010 | -0.1216 | -0.2108 | 0.6880 | 0.0892 | -254.6297 | -268.7765 | 0.6022 | 0.5460 |
| 1884.6086 | 1.36 | 2600 | 2033.7977 | -0.1215 | -0.2105 | 0.6910 | 0.0890 | -254.6014 | -268.7708 | 0.6013 | 0.5451 |
| 2034.9129 | 1.41 | 2700 | 2032.5447 | -0.1235 | -0.2140 | 0.6900 | 0.0905 | -254.9471 | -268.9633 | 0.5987 | 0.5426 |
| 2068.2822 | 1.47 | 2800 | 2030.8698 | -0.1251 | -0.2162 | 0.6900 | 0.0911 | -255.1671 | -269.1270 | 0.5943 | 0.5383 |
| 1977.4029 | 1.52 | 2900 | 2030.6033 | -0.1251 | -0.2162 | 0.6895 | 0.0911 | -255.1690 | -269.1252 | 0.5941 | 0.5381 |
| 2110.2887 | 1.57 | 3000 | 2030.5707 | -0.1259 | -0.2173 | 0.6905 | 0.0915 | -255.2821 | -269.2050 | 0.5908 | 0.5348 |
| 2068.2863 | 1.62 | 3100 | 2029.4174 | -0.1242 | -0.2156 | 0.6935 | 0.0914 | -255.1087 | -269.0390 | 0.5913 | 0.5357 |
| 1977.8852 | 1.67 | 3200 | 2026.1289 | -0.1249 | -0.2165 | 0.6960 | 0.0916 | -255.2016 | -269.1071 | 0.5920 | 0.5364 |
| 2123.3787 | 1.73 | 3300 | 2027.3552 | -0.1248 | -0.2162 | 0.6930 | 0.0914 | -255.1666 | -269.0933 | 0.5926 | 0.5370 |
| 1945.4934 | 1.78 | 3400 | 2025.7804 | -0.1248 | -0.2164 | 0.6935 | 0.0916 | -255.1899 | -269.1010 | 0.5909 | 0.5353 |
| 1937.2627 | 1.83 | 3500 | 2027.8240 | -0.1247 | -0.2163 | 0.6930 | 0.0916 | -255.1750 | -269.0878 | 0.5903 | 0.5347 |
| 2007.2062 | 1.88 | 3600 | 2025.3228 | -0.1244 | -0.2164 | 0.6895 | 0.0919 | -255.1843 | -269.0623 | 0.5910 | 0.5352 |
| 2076.715 | 1.94 | 3700 | 2027.4857 | -0.1243 | -0.2159 | 0.6920 | 0.0916 | -255.1383 | -269.0487 | 0.5913 | 0.5358 |
| 2055.2201 | 1.99 | 3800 | 2027.8082 | -0.1244 | -0.2160 | 0.6920 | 0.0916 | -255.1455 | -269.0543 | 0.5902 | 0.5347 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2
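The card does not include a usage snippet. A minimal loading sketch, assuming the adapter is published under a repo id like `BraylonDash/phi-2-ipo-renew1` (inferred from the commit author and model name, not confirmed by the card) and that `peft`'s `AutoPeftModelForCausalLM` is available, as it is in PEFT 0.7.1:

```python
# Hypothetical loading sketch -- the repo id below is assumed from the commit
# author and model name, not confirmed by the card.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "BraylonDash/phi-2-ipo-renew1"  # assumed repo id

# Loads microsoft/phi-2 (read from the adapter config) and applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

prompt = "Instruct: Explain preference optimization in one sentence.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The prompt format above follows phi-2's documented "Instruct:/Output:" convention; whether this fine-tune expects the same format is not stated in the card.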
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b1e443f73b85d4ec394ab283e80202f225e7f72c26ac17b7def053990e86c27b
+ oid sha256:1fabac30d4b4dcaeac53729b01d4c7bb836714149a8a668514933f8c5f1b0a41
  size 41977616
all_results.json ADDED
{
  "epoch": 2.0,
  "eval_logits/chosen": 0.5351993441581726,
  "eval_logits/rejected": 0.5909183621406555,
  "eval_logps/chosen": -269.0498962402344,
  "eval_logps/rejected": -255.128662109375,
  "eval_loss": 2028.09326171875,
  "eval_rewards/accuracies": 0.6899999976158142,
  "eval_rewards/chosen": -0.12432491779327393,
  "eval_rewards/margins": 0.0914725586771965,
  "eval_rewards/rejected": -0.21579748392105103,
  "eval_runtime": 416.5334,
  "eval_samples": 2000,
  "eval_samples_per_second": 4.802,
  "eval_steps_per_second": 1.2,
  "train_loss": 2099.8451463309884,
  "train_runtime": 42790.3459,
  "train_samples": 30567,
  "train_samples_per_second": 1.429,
  "train_steps_per_second": 0.089
}
eval_results.json ADDED
{
  "epoch": 2.0,
  "eval_logits/chosen": 0.5351993441581726,
  "eval_logits/rejected": 0.5909183621406555,
  "eval_logps/chosen": -269.0498962402344,
  "eval_logps/rejected": -255.128662109375,
  "eval_loss": 2028.09326171875,
  "eval_rewards/accuracies": 0.6899999976158142,
  "eval_rewards/chosen": -0.12432491779327393,
  "eval_rewards/margins": 0.0914725586771965,
  "eval_rewards/rejected": -0.21579748392105103,
  "eval_runtime": 416.5334,
  "eval_samples": 2000,
  "eval_samples_per_second": 4.802,
  "eval_steps_per_second": 1.2
}
train_results.json ADDED
{
  "epoch": 2.0,
  "train_loss": 2099.8451463309884,
  "train_runtime": 42790.3459,
  "train_samples": 30567,
  "train_samples_per_second": 1.429,
  "train_steps_per_second": 0.089
}
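The throughput figures above are mutually consistent. A quick sanity check, assuming the Trainer's usual step count of ceil(train_samples / total_train_batch_size) per epoch (derived from the hyperparameters, not stated in the JSON):

```python
import math

# Cross-check of the reported throughput figures. All inputs come from the
# JSON above except total_train_batch_size (from the hyperparameter list)
# and the per-epoch step count, which is derived rather than stated.
train_samples = 30567
num_epochs = 2
train_runtime_s = 42790.3459
total_train_batch_size = 16

samples_per_second = train_samples * num_epochs / train_runtime_s
steps_per_epoch = math.ceil(train_samples / total_train_batch_size)  # 1911
steps_per_second = steps_per_epoch * num_epochs / train_runtime_s

print(round(samples_per_second, 3))  # 1.429, matching train_samples_per_second
print(round(steps_per_second, 3))   # 0.089, matching train_steps_per_second
```

The ~3822 derived total steps also agree with the training table, whose last logged step is 3800 at epoch 1.99.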
trainer_state.json ADDED
The diff for this file is too large to render.