tsavage68 committed
Commit de0f499
Parent: 9181712

End of training
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 license: apache-2.0
-base_model: tsavage68/UTI_M2_1000steps_1e5rate_SFT
+base_model: tsavage68/UTI_M2_1000steps_1e7rate_SFT
 tags:
 - trl
 - dpo
@@ -15,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # UTI_M2_1000steps_1e7rate_01beta_CSFTDPO
 
-This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e5rate_SFT) on an unknown dataset.
+This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0727
-- Rewards/chosen: 0.2193
-- Rewards/rejected: -6.0491
-- Rewards/accuracies: 0.9000
-- Rewards/margins: 6.2684
-- Logps/rejected: -104.6572
-- Logps/chosen: -18.1014
-- Logits/rejected: -3.7969
-- Logits/chosen: -3.7251
+- Loss: 0.6931
+- Rewards/chosen: 0.0
+- Rewards/rejected: 0.0
+- Rewards/accuracies: 0.0
+- Rewards/margins: 0.0
+- Logps/rejected: 0.0
+- Logps/chosen: 0.0
+- Logits/rejected: -2.7147
+- Logits/chosen: -2.7147
 
 ## Model description
 
@@ -59,46 +59,46 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6928 | 0.3333 | 25 | 0.6881 | 0.0023 | -0.0079 | 0.7300 | 0.0102 | -44.2452 | -20.2718 | -3.8169 | -3.7449 |
-| 0.6578 | 0.6667 | 50 | 0.6325 | 0.0226 | -0.1063 | 0.8800 | 0.1289 | -45.2294 | -20.0689 | -3.8170 | -3.7449 |
-| 0.5691 | 1.0 | 75 | 0.5043 | 0.0873 | -0.3772 | 0.9000 | 0.4645 | -47.9383 | -19.4217 | -3.8188 | -3.7462 |
-| 0.3598 | 1.3333 | 100 | 0.3707 | 0.1758 | -0.7982 | 0.8900 | 0.9740 | -52.1479 | -18.5365 | -3.8221 | -3.7487 |
-| 0.2599 | 1.6667 | 125 | 0.2538 | 0.3770 | -1.2635 | 0.9000 | 1.6406 | -56.8016 | -16.5241 | -3.8218 | -3.7458 |
-| 0.2098 | 2.0 | 150 | 0.1804 | 0.5103 | -1.7438 | 0.9000 | 2.2541 | -61.6038 | -15.1913 | -3.8223 | -3.7446 |
-| 0.1207 | 2.3333 | 175 | 0.1417 | 0.4831 | -2.3135 | 0.9000 | 2.7966 | -67.3010 | -15.4634 | -3.8216 | -3.7430 |
-| 0.1065 | 2.6667 | 200 | 0.1243 | 0.4944 | -2.7163 | 0.9000 | 3.2107 | -71.3287 | -15.3503 | -3.8232 | -3.7446 |
-| 0.0718 | 3.0 | 225 | 0.1093 | 0.4649 | -3.2331 | 0.9000 | 3.6980 | -76.4968 | -15.6456 | -3.8226 | -3.7447 |
-| 0.0801 | 3.3333 | 250 | 0.0995 | 0.4385 | -3.6294 | 0.9000 | 4.0680 | -80.4606 | -15.9091 | -3.8220 | -3.7449 |
-| 0.085 | 3.6667 | 275 | 0.0917 | 0.3946 | -3.9828 | 0.9000 | 4.3774 | -83.9940 | -16.3485 | -3.8197 | -3.7434 |
-| 0.0963 | 4.0 | 300 | 0.0881 | 0.4006 | -4.2890 | 0.9000 | 4.6896 | -87.0557 | -16.2884 | -3.8175 | -3.7419 |
-| 0.1106 | 4.3333 | 325 | 0.0833 | 0.3441 | -4.5561 | 0.9000 | 4.9002 | -89.7270 | -16.8535 | -3.8143 | -3.7396 |
-| 0.0394 | 4.6667 | 350 | 0.0807 | 0.3449 | -4.7929 | 0.9000 | 5.1378 | -92.0949 | -16.8457 | -3.8131 | -3.7390 |
-| 0.075 | 5.0 | 375 | 0.0779 | 0.2988 | -4.9903 | 0.9000 | 5.2891 | -94.0689 | -17.3067 | -3.8107 | -3.7369 |
-| 0.092 | 5.3333 | 400 | 0.0766 | 0.2916 | -5.1560 | 0.9000 | 5.4476 | -95.7264 | -17.3790 | -3.8095 | -3.7361 |
-| 0.0562 | 5.6667 | 425 | 0.0760 | 0.2880 | -5.2952 | 0.9000 | 5.5832 | -97.1186 | -17.4147 | -3.8072 | -3.7341 |
-| 0.0888 | 6.0 | 450 | 0.0749 | 0.2599 | -5.4359 | 0.9000 | 5.6958 | -98.5254 | -17.6955 | -3.8054 | -3.7326 |
-| 0.0194 | 6.3333 | 475 | 0.0751 | 0.2981 | -5.5269 | 0.9000 | 5.8251 | -99.4355 | -17.3131 | -3.8038 | -3.7313 |
-| 0.1235 | 6.6667 | 500 | 0.0738 | 0.2297 | -5.6612 | 0.9000 | 5.8909 | -100.7776 | -17.9974 | -3.8023 | -3.7299 |
-| 0.0372 | 7.0 | 525 | 0.0735 | 0.2308 | -5.7533 | 0.9000 | 5.9841 | -101.6993 | -17.9863 | -3.8012 | -3.7289 |
-| 0.0886 | 7.3333 | 550 | 0.0735 | 0.2500 | -5.8205 | 0.9000 | 6.0705 | -102.3713 | -17.7949 | -3.8005 | -3.7283 |
-| 0.0538 | 7.6667 | 575 | 0.0733 | 0.2463 | -5.8868 | 0.9000 | 6.1331 | -103.0340 | -17.8319 | -3.7993 | -3.7272 |
-| 0.1051 | 8.0 | 600 | 0.0732 | 0.2390 | -5.9183 | 0.9000 | 6.1573 | -103.3491 | -17.9050 | -3.7987 | -3.7268 |
-| 0.1221 | 8.3333 | 625 | 0.0731 | 0.2468 | -5.9459 | 0.9000 | 6.1927 | -103.6252 | -17.8267 | -3.7983 | -3.7263 |
-| 0.0189 | 8.6667 | 650 | 0.0729 | 0.2332 | -5.9813 | 0.9000 | 6.2145 | -103.9791 | -17.9627 | -3.7980 | -3.7261 |
-| 0.054 | 9.0 | 675 | 0.0728 | 0.2283 | -6.0081 | 0.9000 | 6.2365 | -104.2476 | -18.0115 | -3.7976 | -3.7257 |
-| 0.0531 | 9.3333 | 700 | 0.0728 | 0.2256 | -6.0215 | 0.9000 | 6.2471 | -104.3812 | -18.0390 | -3.7974 | -3.7255 |
-| 0.0879 | 9.6667 | 725 | 0.0727 | 0.2184 | -6.0356 | 0.9000 | 6.2540 | -104.5217 | -18.1106 | -3.7972 | -3.7253 |
-| 0.0886 | 10.0 | 750 | 0.0727 | 0.2191 | -6.0427 | 0.9000 | 6.2618 | -104.5926 | -18.1033 | -3.7972 | -3.7254 |
-| 0.0872 | 10.3333 | 775 | 0.0727 | 0.2206 | -6.0452 | 0.9000 | 6.2658 | -104.6181 | -18.0881 | -3.7971 | -3.7252 |
-| 0.0882 | 10.6667 | 800 | 0.0727 | 0.2196 | -6.0466 | 0.9000 | 6.2662 | -104.6325 | -18.0986 | -3.7971 | -3.7253 |
-| 0.0704 | 11.0 | 825 | 0.0727 | 0.2205 | -6.0474 | 0.9000 | 6.2679 | -104.6404 | -18.0897 | -3.7969 | -3.7251 |
-| 0.0535 | 11.3333 | 850 | 0.0727 | 0.2215 | -6.0501 | 0.9000 | 6.2717 | -104.6675 | -18.0791 | -3.7970 | -3.7251 |
-| 0.0707 | 11.6667 | 875 | 0.0727 | 0.2212 | -6.0475 | 0.9000 | 6.2687 | -104.6410 | -18.0829 | -3.7970 | -3.7251 |
-| 0.0707 | 12.0 | 900 | 0.0727 | 0.2204 | -6.0473 | 0.9000 | 6.2677 | -104.6390 | -18.0907 | -3.7969 | -3.7251 |
-| 0.036 | 12.3333 | 925 | 0.0727 | 0.2211 | -6.0497 | 0.9000 | 6.2708 | -104.6633 | -18.0839 | -3.7969 | -3.7251 |
-| 0.0707 | 12.6667 | 950 | 0.0727 | 0.2193 | -6.0491 | 0.9000 | 6.2684 | -104.6572 | -18.1014 | -3.7969 | -3.7251 |
-| 0.0546 | 13.0 | 975 | 0.0727 | 0.2193 | -6.0491 | 0.9000 | 6.2684 | -104.6572 | -18.1014 | -3.7969 | -3.7251 |
-| 0.0885 | 13.3333 | 1000 | 0.0727 | 0.2193 | -6.0491 | 0.9000 | 6.2684 | -104.6572 | -18.1014 | -3.7969 | -3.7251 |
+| 0.6931 | 0.3333 | 25 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 0.6667 | 50 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 1.0 | 75 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 1.3333 | 100 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 1.6667 | 125 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 2.0 | 150 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 2.3333 | 175 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 2.6667 | 200 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 3.0 | 225 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 3.3333 | 250 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 3.6667 | 275 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 4.0 | 300 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 4.3333 | 325 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 4.6667 | 350 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 5.0 | 375 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 5.3333 | 400 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 5.6667 | 425 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 6.0 | 450 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 6.3333 | 475 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 6.6667 | 500 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 7.0 | 525 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 7.3333 | 550 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 7.6667 | 575 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 8.0 | 600 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 8.3333 | 625 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 8.6667 | 650 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 9.0 | 675 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 9.3333 | 700 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 9.6667 | 725 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 10.0 | 750 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 10.3333 | 775 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 10.6667 | 800 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 11.0 | 825 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 11.3333 | 850 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 11.6667 | 875 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 12.0 | 900 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 12.3333 | 925 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 12.6667 | 950 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 13.0 | 975 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 13.3333 | 1000 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
 
 
 ### Framework versions
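
A quick sanity check on the updated numbers: 0.6931 is ln 2, which is exactly the DPO loss when the chosen and rejected rewards are equal (a margin of zero, i.e. the policy never moved away from its reference, matching the all-zero reward columns above). A minimal sketch, with `dpo_loss` as a hypothetical helper operating on the already-beta-scaled reward values reported in the table:

```python
import math

def dpo_loss(chosen_reward: float, rejected_reward: float) -> float:
    """DPO loss -log(sigmoid(margin)), where the rewards are assumed
    to already include the beta scaling, as in the table's columns."""
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With both rewards at 0.0, as in every updated row, the loss is ln 2:
print(round(dpo_loss(0.0, 0.0), 4))  # 0.6931
```

As the margin grows, the loss decays toward zero, which is why the earlier table's loss fell as Rewards/margins climbed.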
config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "tsavage68/UTI_M2_1000steps_1e5rate_SFT",
+  "_name_or_path": "tsavage68/UTI_M2_1000steps_1e7rate_SFT",
   "architectures": [
     "MistralForCausalLM"
   ],
final_checkpoint/config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "tsavage68/UTI_M2_1000steps_1e5rate_SFT",
+  "_name_or_path": "tsavage68/UTI_M2_1000steps_1e7rate_SFT",
   "architectures": [
     "MistralForCausalLM"
   ],
final_checkpoint/model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0f764b561fa1c8a6efb7650b435dd848319efb2df8cbb34c7f9e7f0f7e459cce
+oid sha256:9aa2e9687a5e5d24a999a996e9fe4c2bc1cf34ad347da5dc5c7e0adffcb14982
 size 4943162240
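
The weight-file changes above and below are Git LFS pointer swaps: the repository stores a three-line pointer (spec version, sha256 oid, byte size) rather than the blob itself. A minimal sketch of checking a downloaded blob against such a pointer; `parse_lfs_pointer` and `verify_blob` are hypothetical helpers, not part of any LFS tooling:

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (version / oid / size lines)
    into a dict with 'version', 'oid', and 'size' keys."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

def verify_blob(blob: bytes, pointer: dict) -> bool:
    """Check a downloaded blob against the pointer's size and SHA-256 oid."""
    return (len(blob) == pointer["size"]
            and hashlib.sha256(blob).hexdigest() == pointer["oid"])
```

Verifying the size first is cheap; the hash comparison then confirms the blob is byte-identical to what the pointer commits to.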
final_checkpoint/model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0501982c13663aa4ed54075e720fa5f6858eb64c89d0642393b2958da76bc84d
+oid sha256:268bb18cc8bbff53c912fa3961a6281dd5c163edd1b8e5c85c9b12e87e4e3a63
 size 4999819232
final_checkpoint/model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0bb3a58de76e91a9748c533089c0d4fe08ef3611471f627d36f8f316c53dfc36
+oid sha256:bbc021dcf68d9e7ddaab0ead255721e73b7f652e3bfd34985bba6c029e0b729c
 size 4540516256
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0f764b561fa1c8a6efb7650b435dd848319efb2df8cbb34c7f9e7f0f7e459cce
+oid sha256:9aa2e9687a5e5d24a999a996e9fe4c2bc1cf34ad347da5dc5c7e0adffcb14982
 size 4943162240
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0501982c13663aa4ed54075e720fa5f6858eb64c89d0642393b2958da76bc84d
+oid sha256:268bb18cc8bbff53c912fa3961a6281dd5c163edd1b8e5c85c9b12e87e4e3a63
 size 4999819232
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0bb3a58de76e91a9748c533089c0d4fe08ef3611471f627d36f8f316c53dfc36
+oid sha256:bbc021dcf68d9e7ddaab0ead255721e73b7f652e3bfd34985bba6c029e0b729c
 size 4540516256
tokenizer_config.json CHANGED
@@ -33,7 +33,7 @@
   "clean_up_tokenization_spaces": false,
   "eos_token": "</s>",
   "legacy": true,
-  "max_length": 100,
+  "max_length": 1024,
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": "</s>",
   "sp_model_kwargs": {},
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:27ed7fbf76d2fad7b5c470b1b5a84e0dd32dae572fb45531962b3d00cc3d6c20
+oid sha256:73471d501db896b3d8429b61e9e393edbc834f8877d47e77260677c025261f7b
 size 4667