Transformers · PyTorch · Inference Endpoints

xiuyul committed 92c52cf (1 parent: a878b63)

Upload folder using huggingface_hub
README.md CHANGED
---
license: apache-2.0
base_model: xiuyul/mamba-2.8b-ultrachat
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: mamba-2.8b-zephyr
  results: []
---

# mamba-2.8b-zephyr

This model is a fine-tuned version of [xiuyul/mamba-2.8b-ultrachat](https://huggingface.co/xiuyul/mamba-2.8b-ultrachat) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset, trained with [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).

The base model, [xiuyul/mamba-2.8b-ultrachat](https://huggingface.co/xiuyul/mamba-2.8b-ultrachat), was instruction-tuned from [state-spaces/mamba-2.8b-slimpj](https://huggingface.co/state-spaces/mamba-2.8b-slimpj) on the [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.

It achieves the following results on the evaluation set:
- Loss: 0.4996
- Rewards/chosen: -0.4523
- Rewards/rejected: -1.6105
- Rewards/accuracies: 0.7857
- Rewards/margins: 1.1582
- Logps/rejected: -290.1885
- Logps/chosen: -359.0926
- Logits/rejected: 23.0423
- Logits/chosen: 23.1861
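
For context, the reward columns above follow DPO's implicit reward. Assuming the standard objective from the linked paper, with policy $\pi_\theta$, reference model $\pi_\mathrm{ref}$ (here the instruction-tuned base model), and preference pairs $(x, y_w, y_l)$:

$$
\mathcal{L}_\mathrm{DPO} = -\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_\mathrm{ref}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_\mathrm{ref}(y_l \mid x)}\right)\right],
\qquad
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_\mathrm{ref}(y \mid x)}.
$$

Rewards/chosen and Rewards/rejected are the mean implicit rewards $r_\theta(x, y_w)$ and $r_\theta(x, y_l)$ on the evaluation set, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen response receives the higher implicit reward. The value of $\beta$ is not reported in this card.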

## Model description

More information needed

## Intended uses & limitations

More information needed
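
The original card does not include a usage snippet. As a rough illustration only, generation might look like the sketch below; it assumes the [`mamba_ssm`](https://github.com/state-spaces/mamba) reference implementation (whose `MambaLMHeadModel` checkpoint layout matches the `config.json` in this repository) and the chat template shipped in `tokenizer_config.json`. The device, precision, and sampling settings are illustrative.

```python
# Hedged sketch, not from the original card: assumes the mamba_ssm package
# (pip install mamba-ssm causal-conv1d) and a CUDA device.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

repo = "xiuyul/mamba-2.8b-zephyr"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = MambaLMHeadModel.from_pretrained(repo, device="cuda", dtype=torch.float16)

# Build a zephyr-style prompt using the chat template from tokenizer_config.json.
messages = [{"role": "user", "content": "Explain state-space models in one paragraph."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# mamba_ssm's generate() takes max_length (prompt + continuation), not
# max_new_tokens; top_k must be raised above its default of 1 to enable sampling.
out = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 256,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```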

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
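
As a non-authoritative sketch, these settings map onto `transformers.TrainingArguments` roughly as follows; the output directory, precision, and anything not listed above are assumptions, and the DPO training loop itself (loss computation, reference-model handling) is not shown here.

```python
# Hedged sketch: maps the listed hyperparameters onto transformers.TrainingArguments.
# output_dir and bf16 are assumptions; everything else mirrors the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mamba-2.8b-zephyr",   # assumption
    learning_rate=5e-7,
    per_device_train_batch_size=4,    # 4 per device x 8 GPUs x 2 accumulation steps = 64
    per_device_eval_batch_size=4,     # 4 per device x 8 GPUs = 32
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; precision is not stated in the card
)
```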

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6639 | 0.1 | 100 | 0.6593 | 0.1762 | 0.0957 | 0.6151 | 0.0805 | -273.1268 | -352.8086 | 23.5852 | 23.8356 |
| 0.5804 | 0.21 | 200 | 0.5836 | 0.0780 | -0.3396 | 0.6508 | 0.4176 | -277.4798 | -353.7904 | 23.5872 | 23.8302 |
| 0.5815 | 0.31 | 300 | 0.5510 | -0.1923 | -0.7857 | 0.7421 | 0.5934 | -281.9403 | -356.4929 | 23.5224 | 23.7498 |
| 0.5526 | 0.41 | 400 | 0.5361 | -0.1953 | -0.8928 | 0.7341 | 0.6975 | -283.0119 | -356.5235 | 23.5033 | 23.7264 |
| 0.5225 | 0.52 | 500 | 0.5262 | -0.1041 | -0.8809 | 0.7540 | 0.7768 | -282.8929 | -355.6114 | 23.4578 | 23.6718 |
| 0.5577 | 0.62 | 600 | 0.5156 | -0.1946 | -1.0285 | 0.7659 | 0.8339 | -284.3683 | -356.5158 | 23.4466 | 23.6618 |
| 0.5515 | 0.72 | 700 | 0.5163 | 0.0648 | -0.7650 | 0.7659 | 0.8298 | -281.7334 | -353.9220 | 23.4243 | 23.6343 |
| 0.5159 | 0.83 | 800 | 0.5113 | -0.1400 | -1.0595 | 0.7778 | 0.9195 | -284.6783 | -355.9698 | 23.4095 | 23.6179 |
| 0.5242 | 0.93 | 900 | 0.5089 | -0.0383 | -0.9148 | 0.7659 | 0.8766 | -283.2318 | -354.9529 | 23.4035 | 23.6145 |
| 0.4618 | 1.03 | 1000 | 0.5077 | -0.1223 | -1.0201 | 0.7778 | 0.8978 | -284.2841 | -355.7929 | 23.3805 | 23.5856 |
| 0.4484 | 1.14 | 1100 | 0.5019 | -0.3311 | -1.3299 | 0.7778 | 0.9989 | -287.3827 | -357.8807 | 23.3427 | 23.5381 |
| 0.4228 | 1.24 | 1200 | 0.5034 | -0.0617 | -1.0989 | 0.7619 | 1.0372 | -285.0726 | -355.1871 | 23.3191 | 23.5101 |
| 0.4306 | 1.34 | 1300 | 0.5032 | -0.1585 | -1.1849 | 0.7698 | 1.0264 | -285.9320 | -356.1549 | 23.2889 | 23.4787 |
| 0.4678 | 1.45 | 1400 | 0.5030 | -0.2351 | -1.1601 | 0.7817 | 0.9250 | -285.6841 | -356.9207 | 23.2661 | 23.4551 |
| 0.4317 | 1.55 | 1500 | 0.4997 | -0.1401 | -1.1458 | 0.7619 | 1.0057 | -285.5417 | -355.9716 | 23.2621 | 23.4524 |
| 0.4363 | 1.65 | 1600 | 0.5010 | -0.3313 | -1.3592 | 0.7738 | 1.0279 | -287.6752 | -357.8830 | 23.2320 | 23.4178 |
| 0.408 | 1.76 | 1700 | 0.4989 | -0.2456 | -1.3073 | 0.7778 | 1.0617 | -287.1568 | -357.0265 | 23.2135 | 23.3950 |
| 0.4076 | 1.86 | 1800 | 0.4996 | -0.3904 | -1.4365 | 0.7659 | 1.0461 | -288.4482 | -358.4738 | 23.1866 | 23.3617 |
| 0.4547 | 1.96 | 1900 | 0.5008 | -0.2516 | -1.2648 | 0.7857 | 1.0133 | -286.7317 | -357.0858 | 23.1605 | 23.3298 |
| 0.3469 | 2.07 | 2000 | 0.4977 | -0.2868 | -1.3916 | 0.7778 | 1.1048 | -287.9999 | -357.4383 | 23.1361 | 23.2990 |
| 0.3547 | 2.17 | 2100 | 0.4987 | -0.4251 | -1.5510 | 0.7619 | 1.1259 | -289.5935 | -358.8210 | 23.1142 | 23.2730 |
| 0.3468 | 2.27 | 2200 | 0.4979 | -0.2674 | -1.3945 | 0.7778 | 1.1271 | -288.0285 | -357.2443 | 23.0998 | 23.2561 |
| 0.3432 | 2.37 | 2300 | 0.5026 | -0.3792 | -1.4630 | 0.7738 | 1.0838 | -288.7130 | -358.3621 | 23.0726 | 23.2233 |
| 0.324 | 2.48 | 2400 | 0.5022 | -0.4892 | -1.6090 | 0.7698 | 1.1198 | -290.1737 | -359.4620 | 23.0543 | 23.2006 |
| 0.3556 | 2.58 | 2500 | 0.5010 | -0.5270 | -1.6576 | 0.7817 | 1.1306 | -290.6595 | -359.8404 | 23.0520 | 23.1981 |
| 0.3277 | 2.68 | 2600 | 0.4990 | -0.5401 | -1.6816 | 0.7778 | 1.1415 | -290.8996 | -359.9708 | 23.0449 | 23.1901 |
| 0.3262 | 2.79 | 2700 | 0.4993 | -0.4952 | -1.6410 | 0.7778 | 1.1458 | -290.4932 | -359.5220 | 23.0439 | 23.1878 |
| 0.3566 | 2.89 | 2800 | 0.4985 | -0.4474 | -1.5918 | 0.7778 | 1.1443 | -290.0010 | -359.0445 | 23.0433 | 23.1871 |
| 0.3386 | 2.99 | 2900 | 0.4983 | -0.4598 | -1.6040 | 0.7817 | 1.1442 | -290.1235 | -359.1679 | 23.0427 | 23.1866 |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
config.json ADDED
{
  "d_model": 2560,
  "n_layer": 64,
  "vocab_size": 50277,
  "ssm_cfg": {},
  "rms_norm": true,
  "residual_in_fp32": true,
  "fused_add_norm": true,
  "pad_vocab_size_multiple": 8
}
pytorch_model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:0ee62297c0d72b95746d79f5c015b674a1e7ea796d2d83e1656211cb23562d98
size 5536898154
special_tokens_map.json ADDED
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|padding|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50254": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50255": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50256": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50257": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50258": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50259": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50260": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50261": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50262": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50263": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50264": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50265": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50266": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50267": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50268": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50269": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50270": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50271": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50272": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50273": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50274": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50275": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "50276": {"content": "  ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false}
  },
  "bos_token": "<|endoftext|>",
  "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPTNeoXTokenizer",
  "unk_token": "<|endoftext|>"
}
train_results.json ADDED
{
  "epoch": 3.0,
  "train_loss": 0.446941960284861,
  "train_runtime": 57869.3533,
  "train_samples": 61966,
  "train_samples_per_second": 3.212,
  "train_steps_per_second": 0.05
}
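
(For reference, these figures are internally consistent: 61,966 samples over 3 epochs in about 57,869 seconds is roughly 3.21 samples per second, or about 0.05 optimizer steps per second at the effective batch size of 64.)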
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
training_args.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:d383359411b042512bd7e853051fc646a03347f1e19ffc7b0697087b978c355c
size 5688