LatentWanderer commited on
Commit
c467f64
·
verified ·
1 Parent(s): 42edad4

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,265 @@
1
+ <p align="center" width="100%">
2
+ </p>
3
+
4
+ <div id="top" align="center">
5
+
6
+ FuseO1-Preview: System-II Reasoning Fusion of LLMs
7
+ -----------------------------
8
+
9
+ <h4> |<a href="https://arxiv.org/abs/2408.07990"> 📑 Paper </a> |
10
+ <a href="https://github.com/fanqiwan/FuseAI"> 🐱 GitHub Repo </a> |
11
+ <a href="https://huggingface.co/FuseAI"> 🤗 Hugging Face </a> |
12
+ <a href="https://huggingface.co/blog/Wanfq/fuseo1-preview"> 🌐 Blog </a> |
13
+ </h4>
14
+
15
+ <!-- **Authors:** -->
16
+
17
+ _Fanqi Wan, Longguang Zhong, Ziyi Yang, Weizhou Shen, Xinting Huang_
18
+
19
+
20
+ <!-- **Affiliations:** -->
21
+
22
+ _FuseAI Team_
23
+
24
+ </div>
25
+
26
+ <p align="center">
27
+ <img src="./assets/fuseo1-preview.jpg" width="100%"> <br>
28
+ </p>
29
+
30
+
31
+ ## Overview
32
+
33
+ [FuseO1-Preview](https://huggingface.co/collections/FuseAI/fuseo1-preview-678eb56093649b2688bc9977) is our initial attempt to enhance the System-II reasoning capabilities of large language models (LLMs) through model fusion. Using our [SCE](https://arxiv.org/abs/2408.07990) merging method, we integrate multiple open-source o1-like LLMs into a single model. Our goal is to combine the distinct knowledge and strengths of different reasoning LLMs in one model with strong System-II reasoning abilities, particularly in mathematics, coding, and science.
34
+
35
+ <p align="center">
36
+ <img src="./assets/sce.jpg" width="70%"> <br>
37
+ </p>
38
+
39
+ To achieve this, we conduct two types of model merging:
40
+
41
+ - **Long-Long Reasoning Merging**: This approach fuses LLMs that all use long-CoT reasoning, with the goal of enhancing long-CoT reasoning capabilities. The resulting [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) achieves a Pass@1 accuracy of **74.0 on AIME24**, improving on OpenAI o1-preview (44.6) and OpenAI o1-mini (63.6) and approaching OpenAI o1 (79.2).
42
+ - **Long-Short Reasoning Merging**: This approach fuses long-CoT and short-CoT LLMs, aiming to improve both long and short reasoning. The resulting [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview) can use both long and short reasoning processes and performs relatively strongly on long reasoning tasks.
43
+
44
+ | Model | Merge Type | Source Models | HF Link |
45
+ |:----- | ---- | ---- | ---- |
46
+ | [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) | Long-Long Reasoning Merge | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B), [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview), [NovaSky-AI/Sky-T1-32B-Preview](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) | [🤗 Hugging Face](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview), [GGUF](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview-GGUF) |
47
+ | [FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview) | Long-Long Reasoning Merge | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B), [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) | [🤗 Hugging Face](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview) |
48
+ | [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview) | Long-Short Reasoning Merge | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B), [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | [🤗 Hugging Face](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview) |
49
+ | [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview) | Long-Short Reasoning Merge | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B), [Qwen/Qwen2.5-32B-Coder](https://huggingface.co/Qwen/Qwen2.5-32B-Coder) | [🤗 Hugging Face](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview) |
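
The README names SCE merging but does not spell out its steps. The sketch below is only a rough illustration of a select / calculate / erase style merge on a single flattened parameter, written in plain Python for readability. The function name `sce_merge`, the `select_ratio` parameter, and the exact rules used in each step are assumptions made for this example; the actual implementation lives in the mergekit configs referenced below and may differ.

```python
from statistics import pvariance


def sce_merge(base, finetuned, select_ratio=1.0):
    """Illustrative SCE-style merge of one flattened parameter tensor.

    base: list of floats, the shared base weights (assumption).
    finetuned: list of lists of floats, one per source model.
    select_ratio: fraction of positions kept in the select step (assumption).
    """
    n = len(base)
    # Task vectors: how far each source model moved away from the base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]

    # Select: keep only the positions whose deltas vary most across models.
    variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
    keep = max(1, round(select_ratio * n))
    ranked = sorted(range(n), key=lambda i: variances[i], reverse=True)
    selected = set(ranked[:keep])
    deltas = [[v if i in selected else 0.0 for i, v in enumerate(d)] for d in deltas]

    # Calculate: weight each model by its share of the squared delta magnitude.
    energy = [sum(v * v for v in d) for d in deltas]
    total = sum(energy) or 1.0
    coeffs = [e / total for e in energy]

    # Erase: per position, drop deltas whose sign opposes the summed direction,
    # then renormalize the coefficients of the deltas that remain.
    merged = []
    for i in range(n):
        direction = sum(d[i] for d in deltas)
        agree = [(c, d[i]) for c, d in zip(coeffs, deltas) if d[i] * direction > 0]
        norm = sum(c for c, _ in agree)
        update = sum(c * v for c, v in agree) / norm if norm > 0 else 0.0
        merged.append(base[i] + update)
    return merged


merged = sce_merge([0.0, 0.0], [[1.0, 0.2], [3.0, -0.1]])
```

In this toy call, the second position keeps only the first model's delta because the second model's delta points the other way, while the first position becomes a weighted average dominated by the model that moved further from the base.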
50
+
51
+
52
+ ## Long-Long Reasoning Merging
53
+
54
+ We conduct experiments on the following long-CoT LLMs:
55
+
56
+ - [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)
57
+ - [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview)
58
+ - [NovaSky-AI/Sky-T1-32B-Preview](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview)
59
+
60
+ To reproduce the merged [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) model, use the script below.
61
+
62
+ ```sh
63
+ cd FuseAI/FuseO1-Preview/mergekit
64
+ pip3 install -e .
65
+ model_save_dir=xx # your path to save the merged models
66
+ mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview --cuda
67
+ ```
68
+
69
+ To reproduce the merged [FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview) model, use the script below.
70
+
71
+ ```sh
72
+ cd FuseAI/FuseO1-Preview/mergekit
73
+ pip3 install -e .
74
+ model_save_dir=xxx # your path to save the merged models
75
+ mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-QwQ-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-QwQ-32B-Preview --cuda
76
+ ```
77
+
78
+ We provide example code for using FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview.
79
+
80
+ ```python3
81
+ from vllm import LLM, SamplingParams
82
+
83
+ llm = LLM(model="FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview", tensor_parallel_size=8)
84
+ sampling_params = SamplingParams(max_tokens=32768, temperature=0.7, stop=["<|im_end|>", "<|end▁of▁sentence|>"], stop_token_ids=[151645, 151643])
85
+
86
+ conversations = [
87
+ [
88
+ {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
89
+ {"role": "user", "content": "Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$."},
90
+ ],
91
+ ]
92
+
93
+ responses = llm.chat(messages=conversations, sampling_params=sampling_params, use_tqdm=True)
94
+
95
+ for response in responses:
96
+     print(response.outputs[0].text.strip())
97
+ ```
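
Because the system prompt asks the model to put its final answer inside `\boxed{}`, a small post-processing step is usually needed to pull that answer out of the generated text. The helper below is a minimal sketch of one way to do it; the name `extract_boxed` is hypothetical and is not part of vLLM or of the evaluation code linked in this README.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer.
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced, treat as no answer


answer = extract_boxed("Therefore the result is \\boxed{\\frac{1}{2}}.")
```

Tracking brace depth rather than using a simple regular expression keeps nested LaTeX such as `\frac{1}{2}` intact. With the vLLM example above, it could be applied as `extract_boxed(response.outputs[0].text)`.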
98
+
99
+ ## Long-Short Reasoning Merging
100
+
101
+ We conduct experiments on the following long-CoT and short-CoT LLMs:
102
+
103
+ - [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)
104
+ - [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)
105
+ - [Qwen/Qwen2.5-32B-Coder](https://huggingface.co/Qwen/Qwen2.5-32B-Coder)
106
+
107
+ To reproduce the merged [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview) model, use the script below.
108
+
109
+ ```sh
110
+ cd FuseAI/FuseO1-Preview/mergekit
111
+ pip3 install -e .
112
+ model_save_dir=xxx # your path to save the merged models
113
+ mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview --cuda
114
+ ```
115
+
116
+ To reproduce the merged [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview) model, use the script below.
117
+
118
+ ```sh
119
+ cd FuseAI/FuseO1-Preview/mergekit
120
+ pip3 install -e .
121
+ model_save_dir=xxx # your path to save the merged models
122
+ mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview --cuda
123
+ ```
124
+
125
+ We provide example code for using FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview.
126
+
127
+ ```python3
128
+ from vllm import LLM, SamplingParams
129
+
130
+ llm = LLM(model="FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview", tensor_parallel_size=8)
131
+ sampling_params = SamplingParams(max_tokens=32768, temperature=0.7, stop=["<|im_end|>", "<|end▁of▁sentence|>"], stop_token_ids=[151645, 151643])
132
+
133
+ conversations = [
134
+ [
135
+ {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
136
+ {"role": "user", "content": "Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$."},
137
+ ],
138
+ ]
139
+
140
+ responses = llm.chat(messages=conversations, sampling_params=sampling_params, use_tqdm=True)
141
+
142
+ for response in responses:
143
+     print(response.outputs[0].text.strip())
144
+ ```
145
+
146
+ ## Evaluation Results
147
+
148
+ We evaluate the resulting models on three kinds of benchmarks: **Math Reasoning**, **Code Reasoning**, and **Scientific Reasoning**.
149
+
150
+ Math Reasoning
151
+ - AIME24
152
+ - MATH500
153
+ - OlympiadBench
154
+
155
+ Scientific Reasoning
156
+ - GPQA-Diamond
157
+ - MMLU-Pro
158
+ - MMLU
159
+
160
+
161
+ Code Reasoning
162
+ - LiveCodeBench (2408-2502)
163
+
164
+ > Important Note: We manually set `"add_bos_token": false` in `tokenizer_config.json` for all evaluated LLMs to prevent the bos_token from being added twice to each prompt. Please download the models and apply the same modification to ensure consistency.
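
The tokenizer change described in the note can be applied by hand or with a few lines of Python. The following is a minimal sketch that assumes the model has already been downloaded to a local directory; the function name `disable_add_bos_token` and the example path are illustrative only.

```python
import json
from pathlib import Path


def disable_add_bos_token(model_dir):
    """Set "add_bos_token": false in <model_dir>/tokenizer_config.json."""
    path = Path(model_dir) / "tokenizer_config.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    config["add_bos_token"] = False
    # Rewrite the file in place, keeping every other key unchanged.
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config


# Example call (the path is a placeholder for a local model directory):
# disable_add_bos_token("/path/to/local/model")
```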
165
+
166
+ ### Math Reasoning
167
+
168
+ The evaluation code is modified from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math). In our evaluation, we set the temperature to 0.6, the top-p to 0.95, and max_tokens to 32768. We provide an example for reproducing our results in [math_evaluation](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview/math_evaluation).
169
+
170
+ The system prompt for evaluation is set to:
171
+
172
+ ```sh
173
+ Please reason step by step, and put your final answer within \\boxed{{}}.
174
+ ```
175
+
176
+ The evaluation results are shown in the table below:
177
+
178
+ In our evaluation of AIME24, we follow the method from DeepSeek-R1, wherein Pass@1 is computed by averaging the results across 32 sampled responses per prompt, while Cons@32 is determined through self-consistency analysis of the same 32 sampled responses for each prompt. For other benchmarks, we only sample 1 response and report the Pass@1.
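
The two AIME24 metrics described above can be sketched in a few lines. This is a simplified illustration, not the evaluation code from the linked repository: it assumes each sampled response has already been reduced to a final-answer string and that correctness is an exact string match, and the function names are chosen for this example only.

```python
from collections import Counter


def pass_at_1(sampled_answers, reference):
    """Fraction of sampled answers for one prompt that match the reference."""
    return sum(a == reference for a in sampled_answers) / len(sampled_answers)


def cons_at_k(sampled_answers, reference):
    """1.0 if the most frequent sampled answer matches the reference, else 0.0."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return float(majority == reference)


def benchmark_scores(samples_per_prompt, references):
    """Average both metrics over all prompts and report them as percentages."""
    n = len(references)
    p1 = sum(pass_at_1(s, r) for s, r in zip(samples_per_prompt, references)) / n
    cons = sum(cons_at_k(s, r) for s, r in zip(samples_per_prompt, references)) / n
    return 100 * p1, 100 * cons


# Toy data with 4 samples per prompt instead of 32.
samples = [["116", "116", "7", "116"], ["3", "5", "5", "8"]]
scores = benchmark_scores(samples, ["116", "3"])
```

Pass@1 rewards every correct sample, whereas Cons@32 only asks whether the majority vote is correct, which is why Cons@32 can be noticeably higher than Pass@1 for the same set of responses.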
179
+
180
+ | Models | AIME24 Pass@1 | AIME24 Cons@32 | MATH500 | OlympiadBench |
181
+ |:------ | --------------| ------------------- | ------------ | -------------- |
182
+ | OpenAI o1 | 79.2 | - | 96.4 | - |
183
+ | OpenAI o1-preview | 44.6 | - | 85.5 | - |
184
+ | OpenAI o1-mini | 63.6 | - | 90.0 | - |
185
+ | DeepSeek R1 | 79.8 | - | 97.3 | - |
186
+ | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 69.2 | 83.3 | 93.6 | 64.3 |
187
+ | [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) | 43.8 | 56.7 | 88.4 | 60.3 |
188
+ | [NovaSky-AI/Sky-T1-32B-Preview](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) | 37.7 | 50.0 | 88.0 | 55.1 |
189
+ | [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 17.0 | 20.0 | 81.8 | 48.1 |
190
+ | [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview) | 68.6 | 83.3 | 94.6 | 64.9 |
191
+ | [FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview) | 69.7 | 83.3 | 94.6 | 64.0 |
192
+ | [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) | 74.0 | 86.7 | 94.8 | 65.0 |
193
+
194
+ Our merged FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview outperforms its source models DeepSeek-R1-Distill-Qwen-32B, QwQ-32B-Preview, and Sky-T1-32B-Preview on all three math benchmarks. Specifically, it achieves **74.0 Pass@1 and 86.7 Cons@32 on AIME24**, improving on DeepSeek-R1-Distill-Qwen-32B (69.2 Pass@1 and 83.3 Cons@32), OpenAI o1-preview (44.6 Pass@1), and OpenAI o1-mini (63.6 Pass@1), and approaching OpenAI o1 (79.2 Pass@1).
195
+
196
+ ### Scientific Reasoning
197
+
198
+ The evaluation code is modified from [SkyThought](https://github.com/NovaSky-AI/SkyThought). In our evaluation, we set the temperature to 0.7 and max_tokens to 32768. We provide an example for reproducing our results in [evaluation](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview/evaluation).
199
+
200
+ The system prompt for evaluation is set to:
201
+
202
+ ```sh
203
+ You are a helpful and harmless assistant. You should think step-by-step.
204
+ ```
205
+
206
+ The evaluation results are shown in the table below:
207
+
208
+ | Models | GPQA-Diamond| MMLU-Pro | MMLU |
209
+ |:------ | --------------| ------------ | -------------- |
210
+ | OpenAI o1 | 75.7 | - | 91.8 |
211
+ | OpenAI o1-preview | 73.3 | - | 90.8 |
212
+ | OpenAI o1-mini | 60.0 | 80.3 | 85.2 |
213
+ | DeepSeek R1 | 71.5 | 84.0 | 90.8 |
214
+ | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 57.6 | 68.7 | 82.2 |
215
+ | [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) | 49.5 | 63.5 | 85.2 |
216
+ | [NovaSky-AI/Sky-T1-32B-Preview](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) | 50.5 | 65.8 | 82.7 |
217
+ | [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 46.5 | 56.3 | 79.6 |
218
+ | [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview) | 55.1 | 68.6 | 82.0 |
219
+ | [FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview) | 62.1 | 68.9 | 82.7 |
220
+ | [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) | 62.1 | 70.8 | 83.6 |
221
+
222
+ Our merged FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview outperforms its source models DeepSeek-R1-Distill-Qwen-32B, QwQ-32B-Preview, and Sky-T1-32B-Preview on GPQA-Diamond and MMLU-Pro. Specifically, it achieves **62.1 on GPQA-Diamond and 70.8 on MMLU-Pro**, improving on DeepSeek-R1-Distill-Qwen-32B (57.6 on GPQA-Diamond and 68.7 on MMLU-Pro). On MMLU it scores 83.6, ahead of DeepSeek-R1-Distill-Qwen-32B (82.2) but behind QwQ-32B-Preview (85.2).
223
+
224
+
225
+ ### Code Reasoning
226
+
227
+ The evaluation code is modified from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder/tree/main/qwencoder-eval/reasoning/livecode_bench_cot). In our evaluation, we set the temperature to 0.6, the top-p to 0.95, and max_tokens to 32768. We provide an example for reproducing our results in [code_evaluation](https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview/code_evaluation).
228
+
229
+ The system prompt for evaluation is set to:
230
+
231
+ ```sh
232
+ A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
233
+ ```
234
+
235
+ In our evaluation of LiveCodeBench, we follow the method from DeepSeek-R1 and make a slight modification. The Pass@1 is computed by averaging the results across 16 sampled responses per prompt.
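
For quick reference, the decoding settings stated in the three evaluation sections can be collected in one place. The structure below is only an illustrative summary: the `EvalSetting` class and the `sampling_kwargs` helper are hypothetical names, the science entry leaves top-p unset because the README does not state one, and its sample count of 1 is an assumption based on the remark that other benchmarks use a single response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalSetting:
    temperature: float
    top_p: Optional[float]  # None where the README does not state a value
    max_tokens: int
    num_samples: int  # responses sampled per prompt


EVAL_SETTINGS = {
    "aime24": EvalSetting(0.6, 0.95, 32768, 32),
    "math500": EvalSetting(0.6, 0.95, 32768, 1),
    "olympiadbench": EvalSetting(0.6, 0.95, 32768, 1),
    "science": EvalSetting(0.7, None, 32768, 1),  # num_samples is an assumption
    "livecodebench": EvalSetting(0.6, 0.95, 32768, 16),
}


def sampling_kwargs(benchmark):
    """Build keyword arguments in the style of vLLM's SamplingParams."""
    s = EVAL_SETTINGS[benchmark]
    kwargs = {
        "temperature": s.temperature,
        "max_tokens": s.max_tokens,
        "n": s.num_samples,
    }
    if s.top_p is not None:
        kwargs["top_p"] = s.top_p
    return kwargs


livecodebench_kwargs = sampling_kwargs("livecodebench")
```

The resulting dictionary could be passed as `SamplingParams(**sampling_kwargs("livecodebench"))` together with the stop tokens shown in the usage examples earlier in this README.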
236
+
237
+ The evaluation results are shown in the table below:
238
+
239
+ | Models | LiveCodeBench | LiveCodeBench-Easy | LiveCodeBench-Medium | LiveCodeBench-Hard |
240
+ |:------ | --------------| ------------------- | ------------ | -------------- |
241
+ | OpenAI o1 | 63.4 | 98.5 | 80.9 | 31.7 |
242
+ | OpenAI o1-preview | 42.7 | 97.0 | 47.2 | 9.8 |
243
+ | OpenAI o1-mini | 52.0 | 91.0 | 67.4 | 19.5 |
244
+ | DeepSeek R1 | 62.8 | 98.4 | 78.3 | 32.2 |
245
+ | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 56.1 | 93.6 | 73.1 | 23.4 |
246
+ | [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) | 44.4 | 94.9 | 53.8 | 10.0 |
247
+ | [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) | 57.9 | 93.6 | 76.0 | 25.5 |
248
+
249
+ Our merged FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview outperforms DeepSeek-R1-Distill-Qwen-32B, QwQ-32B-Preview, and Sky-T1-32B-Preview on code reasoning. Specifically, it achieves **57.9 on LiveCodeBench and 25.5 on LiveCodeBench-Hard**, improving on DeepSeek-R1-Distill-Qwen-32B (56.1 on LiveCodeBench and 23.4 on LiveCodeBench-Hard), OpenAI o1-preview (42.7 on LiveCodeBench and 9.8 on LiveCodeBench-Hard), and OpenAI o1-mini (52.0 on LiveCodeBench and 19.5 on LiveCodeBench-Hard).
250
+
251
+ ## Future Work
252
+
253
+ This work is our first attempt to achieve knowledge fusion of System-II reasoning LLMs through model merging, an approach that is limited to LLMs of identical scale and architecture. In future work, we plan to employ our [explicit model fusion](https://arxiv.org/abs/2401.10491) method, based on multi-teacher knowledge distillation, and our [implicit model fusion](https://arxiv.org/abs/2412.03187) method, which uses weighted-reward preference optimization, to fuse LLMs with different scales and architectures.
254
+ Furthermore, we intend to explore combining knowledge fusion with reinforcement learning (RL) methods, which have proven highly effective for enhancing reasoning abilities. Stay tuned for the next version of FuseO1!
255
+
256
+ ## Citations
257
+
258
+ ```
259
+ @article{wan2024fusechat,
260
+ title={Fusechat: Knowledge fusion of chat models},
261
+ author={Wan, Fanqi and Zhong, Longguang and Yang, Ziyi and Chen, Ruijun and Quan, Xiaojun},
262
+ journal={arXiv preprint arXiv:2408.07990},
263
+ year={2024}
264
+ }
265
+ ```
config.json ADDED
@@ -0,0 +1,38 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen2ForCausalLM"
4
+ ],
5
+ "attention_dropout": 0.0,
6
+ "bos_token_id": 151643,
7
+ "eos_token_id": 151643,
8
+ "hidden_act": "silu",
9
+ "hidden_size": 5120,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 27648,
12
+ "max_position_embeddings": 131072,
13
+ "max_window_layers": 64,
14
+ "model_type": "qwen2",
15
+ "num_attention_heads": 40,
16
+ "num_hidden_layers": 64,
17
+ "num_key_value_heads": 8,
18
+ "rms_norm_eps": 1e-05,
19
+ "rope_theta": 1000000.0,
20
+ "sliding_window": 131072,
21
+ "tie_word_embeddings": false,
22
+ "torch_dtype": "bfloat16",
23
+ "transformers_version": "4.43.1",
24
+ "use_cache": true,
25
+ "use_sliding_window": false,
26
+ "vocab_size": 152064,
27
+ "quantization_config": {
28
+ "quant_method": "exl2",
29
+ "version": "0.2.7",
30
+ "bits": 6.5,
31
+ "head_bits": 8,
32
+ "calibration": {
33
+ "rows": 115,
34
+ "length": 2048,
35
+ "dataset": "(default)"
36
+ }
37
+ }
38
+ }
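
A few quantities that matter for deployment follow directly from the values in this config. The arithmetic below is a back-of-the-envelope sketch: it assumes the `total_size` reported in `model.safetensors.index.json` describes the unquantized bfloat16 weights (2 bytes per parameter), treats the 6.5 bits per weight as applying uniformly, and ignores the 8-bit head and any tensors left unquantized, so the size estimate is only approximate.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 5120,
    "num_attention_heads": 40,
    "num_hidden_layers": 64,
    "num_key_value_heads": 8,
    "bits": 6.5,
}
# "total_size" from model.safetensors.index.json, assumed to be bfloat16 bytes.
BF16_TOTAL_BYTES = 65527752704

# Per-head dimension and grouped-query attention ratio.
head_dim = config["hidden_size"] // config["num_attention_heads"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Rough parameter count and quantized weight size in GiB.
approx_params = BF16_TOTAL_BYTES // 2
approx_quant_gib = approx_params * config["bits"] / 8 / 2**30

# KV cache per token with 16-bit keys and values: K and V, every layer, every KV head.
kv_bytes_per_token = (
    2 * config["num_hidden_layers"] * config["num_key_value_heads"] * head_dim * 2
)

print(head_dim, queries_per_kv_head, kv_bytes_per_token)
print(round(approx_quant_gib, 1))
```

Under these assumptions each attention head has dimension 128, five query heads share each key/value head, and the 16-bit KV cache costs 256 KiB per token.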
measurement.json ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors.index.json ADDED
@@ -0,0 +1 @@
1
+ {"metadata": {"mergekit_version": "0.0.4.2", "total_size": 65527752704}, "weight_map": {"lm_head.weight": "model-00001-of-00014.safetensors", "model.embed_tokens.weight": "model-00001-of-00014.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00014.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00014.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00014.safetensors", "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00014.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.1.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.1.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", 
"model.layers.1.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.10.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.input_layernorm.weight": 
"model-00002-of-00014.safetensors", "model.layers.12.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00002-of-00014.safetensors", 
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.15.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.16.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.mlp.up_proj.weight": 
"model-00003-of-00014.safetensors", "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", 
"model.layers.18.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.2.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.k_proj.weight": 
"model-00004-of-00014.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", 
"model.layers.21.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.k_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.q_proj.weight": 
"model-00004-of-00014.safetensors", "model.layers.23.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.24.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.input_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", 
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.input_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.input_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.input_layernorm.weight": 
"model-00005-of-00014.safetensors", "model.layers.28.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.29.input_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.29.mlp.down_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.mlp.up_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00006-of-00014.safetensors", 
"model.layers.3.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.30.mlp.down_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.mlp.up_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.31.mlp.down_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.mlp.up_proj.weight": 
"model-00006-of-00014.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.32.mlp.down_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.mlp.up_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.33.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.33.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", 
"model.layers.33.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.k_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.34.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.k_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.35.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.k_proj.weight": 
"model-00007-of-00014.safetensors", "model.layers.35.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.36.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.k_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.37.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.k_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", 
"model.layers.37.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.38.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.38.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.q_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.39.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.q_proj.weight": 
"model-00008-of-00014.safetensors", "model.layers.39.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.40.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.q_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", 
"model.layers.40.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.41.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.q_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.42.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.42.mlp.down_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.input_layernorm.weight": 
"model-00009-of-00014.safetensors", "model.layers.43.mlp.down_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.input_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.44.mlp.down_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.input_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.45.mlp.down_proj.weight": "model-00009-of-00014.safetensors", 
"model.layers.45.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.input_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.46.mlp.down_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.47.input_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.47.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.mlp.up_proj.weight": 
"model-00010-of-00014.safetensors", "model.layers.47.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.k_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.48.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.mlp.up_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.k_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.49.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.mlp.up_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", 
"model.layers.49.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.k_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.50.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.mlp.up_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.k_proj.weight": 
"model-00010-of-00014.safetensors", "model.layers.50.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.51.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.51.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.q_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.52.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", 
"model.layers.52.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.q_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.53.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.q_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.54.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.q_proj.weight": 
"model-00011-of-00014.safetensors", "model.layers.54.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.55.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.q_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.56.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.56.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", 
"model.layers.56.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.input_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.57.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.input_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.58.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.input_layernorm.weight": 
"model-00012-of-00014.safetensors", "model.layers.59.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.input_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.60.input_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.60.mlp.down_proj.weight": "model-00013-of-00014.safetensors", 
"model.layers.60.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.mlp.up_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.61.mlp.down_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.mlp.up_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.62.mlp.down_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.mlp.up_proj.weight": 
"model-00013-of-00014.safetensors", "model.layers.62.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.63.mlp.down_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.mlp.up_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", 
"model.layers.7.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.8.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.k_proj.bias": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.q_proj.bias": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.v_proj.bias": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.input_layernorm.weight": "model-00014-of-00014.safetensors", "model.layers.9.mlp.down_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.k_proj.bias": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.k_proj.weight": 
"model-00014-of-00014.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.q_proj.bias": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.v_proj.bias": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00014-of-00014.safetensors", "model.norm.weight": "model-00014-of-00014.safetensors"}}
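The mapping that ends above assigns each tensor name to the shard file that stores it (presumably the `weight_map` object of a safetensors index file; the key itself is not visible in this part of the diff). A minimal sketch of how such a map could be grouped by shard, using four entries copied from the mapping; `weight_map_excerpt` and `group_by_shard` are illustrative names, not part of the repository:

```python
from collections import defaultdict

# Four entries copied from the mapping above; the full map is much larger.
weight_map_excerpt = {
    "model.layers.50.input_layernorm.weight": "model-00010-of-00014.safetensors",
    "model.layers.51.input_layernorm.weight": "model-00010-of-00014.safetensors",
    "model.layers.51.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
    "model.norm.weight": "model-00014-of-00014.safetensors",
}


def group_by_shard(weight_map):
    """Return {shard_file: sorted list of tensor names stored in that shard}."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


shards = group_by_shard(weight_map_excerpt)
for shard_file, names in sorted(shards.items()):
    print(shard_file, len(names))
```

Note that a single layer can straddle two shards: in the excerpt, layer 51's `input_layernorm.weight` sits in shard 10 while its `mlp.down_proj.weight` sits in shard 11, so a loader has to resolve each tensor individually rather than per layer.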
output-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50579cca7fa924747033c5a96e293e167c4f7a7b3aa7c528d1bb47d6a6571b77
+ size 8576368072
output-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eed6adf8366c792d0498cab3fd686ee9d69ef3a49df31d0d9f6c6a512b67be17
+ size 8519761994
output-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8274aee77df4d583ef4a950560fd1b1561b85831c13d8bb6f97aac9c1ae084f1
+ size 8569755012
output-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f3f6bb5aa063ccdef9290cd7535d2ac6943a955b00fd94720f8bdb299b9027e7
+ size 2022815760
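Each `output-0000N-of-00004.safetensors` entry above is a Git LFS pointer (a `version`, an `oid`, and a `size` in bytes), not the tensor data itself. A minimal sketch of reading such a pointer and totalling the four sizes listed in this diff; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any library:

```python
# Pointer text copied from the output-00004-of-00004.safetensors entry above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f3f6bb5aa063ccdef9290cd7535d2ac6943a955b00fd94720f8bdb299b9027e7
size 2022815760
"""

# The "size" values of the four output-*.safetensors pointers, in order.
SHARD_SIZES = [8576368072, 8519761994, 8569755012, 2022815760]


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer into a dict; size becomes an int."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
total_bytes = sum(SHARD_SIZES)
print(pointer["oid"], pointer["size"])
print(f"total: {total_bytes} bytes ({total_bytes / 1e9:.1f} GB)")
```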
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "add_bos_token": false,
+ "add_eos_token": false,
+ "bos_token": {
+ "__type": "AddedToken",
+ "content": "<|begin▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "clean_up_tokenization_spaces": false,
+ "eos_token": {
+ "__type": "AddedToken",
+ "content": "<|end▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "legacy": true,
+ "model_max_length": 16384,
+ "pad_token": {
+ "__type": "AddedToken",
+ "content": "<|end▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sp_model_kwargs": {},
+ "unk_token": null,
+ "tokenizer_class": "LlamaTokenizerFast",
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool 
%}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}"
+ }
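For plain user/assistant conversations, the `chat_template` above emits the BOS token and the system prompt, wraps each turn in `<|User|>` / `<|Assistant|>` markers, and keeps only the text after the last `</think>` in earlier assistant turns. The following is a rough pure-Python sketch of that path only, assuming no tool calls or tool outputs; `render_simple` is a hypothetical name, and the real rendering is done by the Jinja template, not by this function:

```python
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"


def render_simple(messages, add_generation_prompt=False):
    """Approximate the template for system/user/assistant messages only."""
    # As in the template, a later system message overwrites an earlier one.
    system_prompt = ""
    for message in messages:
        if message["role"] == "system":
            system_prompt = message["content"]

    parts = [BOS, system_prompt]
    for message in messages:
        if message["role"] == "user":
            parts.append(USER + message["content"])
        elif message["role"] == "assistant":
            content = message["content"]
            # Drop the reasoning trace: keep what follows the last </think>.
            if "</think>" in content:
                content = content.split("</think>")[-1]
            parts.append(ASSISTANT + content + EOS)

    if add_generation_prompt:
        parts.append(ASSISTANT)
    return "".join(parts)


history = [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>Add the numbers.</think>4"},
    {"role": "user", "content": "And 3 + 3?"},
]
prompt = render_simple(history, add_generation_prompt=True)
print(prompt)
```

Stripping the earlier reasoning keeps multi-turn prompts shorter, which matters given the `model_max_length` of 16384 set in this config.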