bartowski committed
Commit 64e5391 · 1 Parent(s): 0517b31

Quant for 4.0
LICENSE ADDED
@@ -0,0 +1,320 @@
+ Yi Series Models License Agreement
+ Version: 2.0
+ Date of Release: November 4, 2023
+
+ 1. Definition
+
+ “Agreement” refers to the terms and conditions defined in this Yi Series Models
+ License Agreement for the use, reproduction and distribution of Yi Series
+ Models.
+
+ “Model” refers to associated components (including checkpoints) developed based
+ on machine learning, including learned weights and parameters (including the
+ status of optimizer).
+
+ “Yi Series Models” refers to opensource models with different specifications and
+ capabilities named “Yi” provided by the Licensor, including Yi-6B, Yi-34B etc.
+
+ “Derivatives” refers to all modifications to Yi Series Models, work based on Yi
+ Series Models, or any other models created or initialized by transferring the
+ weights, parameters, activations, or output patterns of Yi Series Models to
+ other models to achieve similar performance, including but not limited to
+ methods that require using intermediate data representations or generating
+ synthetic data based on Yi Series Models to train other models.
+
+ “Licensor” refers to Beijing Lingyiwanwu Information Technology Co., Ltd.
+
+ “you” refers to an individual or legal entity that exercises the license granted
+ by this Agreement and/or uses the Yi Series Models for any purpose and in any
+ field of use.
+
+ “Third Party” refers to any individuals, legal entities or non-legal
+ organizations other than you.
+
+ “Distribute” refers to transmitting, copying, publishing, or otherwise sharing
+ the Yi Series Models with third parties, including providing the Yi Series
+ Models through electronic or other remote means (such as any SaaS software or
+ PaaS software accessed via API or web access).
+
+ “Commercial Purposes” refers to the use of the Yi Series Models, directly or
+ indirectly, for the operation, promotion, revenue generation, or any other
+ profit-making purposes for entities or individuals.
+
+ “Laws and Regulations” refers to the laws and administrative regulations of the
+ mainland of the People's Republic of China (for the purposes of this Agreement
+ only, excluding Hong Kong, Macau, and Taiwan).
+
+ “Personal Information” refers to various information related to identified or
+ identifiable natural persons recorded electronically or by other means,
+ excluding information that has been anonymized.
+
+ “Logo” refers to any trademark, service mark, trade name, domain name, website
+ name, or other distinctive branding marks.
+
+ 2. License and License Restrictions
+ The Licensor hereby grants you a non-exclusive, global, non-transferable,
+ non-sub-licensable, revocable, and royalty-free copyright license. You must
+ adhere to the following license restrictions:
+
+ 1) Your use of the Yi Series Models must comply with the Laws and Regulations as
+ well as applicable legal requirements of other countries/regions, and respect
+ social ethics and moral standards, including but not limited to, not using the
+ Yi Series Models for purposes prohibited by Laws and Regulations as well as
+ applicable legal requirements of other countries/regions, such as harming
+ national security, promoting terrorism, extremism, inciting ethnic or racial
+ hatred, discrimination, violence, or pornography, and spreading false harmful
+ information.
+
+ 2) You shall not, for military or unlawful purposes or in ways not allowed by
+ Laws and Regulations as well as applicable legal requirements of other
+ countries/regions, a) use, copy or Distribute the Yi Series Models, or b) create
+ complete or partial Derivatives of the Yi Series Models.
+
+ 3) Your use of the Yi Series Models (including using the output of the Yi Series
+ Models) and the creation of Derivatives must not infringe upon the legitimate
+ rights of any Third Party, including but not limited to the rights of personal
+ rights such as the right to likeness, reputation, and privacy, as well as
+ intellectual property rights such as copyrights, patents, trade secrets, and
+ other property rights.
+
+ 4) You must clearly attribute the source of the Yi Series Models to the Licensor
+ and provide a copy of this Agreement to any Third-Party users of the Yi Series
+ Models and Derivatives.
+
+ 5) If you modify the Yi Series Models to create Derivatives, you must clearly
+ indicate the substantial modifications made, and these modifications shall not
+ violate the license restrictions of this Agreement. You shall not enable,
+ assist, or in any way facilitate Third Parties to violate the license
+ restrictions of this Agreement.
+
+ If you plan to use the Yi Series Models and Derivatives for Commercial Purposes,
+ you should contact the Licensor in advance as specified in Section 7 of this
+ Agreement named "Updates to the Agreement and Contact Information" and obtain
+ written authorization from the Licensor. When you obtain authorization from the
+ Licensor to use the Yi Series Models and Derivatives for Commercial Purposes,
+ you must comply with the afore-mentioned license restrictions.
+
+
+ 3. Intellectual Property
+ The ownership of the Yi Series Models and their related intellectual property
+ rights is solely held by the Licensor.
+
+ In any circumstance, without the prior written consent of the Licensor, you are
+ not allowed to use any Logo associated with the Licensor. If your use of
+ Licensor's Logo in violation of this Agreement causes any losses to the Licensor
+ or others, you will bear full legal responsibility.
+
+
+ 4. Disclaimer and Limitation of Liability
+
+ The Yi Series Models are provided "AS IS." The Licensor does not provide any
+ express or implied warranties for the Yi Series Models, including but not
+ limited to stability, ownership, merchantability, non-infringement, or fitness
+ for a specific purpose of the Yi Series Models and their output results. You
+ assume all responsibilities for the risks and consequences arising from the use,
+ reproduction, distribution of the Yi Series Models, and the creation of
+ Derivatives.
+
+ The Licensor complies with Laws and Regulations at all stages of model training,
+ maintaining the legality, authenticity, accuracy, objectivity, and diversity of
+ data and algorithms. The Licensor is not liable for any direct, indirect,
+ incidental consequences, and other losses or damages related to your use,
+ reproduction, and distribution of the Yi Series Models, and the creation of
+ Derivatives under this Agreement. This includes but is not limited to:
+
+ 1) The Licensor is not responsible for data security risks resulting from your
+ use of the Yi Series Models.
+
+ 2) The Yi Series Models may contain Personal Information. When you use Yi Series
+ Models, you acknowledge that you are the data processing entity as defined under
+ the Laws and Regulations responsible for determining the processing methods and
+ purposes of Personal Information. You must comply with legal requirements for
+ processing any Personal Information that may be contained in the Yi Series
+ Models and assume the associated legal responsibilities, as well as the risks
+ and consequences of processing Personal Information.
+
+ 3) The Licensor is not liable for reputation risks arising from your use of the
+ Yi Series Models or the output results of the Yi Series Models.
+
+ 4) The Licensor is not liable for intellectual property risks associated with
+ your use of the Yi Series Models’ output results.
+
+ If your use, reproduction, distribution of the Yi Series Models, or the creation
+ of Derivatives result in losses to the Licensor, the Licensor has the right to
+ seek compensation from you. For any claims made by Third Parties against the
+ Licensor related to your use, reproduction, and distribution of the Yi Series
+ Models, or the creation of Derivatives, the Licensor has the right to demand
+ that you defend, compensate, and indemnify the Licensor and protect the Licensor
+ from harm.
+
+
+ 5. Dispute Resolution
+
+ The stipulation, effectiveness, interpretation, performance, modification, and
+ termination of the Agreement, the use, copy and Distribute of the Yi Series
+ Models, and dispute resolution associated with your use, copy and distribution
+ shall be governed by the laws of the mainland of the People's Republic of China
+ (for the purposes of this agreement only, excluding Hong Kong, Macau, and
+ Taiwan), and the application of conflict of laws is excluded.
+
+ Any disputes arising from the use, copy or distribution of the Yi Series Models
+ should first be resolved through amicable negotiations. If negotiations fail,
+ legal proceedings should be initiated in the People's Court at the location of
+ the Licensor.
+
+
+ 6. Effectiveness and Termination of the Agreement
+
+ Your use of the Yi Series Models signifies that you have read and agreed to be
+ bound by the terms of the Agreement. The Agreement becomes effective from the
+ date of your use of the Yi Series Models and will terminate from the date you
+ cease using the Yi Series Models. If you violate any terms or restrictions in
+ the Agreement, the Licensor reserves the right to terminate the Agreement.
+
+ Upon termination of the Agreement, you must immediately cease using the Yi
+ Series Models. Section 4, "Disclaimer and Limitation of Liability," and Section
+ 5, "Dispute Resolution," of this Agreement remain in effect after the
+ termination of this Agreement.
+
+
+ 7. Updates to the Agreement and Contact Information
+
+ The Licensor reserves the right to update the Agreement from time to time. The
+ latest version of the Agreement will be posted by the Licensor through
+ https://01.ai.
+
+ For any questions related to licensing and copyright, please contact the
+ Licensor at yi@01.ai.
README.md CHANGED
@@ -8,30 +8,20 @@ pipeline_tag: text-generation
  quantized_by: bartowski
  ---
 
- ## Exllama v2 Quantizations of Yi-23B-Llama
+ # Exllama v2 Quantizations of Yi-23B-Llama at 4.0 bits per weight
 
  Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.8">turboderp's ExLlamaV2 v0.0.8</a> for quantization.
 
- Each branch contains an individual bits per weight, with the main one containing only the measurement.json for further conversions.
- 
  Conversion was done using wikitext-103-raw-v1-test.parquet as calibration dataset.
 
- Default arguments used except when the bits per weight is above 6.0, at that point the lm_head layer is quantized at 8 bits per weight instead of the default 6.
- 
  Original model: https://huggingface.co/ByteWave/Yi-23B-Llama
- 
- <a href="https://huggingface.co/bartowski/Yi-23B-Llama-exl2/tree/5_0">5.0 bits per weight</a>
- 
- <a href="https://huggingface.co/bartowski/Yi-23B-Llama-exl2/tree/6_0">6.0 bits per weight</a>
- 
- <a href="https://huggingface.co/bartowski/Yi-23B-Llama-exl2/tree/8_0">8.0 bits per weight</a>
 
  ## Download instructions
 
  With git:
 
  ```shell
- git clone --single-branch --branch 4_0 https://huggingface.co/bartowski/Yi-23B-Llama-exl2
+ git clone --single-branch --branch 4.0 https://huggingface.co/bartowski/Yi-23B-Llama-exl2
  ```
 
  With huggingface hub (credit to TheBloke for instructions):
@@ -40,13 +30,6 @@ With huggingface hub (credit to TheBloke for instructions):
  pip3 install huggingface-hub
  ```
 
- To download the `main` (only useful if you only care about measurement.json) branch to a folder called `Yi-23B-Llama-exl2`:
- 
- ```shell
- mkdir Yi-23B-Llama-exl2
- huggingface-cli download bartowski/Yi-23B-Llama-exl2 --local-dir Yi-23B-Llama-exl2 --local-dir-use-symlinks False
- ```
- 
  To download from a different branch, add the `--revision` parameter:
 
  ```shell
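
For scripted downloads, a minimal Python sketch using huggingface_hub's `snapshot_download` (the `4.0` branch name follows the updated clone command above; the destination folder name is an arbitrary choice):

```python
from huggingface_hub import snapshot_download

# Fetch the 4.0-bpw branch of this repo into a local folder.
snapshot_download(
    repo_id="bartowski/Yi-23B-Llama-exl2",
    revision="4.0",
    local_dir="Yi-23B-Llama-exl2-4.0",
    local_dir_use_symlinks=False,
)
```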
Yi.svg ADDED
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "distil-yi-llama",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 7168,
+   "initializer_range": 0.02,
+   "intermediate_size": 20480,
+   "max_position_embeddings": 4096,
+   "model_type": "llama",
+   "num_attention_heads": 56,
+   "num_hidden_layers": 40,
+   "num_key_value_heads": 8,
+   "pad_token_id": 0,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 5000000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.35.2",
+   "use_cache": true,
+   "vocab_size": 64000
+ }
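
A quick sanity check on this config: with `hidden_size` 7168 split across 56 heads the head dimension is 128, the 8 KV heads give grouped-query attention with 7 query heads per KV head, and a rough parameter count lands near the 23B in the model name. A sketch of that arithmetic, using only values from config.json (norm weights ignored):

```python
# Back-of-envelope parameter count from config.json values alone.
hidden, inter, layers = 7168, 20480, 40
heads, kv_heads, vocab = 56, 8, 64000

head_dim = hidden // heads   # 128
groups = heads // kv_heads   # 7 query heads share each KV head (GQA)

attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim  # q/o + k/v projections
mlp = 3 * hidden * inter                                       # gate, up, down
embeds = 2 * vocab * hidden                # embed_tokens + lm_head (untied)

total = layers * (attn + mlp) + embeds
print(f"~{total / 1e9:.1f}B parameters")   # ~23.2B
```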
model.safetensors.index.json ADDED
@@ -0,0 +1 @@
+ {"metadata": {"mergekit_version": "0.0.2.2"}, "weight_map": {"model.embed_tokens.weight": "model-00001-of-00005.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.2.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.3.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.4.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", 
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.5.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.6.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.6.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.7.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.10.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.10.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.11.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.11.self_attn.k_proj.weight": 
"model-00002-of-00005.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.12.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.16.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.16.self_attn.k_proj.weight": 
"model-00002-of-00005.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.8.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.9.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.9.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.17.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.18.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.19.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.19.self_attn.v_proj.weight": 
"model-00003-of-00005.safetensors", "model.layers.20.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.self_attn.v_proj.weight": 
"model-00003-of-00005.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.26.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.26.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.26.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.27.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.28.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.28.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.28.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.28.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.29.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.self_attn.v_proj.weight": 
"model-00004-of-00005.safetensors", "model.layers.30.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.30.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.31.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.32.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.32.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.33.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.33.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.34.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.34.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.34.post_attention_layernorm.weight": 
"model-00005-of-00005.safetensors", "model.layers.35.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.35.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.35.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.35.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.35.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.35.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.35.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.35.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.35.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.36.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.36.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.36.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.36.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.36.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.36.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.36.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.36.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.36.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.37.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.37.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.38.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.38.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.39.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.39.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.self_attn.v_proj.weight": 
"model-00005-of-00005.safetensors", "lm_head.weight": "model-00005-of-00005.safetensors", "model.norm.weight": "model-00005-of-00005.safetensors"}}
output-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c52a70bc8f608e8f3a843ae34c25381ca054af9b554d5977b725e8388d8aa937
+ size 8580444224
output-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca86c9220f7e2b4d98adaf69e9d311669681c7965573d8b1aa222fb1c181fb2d
+ size 3871968000
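
Both output shards are stored as Git LFS pointers: three `key value` lines giving the spec version, the blob's sha256 oid, and its byte size. A sketch that parses a pointer file and verifies a downloaded blob against it (the file paths are placeholders):

```python
import hashlib

def verify_lfs_blob(pointer_path: str, blob_path: str) -> bool:
    """Check a downloaded blob against its Git LFS pointer (sha256 oid and size)."""
    with open(pointer_path) as f:
        fields = dict(line.split(" ", 1) for line in f.read().splitlines())
    expected_oid = fields["oid"].split(":", 1)[1]
    expected_size = int(fields["size"])

    digest, size = hashlib.sha256(), 0
    with open(blob_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest() == expected_oid and size == expected_size
```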
tokenization_yi.py ADDED
@@ -0,0 +1,246 @@
+ import os
+ from shutil import copyfile
+ from typing import Any, Dict, List, Optional, Tuple
+
+ import sentencepiece as spm
+ from transformers.tokenization_utils import AddedToken, PreTrainedTokenizer
+ from transformers.utils import logging
+
+ logger = logging.get_logger(__name__)
+
+ VOCAB_FILES_NAMES = {"vocab_file": "tokenizer.model"}
+
+ PRETRAINED_VOCAB_FILES_MAP = {
+     "vocab_file": {},
+     "tokenizer_file": {},
+ }
+ PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {}
+
+
+ class YiTokenizer(PreTrainedTokenizer):
+     """
+     Construct a Yi tokenizer. Based on byte-level Byte-Pair-Encoding.
+     Args:
+         vocab_file (`str`):
+             Path to the vocabulary file.
+     """
+
+     vocab_files_names = VOCAB_FILES_NAMES
+     pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
+     max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
+     model_input_names = ["input_ids", "attention_mask"]
+
+     def __init__(
+         self,
+         vocab_file,
+         unk_token="<unk>",
+         bos_token="<|startoftext|>",
+         eos_token="<|endoftext|>",
+         pad_token="<unk>",
+         sp_model_kwargs: Optional[Dict[str, Any]] = None,
+         add_bos_token=True,
+         add_eos_token=False,
+         clean_up_tokenization_spaces=False,
+         **kwargs,
+     ):
+         self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
+         bos_token = (
+             AddedToken(bos_token, lstrip=False, rstrip=False)
+             if isinstance(bos_token, str)
+             else bos_token
+         )
+         eos_token = (
+             AddedToken(eos_token, lstrip=False, rstrip=False)
+             if isinstance(eos_token, str)
+             else eos_token
+         )
+         unk_token = (
+             AddedToken(unk_token, lstrip=False, rstrip=False)
+             if isinstance(unk_token, str)
+             else unk_token
+         )
+         pad_token = (
+             AddedToken(pad_token, lstrip=False, rstrip=False)
+             if isinstance(pad_token, str)
+             else pad_token
+         )
+         self.vocab_file = vocab_file
+         self.add_bos_token = add_bos_token
+         self.add_eos_token = add_eos_token
+         self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
+         self.sp_model.Load(vocab_file)
+         super().__init__(
+             bos_token=bos_token,
+             eos_token=eos_token,
+             unk_token=unk_token,
+             pad_token=pad_token,
+             add_bos_token=add_bos_token,
+             add_eos_token=add_eos_token,
+             sp_model_kwargs=self.sp_model_kwargs,
+             clean_up_tokenization_spaces=clean_up_tokenization_spaces,
+             **kwargs,
+         )
+
+     def __getstate__(self):
+         state = self.__dict__.copy()
+         state["sp_model"] = None
+         return state
+
+     def __setstate__(self, d):
+         self.__dict__ = d
+         self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
+         self.sp_model.Load(self.vocab_file)
+
+     @property
+     def vocab_size(self):
+         """Returns vocab size"""
+         return self.sp_model.get_piece_size()
+
+     def get_vocab(self):
+         """Returns vocab as a dict"""
+         vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
+         vocab.update(self.added_tokens_encoder)
+         return vocab
+
+     def _tokenize(self, text):
+         """Returns a tokenized string."""
+         return self.sp_model.encode(text, out_type=str)
+
+     def _convert_token_to_id(self, token):
+         """Converts a token (str) in an id using the vocab."""
+         return self.sp_model.piece_to_id(token)
+
+     def _convert_id_to_token(self, index):
+         """Converts an index (integer) in a token (str) using the vocab."""
+         token = self.sp_model.IdToPiece(index)
+         return token
+
+     def convert_tokens_to_string(self, tokens):
+         """Converts a sequence of tokens (string) in a single string."""
+         current_sub_tokens = []
+         out_string = ""
+         prev_is_special = False
+         for i, token in enumerate(tokens):
+             # make sure that special tokens are not decoded using sentencepiece model
+             if token in self.all_special_tokens:
+                 if not prev_is_special and i != 0:
+                     out_string += " "
+                 out_string += self.sp_model.decode(current_sub_tokens) + token
+                 prev_is_special = True
+                 current_sub_tokens = []
+             else:
+                 current_sub_tokens.append(token)
+                 prev_is_special = False
+         out_string += self.sp_model.decode(current_sub_tokens)
+         return out_string
+
+     def save_vocabulary(
+         self, save_directory, filename_prefix: Optional[str] = None
+     ) -> Tuple[str]:
+         """
+         Save the vocabulary and special tokens file to a directory.
+         Args:
+             save_directory (`str`):
+                 The directory in which to save the vocabulary.
+         Returns:
+             `Tuple(str)`: Paths to the files saved.
+         """
+         if not os.path.isdir(save_directory):
+             logger.error(f"Vocabulary path ({save_directory}) should be a directory")
+             return
+         out_vocab_file = os.path.join(
+             save_directory,
+             (filename_prefix + "-" if filename_prefix else "")
+             + VOCAB_FILES_NAMES["vocab_file"],
+         )
+
+         if os.path.abspath(self.vocab_file) != os.path.abspath(
+             out_vocab_file
+         ) and os.path.isfile(self.vocab_file):
+             copyfile(self.vocab_file, out_vocab_file)
+         elif not os.path.isfile(self.vocab_file):
+             with open(out_vocab_file, "wb") as fi:
+                 content_spiece_model = self.sp_model.serialized_model_proto()
+                 fi.write(content_spiece_model)
+
+         return (out_vocab_file,)
+
+     def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
+         bos_token_id = [self.bos_token_id] if self.add_bos_token else []
+         eos_token_id = [self.eos_token_id] if self.add_eos_token else []
+
+         output = bos_token_id + token_ids_0 + eos_token_id
+
+         if token_ids_1 is not None:
+             output = output + bos_token_id + token_ids_1 + eos_token_id
+
+         return output
+
+     def get_special_tokens_mask(
+         self,
+         token_ids_0: List[int],
+         token_ids_1: Optional[List[int]] = None,
+         already_has_special_tokens: bool = False,
+     ) -> List[int]:
+         """
+         Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
+         special tokens using the tokenizer `prepare_for_model` method.
+         Args:
+             token_ids_0 (`List[int]`):
+                 List of IDs.
+             token_ids_1 (`List[int]`, *optional*):
+                 Optional second list of IDs for sequence pairs.
+             already_has_special_tokens (`bool`, *optional*, defaults to `False`):
+                 Whether or not the token list is already formatted with special tokens for the model.
+         Returns:
+             `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
+         """
+         if already_has_special_tokens:
+             return super().get_special_tokens_mask(
+                 token_ids_0=token_ids_0,
+                 token_ids_1=token_ids_1,
+                 already_has_special_tokens=True,
+             )
+
+         bos_token_id = [1] if self.add_bos_token else []
+         eos_token_id = [1] if self.add_eos_token else []
+
+         if token_ids_1 is None:
+             return bos_token_id + ([0] * len(token_ids_0)) + eos_token_id
+         return (
+             bos_token_id
+             + ([0] * len(token_ids_0))
+             + eos_token_id
+             + bos_token_id
+             + ([0] * len(token_ids_1))
+             + eos_token_id
+         )
+
+     def create_token_type_ids_from_sequences(
+         self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+     ) -> List[int]:
+         """
+         Creates a mask from the two sequences passed to be used in a sequence-pair classification task. An ALBERT
+         sequence pair mask has the following format:
+         ```
+         0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
+         | first sequence | second sequence |
+         ```
+         if token_ids_1 is None, only returns the first portion of the mask (0s).
+         Args:
+             token_ids_0 (`List[int]`):
+                 List of ids.
+             token_ids_1 (`List[int]`, *optional*):
+                 Optional second list of IDs for sequence pairs.
+         Returns:
+             `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
+         """
+         bos_token_id = [self.bos_token_id] if self.add_bos_token else []
+         eos_token_id = [self.eos_token_id] if self.add_eos_token else []
+
+         output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)
+
+         if token_ids_1 is not None:
+             output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)
+
+         return output
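
A minimal round-trip sketch for the class above, assuming tokenization_yi.py and the repo's tokenizer.model both sit in the working directory:

```python
from tokenization_yi import YiTokenizer

tok = YiTokenizer(vocab_file="tokenizer.model")
ids = tok("Hello, Yi!")["input_ids"]   # <|startoftext|> is prepended by the class default
print(ids)
print(tok.decode(ids, skip_special_tokens=True))  # "Hello, Yi!"
```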
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:386c49cf943d71aa110361135338c50e38beeff0a66593480421f37b319e1a39
+ size 1033105
tokenizer_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "auto_map": {
+     "AutoTokenizer": ["tokenization_yi.YiTokenizer", null]
+   },
+   "add_bos_token": false,
+   "add_eos_token": false,
+   "model_max_length": 4096,
+   "tokenizer_class": "YiTokenizer"
+ }
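
Because `auto_map` routes `AutoTokenizer` to the repo-local `tokenization_yi.YiTokenizer`, loading through transformers requires explicitly trusting the custom code. A sketch, with the branch name taken from the README above:

```python
from transformers import AutoTokenizer

# trust_remote_code=True is needed to execute the repo's tokenization_yi.py.
tok = AutoTokenizer.from_pretrained(
    "bartowski/Yi-23B-Llama-exl2",
    revision="4.0",   # branch carrying the 4.0-bpw quant
    trust_remote_code=True,
)
```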