bartowski committed on
Commit
c17e07c
1 Parent(s): 839781d

Quant for 3.0

README.md CHANGED
@@ -119,73 +119,344 @@ extra_gated_description: >-
119
  Mistral AI processes your personal data below to provide the model and enforce its license. If you are affiliated with a commercial entity, we may also send you communications about our models. For more information on your rights and data handling, please see our <a href="https://mistral.ai/terms/">privacy policy</a>.
120
  extra_gated_button_content: Submit
121
  library_name: vllm
122
- quantized_by: bartowski
123
- pipeline_tag: text-generation
124
  ---
125
 
126
- ## Exllama v2 Quantizations of Mistral-Large-Instruct-2411
127
 
128
- Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.2.4">turboderp's ExLlamaV2 v0.2.4</a> for quantization.
129
 
130
- <b>The "main" branch only contains the measurement.json, download one of the other branches for the model (see below)</b>
131
 
132
- Each branch contains a single bits-per-weight quantization; the main branch contains only the measurement.json for further conversions.
 
 
 
133
 
134
- Conversion was done using the default calibration dataset.
135
 
136
- Default arguments were used, except that when the bits per weight is above 6.0 the lm_head layer is quantized at 8 bits per weight instead of the default 6.
137
-
138
- Original model: https://huggingface.co/mistralai/Mistral-Large-Instruct-2411
139
 
 
140
 
141
- <a href="https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-exl2/tree/6_5">6.5 bits per weight</a>
142
 
143
- <a href="https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-exl2/tree/5_0">5.0 bits per weight</a>
144
 
145
- <a href="https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-exl2/tree/4_25">4.25 bits per weight</a>
146
 
147
- <a href="https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-exl2/tree/3_75">3.75 bits per weight</a>
148
 
149
- <a href="https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-exl2/tree/3_5">3.5 bits per weight</a>
150
 
151
- <a href="https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-exl2/tree/3_0">3.0 bits per weight</a>
 
152
 
153
- <a href="https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-exl2/tree/2_2">2.2 bits per weight</a>
154
 
 
155
 
156
- ## Download instructions
 
 
157
 
158
- With git:
159
 
160
- ```shell
161
- git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-exl2
162
  ```
163
 
164
- With huggingface hub (credit to TheBloke for instructions):
165
 
166
- ```shell
167
- pip3 install huggingface-hub
168
- ```
 
 
169
 
170
- To download the `main` (only useful if you only care about measurement.json) branch to a folder called `Mistral-Large-Instruct-2411-exl2`:
171
 
172
- ```shell
173
- mkdir Mistral-Large-Instruct-2411-exl2
174
- huggingface-cli download bartowski/Mistral-Large-Instruct-2411-exl2 --local-dir Mistral-Large-Instruct-2411-exl2
175
  ```
176
 
177
- To download from a different branch, add the `--revision` parameter:
178
 
179
- Linux:
180
 
181
- ```shell
182
- mkdir Mistral-Large-Instruct-2411-exl2-6_5
183
- huggingface-cli download bartowski/Mistral-Large-Instruct-2411-exl2 --revision 6_5 --local-dir Mistral-Large-Instruct-2411-exl2-6_5
184
  ```
185
 
186
- Windows (which apparently doesn't like _ in folders sometimes?):
 
 
 
 
 
187
 
188
- ```shell
189
- mkdir Mistral-Large-Instruct-2411-exl2-6.5
190
- huggingface-cli download bartowski/Mistral-Large-Instruct-2411-exl2 --revision 6_5 --local-dir Mistral-Large-Instruct-2411-exl2-6.5
191
  ```
119
  Mistral AI processes your personal data below to provide the model and enforce its license. If you are affiliated with a commercial entity, we may also send you communications about our models. For more information on your rights and data handling, please see our <a href="https://mistral.ai/terms/">privacy policy</a>.
120
  extra_gated_button_content: Submit
121
  library_name: vllm
 
 
122
  ---
123
 
124
+ # Model Card for Mistral-Large-Instruct-2411
125
 
126
+ Mistral-Large-Instruct-2411 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities. It extends [Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407) with better long-context handling, function calling, and system prompt support.
127
 
128
+ ## Key features
129
+ - **Multi-lingual by design:** Dozens of languages supported, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch and Polish.
130
+ - **Proficient in coding:** Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
131
+ - **Agent-centric:** Best-in-class agentic capabilities with native function calling and JSON outputting.
132
+ - **Advanced Reasoning:** State-of-the-art mathematical and reasoning capabilities.
133
+ - **Mistral Research License:** Allows usage and modification for non-commercial use.
134
+ - **Large Context:** A large 128k context window.
135
+ - **Robust Context Adherence:** Ensures strong adherence for RAG and large context applications.
136
+ - **System Prompt:** Maintains strong adherence and support for more reliable system prompts.
137
 
138
+ ### System Prompt
139
+ We appreciate the feedback received from our community regarding our system prompt handling.
140
+ In response, we have implemented stronger support for system prompts.
141
+ To achieve optimal results, we recommend always including a system prompt that clearly outlines the bot's purpose, even if it is minimal.
142
 
143
+ ### Basic Instruct Template (V7)
144
 
145
+ ```
146
+ <s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]
147
+ ```
148
 
149
+ **Be careful with subtle missing or trailing white spaces!**
150
 
151
+ *Please make sure to use [mistral-common](https://github.com/mistralai/mistral-common) as the source of truth*
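To make the whitespace rules concrete, here is a minimal sketch that renders a conversation into the V7 string layout shown above. The helper name `render_v7` is hypothetical and is not part of mistral-common. Real tokenization inserts control tokens by ID rather than by string concatenation, so treat this only as an illustration of where the spaces fall.

```python
def render_v7(messages):
    """Render chat messages as a V7-style prompt string (illustrative only)."""
    parts = ["<s>"]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            # One space after the opening tag, none before the closing tag.
            parts.append(f"[SYSTEM_PROMPT] {content}[/SYSTEM_PROMPT]")
        elif role == "user":
            parts.append(f"[INST] {content}[/INST]")
        elif role == "assistant":
            # The assistant turn starts with a space and ends with </s>.
            parts.append(f" {content}</s>")
        else:
            raise ValueError(f"unsupported role: {role}")
    return "".join(parts)


example = render_v7(
    [
        {"role": "system", "content": "You are a helpful bot."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "What is 2 + 2?"},
    ]
)
print(example)
```

For anything beyond illustration, mistral-common remains the source of truth for the exact token sequence.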
152
 
153
+ ## Usage
154
 
155
+ The model can be used with the following frameworks:
156
 
157
+ - [`vllm`](https://github.com/vllm-project/vllm): See [here](#vLLM)
158
 
159
+ ### vLLM
160
 
161
+ We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
162
+ to implement production-ready inference pipelines.
163
 
164
+ **_Installation_**
165
 
166
+ Make sure you install [`vLLM >= v0.6.4.post1`](https://github.com/vllm-project/vllm/releases/tag/v0.6.4.post1):
167
 
168
+ ```
169
+ pip install --upgrade vllm
170
+ ```
171
 
172
+ Also make sure you have [`mistral_common >= 1.5.0`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.0) installed:
173
 
174
+ ```
175
+ pip install --upgrade mistral_common
176
  ```
177
 
178
+ You can also use a ready-to-go Docker image, either built from the [Dockerfile](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or pulled from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-55a88146a4da0b6e193431b5b1d3492dfd7bebdc16919df4d031273e85a6157c?context=explore).
179
 
180
+ ### Server
181
+
182
+ We recommend that you use Mistral-Large-Instruct-2411 in a server/client setting.
183
+
184
+ 1. Spin up a server:
185
 
 
186
 
 
 
 
187
  ```
188
+ vllm serve mistralai/Mistral-Large-Instruct-2411 --tokenizer_mode mistral --config_format mistral --load_format mistral --tensor_parallel_size 8
189
+ ```
190
+
191
+ **Note:** Running Mistral-Large-Instruct-2411 on GPU requires over 300 GB of GPU RAM.
192
+
193
+
194
+ 2. To query the server, you can use a simple Python snippet.
195
+
196
+ ```py
197
+ import requests
198
+ import json
199
+ from huggingface_hub import hf_hub_download
200
+ from datetime import datetime, timedelta
201
+
202
+ url = "http://<your-server>:8000/v1/chat/completions"
203
+ headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
204
+
205
+ model = "mistralai/Mistral-Large-Instruct-2411"
206
+
207
+
208
+ def load_system_prompt(repo_id: str, filename: str) -> str:
209
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
210
+     with open(file_path, "r") as file:
211
+         system_prompt = file.read()
212
+     today = datetime.today().strftime("%Y-%m-%d")
213
+     yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
214
+     model_name = repo_id.split("/")[-1]
215
+     return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
216
 
 
217
 
218
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
219
 
220
+
221
+ messages = [
222
+ {"role": "system", "content": SYSTEM_PROMPT + "\n\nThink step by step. You're a math genius."},
223
+ {
224
+ "role": "user",
225
+ "content": "Think of four random numbers. Then add, subtract or multiply them so that the solution is 10. If it's not possible, say it."
226
+ },
227
+ ]
228
+
229
+ data = {"model": model, "messages": messages}
230
+
231
+ response = requests.post(url, headers=headers, data=json.dumps(data))
232
+ print(response.json()["choices"][0]["message"]["content"])
233
+ # Sure, let's start by thinking of four random numbers. For example, let's take 3, 5, 2, and 1.
234
+ #
235
+ # Now, we need to find a combination of addition, subtraction, or multiplication that results in 10.
236
+
237
+ # Let's try:
238
+
239
+ # \[ 3 + 5 + 2 - 1 = 9 \]
240
+
241
+ # This doesn't work. Let's try another combination:
242
+
243
+ # \[ 3 \times 2 + 5 - 1 = 6 + 5 - 1 = 10 \]
244
+
245
+ # This works! So, with the numbers 3, 5, 2, and 1, we can achieve the result 10 by performing the operations \( 3 \times 2 + 5 - 1 \).
246
+ ```
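The GPU memory note in step 1 is consistent with simple arithmetic on the parameter count. The sketch below estimates weight storage only, using the 123B figure from the model card and the bits-per-weight values listed for this repository's branches. It ignores KV cache, activations and per-tensor quantization overhead, so real usage will be higher. The function name `weight_gb` is illustrative.

```python
PARAMS = 123e9  # parameter count stated in the model card


def weight_gb(bits_per_weight, params=PARAMS):
    """Approximate weight storage in GB (10**9 bytes) at a given precision."""
    return params * bits_per_weight / 8 / 1e9


# 16 bits corresponds to the unquantized bf16/fp16 weights; the rest are EXL2 branches.
for bpw in (16, 6.5, 5.0, 4.25, 3.75, 3.5, 3.0, 2.2):
    print(f"{bpw:>5} bpw -> ~{weight_gb(bpw):.0f} GB of weights")
```

At 16 bits the weights alone come to roughly 246 GB, which leaves the rest of the 300 GB budget for cache and runtime overhead.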
247
+
248
+ ### Offline
249
+
250
+ ```py
251
+ from vllm import LLM
252
+ from vllm.sampling_params import SamplingParams
253
+ from huggingface_hub import hf_hub_download
254
+ from datetime import datetime, timedelta
255
+
256
+ model_name = "mistralai/Mistral-Large-Instruct-2411"
257
+
258
+ def load_system_prompt(repo_id: str, filename: str) -> str:
259
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
260
+     with open(file_path, 'r') as file:
261
+         system_prompt = file.read()
262
+     today = datetime.today().strftime('%Y-%m-%d')
263
+     yesterday = (datetime.today() - timedelta(days=1)).strftime('%Y-%m-%d')
264
+     model_name = repo_id.split("/")[-1]
265
+     return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
266
+
267
+
268
+ SYSTEM_PROMPT = load_system_prompt(model_name, "SYSTEM_PROMPT.txt") + "\n\nThink step by step. You're a math genius."
269
+
270
+ user_prompt = "Without browsing the web, how many days ago was Mistral founded?"
271
+
272
+ messages = [
273
+ {
274
+ "role": "system",
275
+ "content": SYSTEM_PROMPT
276
+ },
277
+ {
278
+ "role": "user",
279
+ "content": user_prompt
280
+ },
281
+ ]
282
+
283
+ # note that running this model on GPU requires over 300 GB of GPU RAM
284
+ llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8)
285
+
286
+ sampling_params = SamplingParams(max_tokens=512)
287
+
288
+ outputs = llm.chat(messages, sampling_params=sampling_params)
289
+
290
+ print(outputs[0].outputs[0].text)
291
+ # I don't have real-time web browsing capabilities or access to current data, but I can help you calculate the number of days based on the information I have.
292
+ #
293
+ #Mistral AI was founded in April 2023. To determine how many days ago that was from today's date, November 18, 2024, we need to calculate the total number of days between April 2023 and November 2024.
294
+ #
295
+ #Here's the step-by-step calculation:
296
+ #
297
+ #1. **Days from April 2023 to December 2023:**
298
+ # - April 2023: 30 days (April has 30 days)
299
+ # - May 2023: 31 days
300
+ # - June 2023: 30 days
301
+ # - July 2023: 31 days
302
+ # - August 2023: 31 days
303
+ # - September 2023: 30 days
304
+ # - October 2023: 31 days
305
+ # - November 2023: 30 days
306
+ # - December 2023: 31 days
307
+ #
308
+ # Total days in 2023 from April to December = 30 + 31 + 30 + 31 + 31 + 30 + 31 + 30 + 31 = 275 days
309
+ #
310
+ #2. **Days from January 2024 to November 18, 2024:**
311
+ # - January 2024: 31 days
312
+ # - February 2024: 29 days (2024 is a leap year)
313
+ # - March 2024: 31 days
314
+ # - April 2024: 30 days
315
+ # - May 2024: 31 days
316
+ # - June 2024: 30 days
317
+ # - July 2024: 31 days
318
+ # - August 2024: 31 days
319
+ # - September 2024: 30 days
320
+ # - October 2024: 31 days
321
+ # - November 2024 (up to the 18th): 18 days
322
+ #
323
+ # Total days in 2024 from January to November 18 = 31 + 29 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 18 = 323 days
324
+ #
325
+ #3. **Total days from April 2023 to November 18, 2024:**
326
+ # Total days = 275 days (2023) + 323 days (2024) = 598 days
327
+ #
328
+ #Therefore, Mistral AI was founded 598 days ago from today's date, November 18, 2024.
329
  ```
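As a cross-check on the sample output above, the day count can be computed directly with `datetime`. The output names only the founding month, so the day used here (1 April 2023) is an assumption; the reference date is the one the model states.

```python
from datetime import date


def days_between(start, end):
    """Number of days from start to end, not counting the start day itself."""
    return (end - start).days


founded = date(2023, 4, 1)      # assumption: first day of the stated month
reference = date(2024, 11, 18)  # "today" in the sample output

print(days_between(founded, reference))
```

Under that assumption the difference is 597 days. The 598 in the sample output comes from counting both the first and the last day.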
330
 
331
+ ### Improved Function Calling
332
+
333
+ Mistral-Large-2411 has much improved function calling capabilities that are fully supported
334
+ using [`mistral_common >= 1.5.0`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.0) and [`vLLM >= v0.6.4.post1`](https://github.com/vllm-project/vllm/releases/tag/v0.6.4.post1).
335
+
336
+ Make sure to serve the model with the following flags in vLLM:
337
 
 
 
 
338
  ```
339
+ vllm serve mistralai/Mistral-Large-Instruct-2411 --tokenizer_mode mistral --tensor-parallel-size 8 --tool-call-parser mistral --enable-auto-tool-choice
340
+ ```
341
+
342
+ <details>
343
+ <summary>Example</summary>
344
+
345
+ ```py
346
+ import requests
347
+ import json
348
+ from huggingface_hub import hf_hub_download
349
+ from datetime import datetime, timedelta
350
+
351
+ url = "http://<your-server>:8000/v1/chat/completions"
352
+ headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
353
+
354
+ model = "mistralai/Mistral-Large-Instruct-2411"
355
+
356
+
357
+ def load_system_prompt(repo_id: str, filename: str) -> str:
358
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
359
+     with open(file_path, "r") as file:
360
+         system_prompt = file.read()
361
+     today = datetime.today().strftime("%Y-%m-%d")
362
+     yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
363
+     model_name = repo_id.split("/")[-1]
364
+     return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
365
+
366
+
367
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
368
+
369
+
370
+ tools = [
371
+ {
372
+ "type": "function",
373
+ "function": {
374
+ "name": "get_current_weather",
375
+ "description": "Get the current weather in a given location",
376
+ "parameters": {
377
+ "type": "object",
378
+ "properties": {
379
+ "city": {
380
+ "type": "string",
381
+ "description": "The city to find the weather for, e.g. 'San Francisco'",
382
+ },
383
+ "state": {
384
+ "type": "string",
385
+ "description": "The state abbreviation, e.g. 'CA' for California",
386
+ },
387
+ "unit": {
388
+ "type": "string",
389
+ "description": "The unit for temperature",
390
+ "enum": ["celsius", "fahrenheit"],
391
+ },
392
+ },
393
+ "required": ["city", "state", "unit"],
394
+ },
395
+ },
396
+ },
397
+ {
398
+ "type": "function",
399
+ "function": {
400
+ "name": "rewrite",
401
+ "description": "Rewrite a given text for improved clarity",
402
+ "parameters": {
403
+ "type": "object",
404
+ "properties": {
405
+ "text": {
406
+ "type": "string",
407
+ "description": "The input text to rewrite",
408
+ }
409
+ },
410
+ },
411
+ },
412
+ },
413
+ ]
414
+
415
+ messages = [
416
+ {"role": "system", "content": SYSTEM_PROMPT},
417
+ {
418
+ "role": "user",
419
+ "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
420
+ },
421
+ {
422
+ "role": "assistant",
423
+ "content": "",
424
+ "tool_calls": [
425
+ {
426
+ "id": "bbc5b7ede",
427
+ "type": "function",
428
+ "function": {
429
+ "name": "rewrite",
430
+ "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
431
+ },
432
+ }
433
+ ],
434
+ },
435
+ {
436
+ "role": "tool",
437
+ "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
438
+ "tool_call_id": "bbc5b7ede",
439
+ "name": "rewrite",
440
+ },
441
+ {
442
+ "role": "assistant",
443
+ "content": "---\n\nOpenAI is a FOR-profit company.",
444
+ },
445
+ {
446
+ "role": "user",
447
+ "content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?",
448
+ },
449
+ ]
450
+
451
+ data = {"model": model, "messages": messages, "tools": tools}
452
+
453
+ response = requests.post(url, headers=headers, data=json.dumps(data))
454
+ print(response.json()["choices"][0]["message"]["tool_calls"])
455
+ # [{'id': '8PdihwL6d', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}'}}]
456
+ ```
457
+
458
+ </details>
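Once the server returns `tool_calls`, the client is expected to run the named function and send the result back as a `tool` message. The following is a minimal sketch of that step, mirroring the message shape used in the example above. `get_current_weather` is a hypothetical stub, and `TOOL_REGISTRY` and `run_tool_calls` are illustrative names that do not come from vLLM or mistral-common.

```python
import json


def get_current_weather(city, state, unit):
    """Hypothetical stub; a real implementation would query a weather service."""
    return {"city": city, "state": state, "unit": unit, "temperature": None}


# Maps tool names declared in `tools` to local callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls):
    """Execute each tool call and build the matching `tool` messages."""
    results = []
    for call in tool_calls:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = TOOL_REGISTRY[function["name"]](**arguments)
        results.append(
            {
                "role": "tool",
                "content": json.dumps(output),
                "tool_call_id": call["id"],  # must echo the id from the request
                "name": function["name"],
            }
        )
    return results


# The tool call printed by the example above.
sample_calls = [
    {
        "id": "8PdihwL6d",
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "arguments": '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}',
        },
    }
]
tool_messages = run_tool_calls(sample_calls)
print(tool_messages)
```

The resulting messages would be appended to `messages`, after the assistant message that carries the tool calls, before the next request to the server.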
459
+
460
+ ## The Mistral AI Team
461
+
462
+ Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
SYSTEM_PROMPT.txt ADDED
@@ -0,0 +1,18 @@
1
+ You are {name}, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
2
+ You power an AI assistant called Le Chat.
3
+ Your knowledge base was last updated on 2023-10-01.
4
+ The current date is {today}.
5
+
6
+ When you're not sure about some information, you say that you don't have the information and don't make up anything.
7
+ If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
8
+ You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
9
+ You follow these instructions in all languages, and always respond to the user in the language they use or request.
10
+ Next sections describe the capabilities that you have.
11
+
12
+ # WEB BROWSING INSTRUCTIONS
13
+
14
+ You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.
15
+
16
+ # MULTI-MODAL INSTRUCTIONS
17
+
18
+ You do not have any multimodal capability, in particular you cannot read nor generate images, or transcribe audio files or videos.
config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "architectures": [
3
+ "MistralForCausalLM"
4
+ ],
5
+ "attention_dropout": 0.0,
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 2,
8
+ "head_dim": 128,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 12288,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 28672,
13
+ "max_position_embeddings": 131072,
14
+ "model_type": "mistral",
15
+ "num_attention_heads": 96,
16
+ "num_hidden_layers": 88,
17
+ "num_key_value_heads": 8,
18
+ "rms_norm_eps": 1e-05,
19
+ "rope_theta": 1000000.0,
20
+ "sliding_window": null,
21
+ "tie_word_embeddings": false,
22
+ "transformers_version": "4.46.2",
23
+ "use_cache": true,
24
+ "vocab_size": 32768,
25
+ "quantization_config": {
26
+ "quant_method": "exl2",
27
+ "version": "0.2.4",
28
+ "bits": 3.0,
29
+ "head_bits": 6,
30
+ "calibration": {
31
+ "rows": 115,
32
+ "length": 2048,
33
+ "dataset": "(default)"
34
+ }
35
+ }
36
+ }
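The attention fields in this config determine how much memory the KV cache needs per token. The sketch below works that out from the values above. It assumes a 16-bit cache (2 bytes per value), which is an assumption and not something the config states; a quantized cache would shrink the figures proportionally.

```python
# Values copied from the config.json above.
config = {
    "head_dim": 128,
    "num_attention_heads": 96,
    "num_hidden_layers": 88,
    "num_key_value_heads": 8,
    "max_position_embeddings": 131072,
}


def kv_cache_bytes_per_token(cfg, bytes_per_value=2):
    """Bytes of KV cache per token: one key and one value tensor per layer."""
    per_layer = cfg["num_key_value_heads"] * cfg["head_dim"] * bytes_per_value
    return 2 * cfg["num_hidden_layers"] * per_layer


per_token = kv_cache_bytes_per_token(config)
full_context = per_token * config["max_position_embeddings"]
# Grouped-query attention: how many query heads share each key/value head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

print(f"{per_token} bytes per token")
print(f"{full_context / 2**30:.0f} GiB at the full context length")
print(f"{queries_per_kv_head} query heads per KV head")
```

Because only 8 of the 96 heads carry keys and values, the cache is 12 times smaller than it would be with one KV head per query head.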
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.46.2"
6
+ }
model.safetensors.index.json ADDED
@@ -0,0 +1,802 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 245220139008
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00051-of-00051.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00051.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00051.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00051.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00051.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00051.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00051.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00051.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00051.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00051.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00051.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00002-of-00051.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00002-of-00051.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00051.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00002-of-00051.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00051.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00051.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00051.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00051.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00051.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00007-of-00051.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00007-of-00051.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00007-of-00051.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00007-of-00051.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00007-of-00051.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00006-of-00051.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00006-of-00051.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00006-of-00051.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00006-of-00051.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00007-of-00051.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00007-of-00051.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00007-of-00051.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00007-of-00051.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00007-of-00051.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00007-of-00051.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00007-of-00051.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00007-of-00051.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00007-of-00051.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00008-of-00051.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00008-of-00051.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00008-of-00051.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00008-of-00051.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00008-of-00051.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00008-of-00051.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00008-of-00051.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00008-of-00051.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00008-of-00051.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00009-of-00051.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00009-of-00051.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00008-of-00051.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00008-of-00051.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00009-of-00051.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00008-of-00051.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00008-of-00051.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00008-of-00051.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00008-of-00051.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00009-of-00051.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00009-of-00051.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00009-of-00051.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00009-of-00051.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00009-of-00051.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00009-of-00051.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00009-of-00051.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00009-of-00051.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00009-of-00051.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00010-of-00051.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00010-of-00051.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00009-of-00051.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00010-of-00051.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00010-of-00051.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00009-of-00051.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00009-of-00051.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00009-of-00051.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00009-of-00051.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00010-of-00051.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00010-of-00051.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00010-of-00051.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00010-of-00051.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00010-of-00051.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00010-of-00051.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00010-of-00051.safetensors",
87
+ "model.layers.16.self_attn.q_proj.weight": "model-00010-of-00051.safetensors",
88
+ "model.layers.16.self_attn.v_proj.weight": "model-00010-of-00051.safetensors",
89
+ "model.layers.17.input_layernorm.weight": "model-00011-of-00051.safetensors",
90
+ "model.layers.17.mlp.down_proj.weight": "model-00011-of-00051.safetensors",
91
+ "model.layers.17.mlp.gate_proj.weight": "model-00011-of-00051.safetensors",
92
+ "model.layers.17.mlp.up_proj.weight": "model-00011-of-00051.safetensors",
93
+ "model.layers.17.post_attention_layernorm.weight": "model-00011-of-00051.safetensors",
94
+ "model.layers.17.self_attn.k_proj.weight": "model-00010-of-00051.safetensors",
95
+ "model.layers.17.self_attn.o_proj.weight": "model-00010-of-00051.safetensors",
96
+ "model.layers.17.self_attn.q_proj.weight": "model-00010-of-00051.safetensors",
97
+ "model.layers.17.self_attn.v_proj.weight": "model-00010-of-00051.safetensors",
98
+ "model.layers.18.input_layernorm.weight": "model-00011-of-00051.safetensors",
99
+ "model.layers.18.mlp.down_proj.weight": "model-00011-of-00051.safetensors",
100
+ "model.layers.18.mlp.gate_proj.weight": "model-00011-of-00051.safetensors",
101
+ "model.layers.18.mlp.up_proj.weight": "model-00011-of-00051.safetensors",
102
+ "model.layers.18.post_attention_layernorm.weight": "model-00011-of-00051.safetensors",
103
+ "model.layers.18.self_attn.k_proj.weight": "model-00011-of-00051.safetensors",
104
+ "model.layers.18.self_attn.o_proj.weight": "model-00011-of-00051.safetensors",
105
+ "model.layers.18.self_attn.q_proj.weight": "model-00011-of-00051.safetensors",
106
+ "model.layers.18.self_attn.v_proj.weight": "model-00011-of-00051.safetensors",
107
+ "model.layers.19.input_layernorm.weight": "model-00012-of-00051.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.19.mlp.gate_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.19.mlp.up_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00012-of-00051.safetensors",
+ "model.layers.19.self_attn.k_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.19.self_attn.q_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.19.self_attn.v_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00002-of-00051.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.2.mlp.up_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00002-of-00051.safetensors",
+ "model.layers.2.self_attn.k_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.2.self_attn.v_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.20.input_layernorm.weight": "model-00013-of-00051.safetensors",
+ "model.layers.20.mlp.down_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.20.mlp.gate_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.20.mlp.up_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.20.post_attention_layernorm.weight": "model-00013-of-00051.safetensors",
+ "model.layers.20.self_attn.k_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.20.self_attn.o_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.20.self_attn.q_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.20.self_attn.v_proj.weight": "model-00012-of-00051.safetensors",
+ "model.layers.21.input_layernorm.weight": "model-00013-of-00051.safetensors",
+ "model.layers.21.mlp.down_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.21.mlp.gate_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.21.mlp.up_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.21.post_attention_layernorm.weight": "model-00013-of-00051.safetensors",
+ "model.layers.21.self_attn.k_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.21.self_attn.o_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.21.self_attn.q_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.21.self_attn.v_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.22.input_layernorm.weight": "model-00014-of-00051.safetensors",
+ "model.layers.22.mlp.down_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.22.mlp.gate_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.22.mlp.up_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.22.post_attention_layernorm.weight": "model-00014-of-00051.safetensors",
+ "model.layers.22.self_attn.k_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.22.self_attn.o_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.22.self_attn.q_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.22.self_attn.v_proj.weight": "model-00013-of-00051.safetensors",
+ "model.layers.23.input_layernorm.weight": "model-00014-of-00051.safetensors",
+ "model.layers.23.mlp.down_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.23.mlp.gate_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.23.mlp.up_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.23.post_attention_layernorm.weight": "model-00014-of-00051.safetensors",
+ "model.layers.23.self_attn.k_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.23.self_attn.o_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.23.self_attn.q_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.23.self_attn.v_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.24.input_layernorm.weight": "model-00015-of-00051.safetensors",
+ "model.layers.24.mlp.down_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.24.mlp.gate_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.24.mlp.up_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.24.post_attention_layernorm.weight": "model-00015-of-00051.safetensors",
+ "model.layers.24.self_attn.k_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.24.self_attn.o_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.24.self_attn.q_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.24.self_attn.v_proj.weight": "model-00014-of-00051.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00015-of-00051.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00015-of-00051.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00015-of-00051.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00016-of-00051.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00016-of-00051.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00017-of-00051.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00017-of-00051.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00016-of-00051.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00017-of-00051.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.28.mlp.gate_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.28.mlp.up_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00017-of-00051.safetensors",
+ "model.layers.28.self_attn.k_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00018-of-00051.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00018-of-00051.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00017-of-00051.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00003-of-00051.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00003-of-00051.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00051.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00018-of-00051.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00018-of-00051.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00019-of-00051.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00019-of-00051.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00018-of-00051.safetensors",
+ "model.layers.32.input_layernorm.weight": "model-00019-of-00051.safetensors",
+ "model.layers.32.mlp.down_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.32.mlp.gate_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.32.mlp.up_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.32.post_attention_layernorm.weight": "model-00019-of-00051.safetensors",
+ "model.layers.32.self_attn.k_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.32.self_attn.o_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.32.self_attn.q_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.32.self_attn.v_proj.weight": "model-00019-of-00051.safetensors",
+ "model.layers.33.input_layernorm.weight": "model-00020-of-00051.safetensors",
+ "model.layers.33.mlp.down_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.33.mlp.gate_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.33.mlp.up_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.33.post_attention_layernorm.weight": "model-00020-of-00051.safetensors",
+ "model.layers.33.self_attn.k_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.33.self_attn.o_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.33.self_attn.q_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.33.self_attn.v_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.34.input_layernorm.weight": "model-00021-of-00051.safetensors",
+ "model.layers.34.mlp.down_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.34.mlp.gate_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.34.mlp.up_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.34.post_attention_layernorm.weight": "model-00021-of-00051.safetensors",
+ "model.layers.34.self_attn.k_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.34.self_attn.o_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.34.self_attn.q_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.34.self_attn.v_proj.weight": "model-00020-of-00051.safetensors",
+ "model.layers.35.input_layernorm.weight": "model-00021-of-00051.safetensors",
+ "model.layers.35.mlp.down_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.35.mlp.gate_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.35.mlp.up_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.35.post_attention_layernorm.weight": "model-00021-of-00051.safetensors",
+ "model.layers.35.self_attn.k_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.35.self_attn.o_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.35.self_attn.q_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.35.self_attn.v_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.36.input_layernorm.weight": "model-00022-of-00051.safetensors",
+ "model.layers.36.mlp.down_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.36.mlp.gate_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.36.mlp.up_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.36.post_attention_layernorm.weight": "model-00022-of-00051.safetensors",
+ "model.layers.36.self_attn.k_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.36.self_attn.o_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.36.self_attn.q_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.36.self_attn.v_proj.weight": "model-00021-of-00051.safetensors",
+ "model.layers.37.input_layernorm.weight": "model-00022-of-00051.safetensors",
+ "model.layers.37.mlp.down_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.37.mlp.gate_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.37.mlp.up_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.37.post_attention_layernorm.weight": "model-00022-of-00051.safetensors",
+ "model.layers.37.self_attn.k_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.37.self_attn.o_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.37.self_attn.q_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.37.self_attn.v_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.38.input_layernorm.weight": "model-00023-of-00051.safetensors",
+ "model.layers.38.mlp.down_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.38.mlp.gate_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.38.mlp.up_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.38.post_attention_layernorm.weight": "model-00023-of-00051.safetensors",
+ "model.layers.38.self_attn.k_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.38.self_attn.o_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.38.self_attn.q_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.38.self_attn.v_proj.weight": "model-00022-of-00051.safetensors",
+ "model.layers.39.input_layernorm.weight": "model-00023-of-00051.safetensors",
+ "model.layers.39.mlp.down_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.39.mlp.gate_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.39.mlp.up_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.39.post_attention_layernorm.weight": "model-00023-of-00051.safetensors",
+ "model.layers.39.self_attn.k_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.39.self_attn.o_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.39.self_attn.q_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.39.self_attn.v_proj.weight": "model-00023-of-00051.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00003-of-00051.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00003-of-00051.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00003-of-00051.safetensors",
+ "model.layers.40.input_layernorm.weight": "model-00024-of-00051.safetensors",
+ "model.layers.40.mlp.down_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.40.mlp.gate_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.40.mlp.up_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.40.post_attention_layernorm.weight": "model-00024-of-00051.safetensors",
+ "model.layers.40.self_attn.k_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.40.self_attn.o_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.40.self_attn.q_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.40.self_attn.v_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.41.input_layernorm.weight": "model-00025-of-00051.safetensors",
+ "model.layers.41.mlp.down_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.41.mlp.gate_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.41.mlp.up_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.41.post_attention_layernorm.weight": "model-00025-of-00051.safetensors",
+ "model.layers.41.self_attn.k_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.41.self_attn.o_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.41.self_attn.q_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.41.self_attn.v_proj.weight": "model-00024-of-00051.safetensors",
+ "model.layers.42.input_layernorm.weight": "model-00025-of-00051.safetensors",
+ "model.layers.42.mlp.down_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.42.mlp.gate_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.42.mlp.up_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.42.post_attention_layernorm.weight": "model-00025-of-00051.safetensors",
+ "model.layers.42.self_attn.k_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.42.self_attn.o_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.42.self_attn.q_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.42.self_attn.v_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.43.input_layernorm.weight": "model-00026-of-00051.safetensors",
+ "model.layers.43.mlp.down_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.43.mlp.gate_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.43.mlp.up_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.43.post_attention_layernorm.weight": "model-00026-of-00051.safetensors",
+ "model.layers.43.self_attn.k_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.43.self_attn.o_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.43.self_attn.q_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.43.self_attn.v_proj.weight": "model-00025-of-00051.safetensors",
+ "model.layers.44.input_layernorm.weight": "model-00026-of-00051.safetensors",
+ "model.layers.44.mlp.down_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.44.mlp.gate_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.44.mlp.up_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.44.post_attention_layernorm.weight": "model-00026-of-00051.safetensors",
+ "model.layers.44.self_attn.k_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.44.self_attn.o_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.44.self_attn.q_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.44.self_attn.v_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.45.input_layernorm.weight": "model-00027-of-00051.safetensors",
+ "model.layers.45.mlp.down_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.45.mlp.gate_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.45.mlp.up_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.45.post_attention_layernorm.weight": "model-00027-of-00051.safetensors",
+ "model.layers.45.self_attn.k_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.45.self_attn.o_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.45.self_attn.q_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.45.self_attn.v_proj.weight": "model-00026-of-00051.safetensors",
+ "model.layers.46.input_layernorm.weight": "model-00027-of-00051.safetensors",
+ "model.layers.46.mlp.down_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.46.mlp.gate_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.46.mlp.up_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.46.post_attention_layernorm.weight": "model-00027-of-00051.safetensors",
+ "model.layers.46.self_attn.k_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.46.self_attn.o_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.46.self_attn.q_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.46.self_attn.v_proj.weight": "model-00027-of-00051.safetensors",
+ "model.layers.47.input_layernorm.weight": "model-00028-of-00051.safetensors",
+ "model.layers.47.mlp.down_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.47.mlp.gate_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.47.mlp.up_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.47.post_attention_layernorm.weight": "model-00028-of-00051.safetensors",
+ "model.layers.47.self_attn.k_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.47.self_attn.o_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.47.self_attn.q_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.47.self_attn.v_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.48.input_layernorm.weight": "model-00029-of-00051.safetensors",
+ "model.layers.48.mlp.down_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.48.mlp.gate_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.48.mlp.up_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.48.post_attention_layernorm.weight": "model-00029-of-00051.safetensors",
+ "model.layers.48.self_attn.k_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.48.self_attn.o_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.48.self_attn.q_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.48.self_attn.v_proj.weight": "model-00028-of-00051.safetensors",
+ "model.layers.49.input_layernorm.weight": "model-00029-of-00051.safetensors",
+ "model.layers.49.mlp.down_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.49.mlp.gate_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.49.mlp.up_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.49.post_attention_layernorm.weight": "model-00029-of-00051.safetensors",
+ "model.layers.49.self_attn.k_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.49.self_attn.o_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.49.self_attn.q_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.49.self_attn.v_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00004-of-00051.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00004-of-00051.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00004-of-00051.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00004-of-00051.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00004-of-00051.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00004-of-00051.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00004-of-00051.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00004-of-00051.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00004-of-00051.safetensors",
+ "model.layers.50.input_layernorm.weight": "model-00030-of-00051.safetensors",
+ "model.layers.50.mlp.down_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.50.mlp.gate_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.50.mlp.up_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.50.post_attention_layernorm.weight": "model-00030-of-00051.safetensors",
+ "model.layers.50.self_attn.k_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.50.self_attn.o_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.50.self_attn.q_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.50.self_attn.v_proj.weight": "model-00029-of-00051.safetensors",
+ "model.layers.51.input_layernorm.weight": "model-00030-of-00051.safetensors",
+ "model.layers.51.mlp.down_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.51.mlp.gate_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.51.mlp.up_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.51.post_attention_layernorm.weight": "model-00030-of-00051.safetensors",
+ "model.layers.51.self_attn.k_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.51.self_attn.o_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.51.self_attn.q_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.51.self_attn.v_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.52.input_layernorm.weight": "model-00031-of-00051.safetensors",
+ "model.layers.52.mlp.down_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.52.mlp.gate_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.52.mlp.up_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.52.post_attention_layernorm.weight": "model-00031-of-00051.safetensors",
+ "model.layers.52.self_attn.k_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.52.self_attn.o_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.52.self_attn.q_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.52.self_attn.v_proj.weight": "model-00030-of-00051.safetensors",
+ "model.layers.53.input_layernorm.weight": "model-00031-of-00051.safetensors",
+ "model.layers.53.mlp.down_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.53.mlp.gate_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.53.mlp.up_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.53.post_attention_layernorm.weight": "model-00031-of-00051.safetensors",
+ "model.layers.53.self_attn.k_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.53.self_attn.o_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.53.self_attn.q_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.53.self_attn.v_proj.weight": "model-00031-of-00051.safetensors",
+ "model.layers.54.input_layernorm.weight": "model-00032-of-00051.safetensors",
+ "model.layers.54.mlp.down_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.54.mlp.gate_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.54.mlp.up_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.54.post_attention_layernorm.weight": "model-00032-of-00051.safetensors",
+ "model.layers.54.self_attn.k_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.54.self_attn.o_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.54.self_attn.q_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.54.self_attn.v_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.55.input_layernorm.weight": "model-00033-of-00051.safetensors",
+ "model.layers.55.mlp.down_proj.weight": "model-00033-of-00051.safetensors",
+ "model.layers.55.mlp.gate_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.55.mlp.up_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.55.post_attention_layernorm.weight": "model-00033-of-00051.safetensors",
+ "model.layers.55.self_attn.k_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.55.self_attn.o_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.55.self_attn.q_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.55.self_attn.v_proj.weight": "model-00032-of-00051.safetensors",
+ "model.layers.56.input_layernorm.weight": "model-00033-of-00051.safetensors",
+ "model.layers.56.mlp.down_proj.weight": "model-00033-of-00051.safetensors",
+ "model.layers.56.mlp.gate_proj.weight": "model-00033-of-00051.safetensors",
+ "model.layers.56.mlp.up_proj.weight": "model-00033-of-00051.safetensors",
+ "model.layers.56.post_attention_layernorm.weight": "model-00033-of-00051.safetensors",
+ "model.layers.56.self_attn.k_proj.weight": "model-00033-of-00051.safetensors",
+ "model.layers.56.self_attn.o_proj.weight": "model-00033-of-00051.safetensors",
+ "model.layers.56.self_attn.q_proj.weight": "model-00033-of-00051.safetensors",
+ "model.layers.56.self_attn.v_proj.weight": "model-00033-of-00051.safetensors",
+ "model.layers.57.input_layernorm.weight": "model-00034-of-00051.safetensors",
+ "model.layers.57.mlp.down_proj.weight": "model-00034-of-00051.safetensors",
487
+ "model.layers.57.mlp.gate_proj.weight": "model-00033-of-00051.safetensors",
488
+ "model.layers.57.mlp.up_proj.weight": "model-00034-of-00051.safetensors",
489
+ "model.layers.57.post_attention_layernorm.weight": "model-00034-of-00051.safetensors",
490
+ "model.layers.57.self_attn.k_proj.weight": "model-00033-of-00051.safetensors",
491
+ "model.layers.57.self_attn.o_proj.weight": "model-00033-of-00051.safetensors",
492
+ "model.layers.57.self_attn.q_proj.weight": "model-00033-of-00051.safetensors",
493
+ "model.layers.57.self_attn.v_proj.weight": "model-00033-of-00051.safetensors",
494
+ "model.layers.58.input_layernorm.weight": "model-00034-of-00051.safetensors",
495
+ "model.layers.58.mlp.down_proj.weight": "model-00034-of-00051.safetensors",
496
+ "model.layers.58.mlp.gate_proj.weight": "model-00034-of-00051.safetensors",
497
+ "model.layers.58.mlp.up_proj.weight": "model-00034-of-00051.safetensors",
498
+ "model.layers.58.post_attention_layernorm.weight": "model-00034-of-00051.safetensors",
499
+ "model.layers.58.self_attn.k_proj.weight": "model-00034-of-00051.safetensors",
500
+ "model.layers.58.self_attn.o_proj.weight": "model-00034-of-00051.safetensors",
501
+ "model.layers.58.self_attn.q_proj.weight": "model-00034-of-00051.safetensors",
502
+ "model.layers.58.self_attn.v_proj.weight": "model-00034-of-00051.safetensors",
503
+ "model.layers.59.input_layernorm.weight": "model-00035-of-00051.safetensors",
504
+ "model.layers.59.mlp.down_proj.weight": "model-00035-of-00051.safetensors",
505
+ "model.layers.59.mlp.gate_proj.weight": "model-00035-of-00051.safetensors",
506
+ "model.layers.59.mlp.up_proj.weight": "model-00035-of-00051.safetensors",
507
+ "model.layers.59.post_attention_layernorm.weight": "model-00035-of-00051.safetensors",
508
+ "model.layers.59.self_attn.k_proj.weight": "model-00034-of-00051.safetensors",
509
+ "model.layers.59.self_attn.o_proj.weight": "model-00034-of-00051.safetensors",
510
+ "model.layers.59.self_attn.q_proj.weight": "model-00034-of-00051.safetensors",
511
+ "model.layers.59.self_attn.v_proj.weight": "model-00034-of-00051.safetensors",
512
+ "model.layers.6.input_layernorm.weight": "model-00005-of-00051.safetensors",
513
+ "model.layers.6.mlp.down_proj.weight": "model-00005-of-00051.safetensors",
514
+ "model.layers.6.mlp.gate_proj.weight": "model-00004-of-00051.safetensors",
515
+ "model.layers.6.mlp.up_proj.weight": "model-00004-of-00051.safetensors",
516
+ "model.layers.6.post_attention_layernorm.weight": "model-00005-of-00051.safetensors",
517
+ "model.layers.6.self_attn.k_proj.weight": "model-00004-of-00051.safetensors",
518
+ "model.layers.6.self_attn.o_proj.weight": "model-00004-of-00051.safetensors",
519
+ "model.layers.6.self_attn.q_proj.weight": "model-00004-of-00051.safetensors",
520
+ "model.layers.6.self_attn.v_proj.weight": "model-00004-of-00051.safetensors",
521
+ "model.layers.60.input_layernorm.weight": "model-00035-of-00051.safetensors",
522
+ "model.layers.60.mlp.down_proj.weight": "model-00035-of-00051.safetensors",
523
+ "model.layers.60.mlp.gate_proj.weight": "model-00035-of-00051.safetensors",
524
+ "model.layers.60.mlp.up_proj.weight": "model-00035-of-00051.safetensors",
525
+ "model.layers.60.post_attention_layernorm.weight": "model-00035-of-00051.safetensors",
526
+ "model.layers.60.self_attn.k_proj.weight": "model-00035-of-00051.safetensors",
527
+ "model.layers.60.self_attn.o_proj.weight": "model-00035-of-00051.safetensors",
528
+ "model.layers.60.self_attn.q_proj.weight": "model-00035-of-00051.safetensors",
529
+ "model.layers.60.self_attn.v_proj.weight": "model-00035-of-00051.safetensors",
530
+ "model.layers.61.input_layernorm.weight": "model-00036-of-00051.safetensors",
531
+ "model.layers.61.mlp.down_proj.weight": "model-00036-of-00051.safetensors",
532
+ "model.layers.61.mlp.gate_proj.weight": "model-00036-of-00051.safetensors",
533
+ "model.layers.61.mlp.up_proj.weight": "model-00036-of-00051.safetensors",
534
+ "model.layers.61.post_attention_layernorm.weight": "model-00036-of-00051.safetensors",
535
+ "model.layers.61.self_attn.k_proj.weight": "model-00036-of-00051.safetensors",
536
+ "model.layers.61.self_attn.o_proj.weight": "model-00036-of-00051.safetensors",
537
+ "model.layers.61.self_attn.q_proj.weight": "model-00036-of-00051.safetensors",
538
+ "model.layers.61.self_attn.v_proj.weight": "model-00036-of-00051.safetensors",
539
+ "model.layers.62.input_layernorm.weight": "model-00037-of-00051.safetensors",
540
+ "model.layers.62.mlp.down_proj.weight": "model-00037-of-00051.safetensors",
541
+ "model.layers.62.mlp.gate_proj.weight": "model-00036-of-00051.safetensors",
542
+ "model.layers.62.mlp.up_proj.weight": "model-00036-of-00051.safetensors",
543
+ "model.layers.62.post_attention_layernorm.weight": "model-00037-of-00051.safetensors",
544
+ "model.layers.62.self_attn.k_proj.weight": "model-00036-of-00051.safetensors",
545
+ "model.layers.62.self_attn.o_proj.weight": "model-00036-of-00051.safetensors",
546
+ "model.layers.62.self_attn.q_proj.weight": "model-00036-of-00051.safetensors",
547
+ "model.layers.62.self_attn.v_proj.weight": "model-00036-of-00051.safetensors",
548
+ "model.layers.63.input_layernorm.weight": "model-00037-of-00051.safetensors",
549
+ "model.layers.63.mlp.down_proj.weight": "model-00037-of-00051.safetensors",
550
+ "model.layers.63.mlp.gate_proj.weight": "model-00037-of-00051.safetensors",
551
+ "model.layers.63.mlp.up_proj.weight": "model-00037-of-00051.safetensors",
552
+ "model.layers.63.post_attention_layernorm.weight": "model-00037-of-00051.safetensors",
553
+ "model.layers.63.self_attn.k_proj.weight": "model-00037-of-00051.safetensors",
554
+ "model.layers.63.self_attn.o_proj.weight": "model-00037-of-00051.safetensors",
555
+ "model.layers.63.self_attn.q_proj.weight": "model-00037-of-00051.safetensors",
556
+ "model.layers.63.self_attn.v_proj.weight": "model-00037-of-00051.safetensors",
557
+ "model.layers.64.input_layernorm.weight": "model-00038-of-00051.safetensors",
558
+ "model.layers.64.mlp.down_proj.weight": "model-00038-of-00051.safetensors",
559
+ "model.layers.64.mlp.gate_proj.weight": "model-00037-of-00051.safetensors",
560
+ "model.layers.64.mlp.up_proj.weight": "model-00038-of-00051.safetensors",
561
+ "model.layers.64.post_attention_layernorm.weight": "model-00038-of-00051.safetensors",
562
+ "model.layers.64.self_attn.k_proj.weight": "model-00037-of-00051.safetensors",
563
+ "model.layers.64.self_attn.o_proj.weight": "model-00037-of-00051.safetensors",
564
+ "model.layers.64.self_attn.q_proj.weight": "model-00037-of-00051.safetensors",
565
+ "model.layers.64.self_attn.v_proj.weight": "model-00037-of-00051.safetensors",
566
+ "model.layers.65.input_layernorm.weight": "model-00038-of-00051.safetensors",
567
+ "model.layers.65.mlp.down_proj.weight": "model-00038-of-00051.safetensors",
568
+ "model.layers.65.mlp.gate_proj.weight": "model-00038-of-00051.safetensors",
569
+ "model.layers.65.mlp.up_proj.weight": "model-00038-of-00051.safetensors",
570
+ "model.layers.65.post_attention_layernorm.weight": "model-00038-of-00051.safetensors",
571
+ "model.layers.65.self_attn.k_proj.weight": "model-00038-of-00051.safetensors",
572
+ "model.layers.65.self_attn.o_proj.weight": "model-00038-of-00051.safetensors",
573
+ "model.layers.65.self_attn.q_proj.weight": "model-00038-of-00051.safetensors",
574
+ "model.layers.65.self_attn.v_proj.weight": "model-00038-of-00051.safetensors",
575
+ "model.layers.66.input_layernorm.weight": "model-00039-of-00051.safetensors",
576
+ "model.layers.66.mlp.down_proj.weight": "model-00039-of-00051.safetensors",
577
+ "model.layers.66.mlp.gate_proj.weight": "model-00039-of-00051.safetensors",
578
+ "model.layers.66.mlp.up_proj.weight": "model-00039-of-00051.safetensors",
579
+ "model.layers.66.post_attention_layernorm.weight": "model-00039-of-00051.safetensors",
580
+ "model.layers.66.self_attn.k_proj.weight": "model-00038-of-00051.safetensors",
581
+ "model.layers.66.self_attn.o_proj.weight": "model-00038-of-00051.safetensors",
582
+ "model.layers.66.self_attn.q_proj.weight": "model-00038-of-00051.safetensors",
583
+ "model.layers.66.self_attn.v_proj.weight": "model-00038-of-00051.safetensors",
584
+ "model.layers.67.input_layernorm.weight": "model-00039-of-00051.safetensors",
585
+ "model.layers.67.mlp.down_proj.weight": "model-00039-of-00051.safetensors",
586
+ "model.layers.67.mlp.gate_proj.weight": "model-00039-of-00051.safetensors",
587
+ "model.layers.67.mlp.up_proj.weight": "model-00039-of-00051.safetensors",
588
+ "model.layers.67.post_attention_layernorm.weight": "model-00039-of-00051.safetensors",
589
+ "model.layers.67.self_attn.k_proj.weight": "model-00039-of-00051.safetensors",
590
+ "model.layers.67.self_attn.o_proj.weight": "model-00039-of-00051.safetensors",
591
+ "model.layers.67.self_attn.q_proj.weight": "model-00039-of-00051.safetensors",
592
+ "model.layers.67.self_attn.v_proj.weight": "model-00039-of-00051.safetensors",
593
+ "model.layers.68.input_layernorm.weight": "model-00040-of-00051.safetensors",
594
+ "model.layers.68.mlp.down_proj.weight": "model-00040-of-00051.safetensors",
595
+ "model.layers.68.mlp.gate_proj.weight": "model-00040-of-00051.safetensors",
596
+ "model.layers.68.mlp.up_proj.weight": "model-00040-of-00051.safetensors",
597
+ "model.layers.68.post_attention_layernorm.weight": "model-00040-of-00051.safetensors",
598
+ "model.layers.68.self_attn.k_proj.weight": "model-00040-of-00051.safetensors",
599
+ "model.layers.68.self_attn.o_proj.weight": "model-00040-of-00051.safetensors",
600
+ "model.layers.68.self_attn.q_proj.weight": "model-00040-of-00051.safetensors",
601
+ "model.layers.68.self_attn.v_proj.weight": "model-00040-of-00051.safetensors",
602
+ "model.layers.69.input_layernorm.weight": "model-00041-of-00051.safetensors",
603
+ "model.layers.69.mlp.down_proj.weight": "model-00041-of-00051.safetensors",
604
+ "model.layers.69.mlp.gate_proj.weight": "model-00040-of-00051.safetensors",
605
+ "model.layers.69.mlp.up_proj.weight": "model-00040-of-00051.safetensors",
606
+ "model.layers.69.post_attention_layernorm.weight": "model-00041-of-00051.safetensors",
607
+ "model.layers.69.self_attn.k_proj.weight": "model-00040-of-00051.safetensors",
608
+ "model.layers.69.self_attn.o_proj.weight": "model-00040-of-00051.safetensors",
609
+ "model.layers.69.self_attn.q_proj.weight": "model-00040-of-00051.safetensors",
610
+ "model.layers.69.self_attn.v_proj.weight": "model-00040-of-00051.safetensors",
611
+ "model.layers.7.input_layernorm.weight": "model-00005-of-00051.safetensors",
612
+ "model.layers.7.mlp.down_proj.weight": "model-00005-of-00051.safetensors",
613
+ "model.layers.7.mlp.gate_proj.weight": "model-00005-of-00051.safetensors",
614
+ "model.layers.7.mlp.up_proj.weight": "model-00005-of-00051.safetensors",
615
+ "model.layers.7.post_attention_layernorm.weight": "model-00005-of-00051.safetensors",
616
+ "model.layers.7.self_attn.k_proj.weight": "model-00005-of-00051.safetensors",
617
+ "model.layers.7.self_attn.o_proj.weight": "model-00005-of-00051.safetensors",
618
+ "model.layers.7.self_attn.q_proj.weight": "model-00005-of-00051.safetensors",
619
+ "model.layers.7.self_attn.v_proj.weight": "model-00005-of-00051.safetensors",
620
+ "model.layers.70.input_layernorm.weight": "model-00041-of-00051.safetensors",
621
+ "model.layers.70.mlp.down_proj.weight": "model-00041-of-00051.safetensors",
622
+ "model.layers.70.mlp.gate_proj.weight": "model-00041-of-00051.safetensors",
623
+ "model.layers.70.mlp.up_proj.weight": "model-00041-of-00051.safetensors",
624
+ "model.layers.70.post_attention_layernorm.weight": "model-00041-of-00051.safetensors",
625
+ "model.layers.70.self_attn.k_proj.weight": "model-00041-of-00051.safetensors",
626
+ "model.layers.70.self_attn.o_proj.weight": "model-00041-of-00051.safetensors",
627
+ "model.layers.70.self_attn.q_proj.weight": "model-00041-of-00051.safetensors",
628
+ "model.layers.70.self_attn.v_proj.weight": "model-00041-of-00051.safetensors",
629
+ "model.layers.71.input_layernorm.weight": "model-00042-of-00051.safetensors",
630
+ "model.layers.71.mlp.down_proj.weight": "model-00042-of-00051.safetensors",
631
+ "model.layers.71.mlp.gate_proj.weight": "model-00041-of-00051.safetensors",
632
+ "model.layers.71.mlp.up_proj.weight": "model-00042-of-00051.safetensors",
633
+ "model.layers.71.post_attention_layernorm.weight": "model-00042-of-00051.safetensors",
634
+ "model.layers.71.self_attn.k_proj.weight": "model-00041-of-00051.safetensors",
635
+ "model.layers.71.self_attn.o_proj.weight": "model-00041-of-00051.safetensors",
636
+ "model.layers.71.self_attn.q_proj.weight": "model-00041-of-00051.safetensors",
637
+ "model.layers.71.self_attn.v_proj.weight": "model-00041-of-00051.safetensors",
638
+ "model.layers.72.input_layernorm.weight": "model-00042-of-00051.safetensors",
639
+ "model.layers.72.mlp.down_proj.weight": "model-00042-of-00051.safetensors",
640
+ "model.layers.72.mlp.gate_proj.weight": "model-00042-of-00051.safetensors",
641
+ "model.layers.72.mlp.up_proj.weight": "model-00042-of-00051.safetensors",
642
+ "model.layers.72.post_attention_layernorm.weight": "model-00042-of-00051.safetensors",
643
+ "model.layers.72.self_attn.k_proj.weight": "model-00042-of-00051.safetensors",
644
+ "model.layers.72.self_attn.o_proj.weight": "model-00042-of-00051.safetensors",
645
+ "model.layers.72.self_attn.q_proj.weight": "model-00042-of-00051.safetensors",
646
+ "model.layers.72.self_attn.v_proj.weight": "model-00042-of-00051.safetensors",
647
+ "model.layers.73.input_layernorm.weight": "model-00043-of-00051.safetensors",
648
+ "model.layers.73.mlp.down_proj.weight": "model-00043-of-00051.safetensors",
649
+ "model.layers.73.mlp.gate_proj.weight": "model-00043-of-00051.safetensors",
650
+ "model.layers.73.mlp.up_proj.weight": "model-00043-of-00051.safetensors",
651
+ "model.layers.73.post_attention_layernorm.weight": "model-00043-of-00051.safetensors",
652
+ "model.layers.73.self_attn.k_proj.weight": "model-00042-of-00051.safetensors",
653
+ "model.layers.73.self_attn.o_proj.weight": "model-00042-of-00051.safetensors",
654
+ "model.layers.73.self_attn.q_proj.weight": "model-00042-of-00051.safetensors",
655
+ "model.layers.73.self_attn.v_proj.weight": "model-00042-of-00051.safetensors",
656
+ "model.layers.74.input_layernorm.weight": "model-00043-of-00051.safetensors",
657
+ "model.layers.74.mlp.down_proj.weight": "model-00043-of-00051.safetensors",
658
+ "model.layers.74.mlp.gate_proj.weight": "model-00043-of-00051.safetensors",
659
+ "model.layers.74.mlp.up_proj.weight": "model-00043-of-00051.safetensors",
660
+ "model.layers.74.post_attention_layernorm.weight": "model-00043-of-00051.safetensors",
661
+ "model.layers.74.self_attn.k_proj.weight": "model-00043-of-00051.safetensors",
662
+ "model.layers.74.self_attn.o_proj.weight": "model-00043-of-00051.safetensors",
663
+ "model.layers.74.self_attn.q_proj.weight": "model-00043-of-00051.safetensors",
664
+ "model.layers.74.self_attn.v_proj.weight": "model-00043-of-00051.safetensors",
665
+ "model.layers.75.input_layernorm.weight": "model-00044-of-00051.safetensors",
666
+ "model.layers.75.mlp.down_proj.weight": "model-00044-of-00051.safetensors",
667
+ "model.layers.75.mlp.gate_proj.weight": "model-00044-of-00051.safetensors",
668
+ "model.layers.75.mlp.up_proj.weight": "model-00044-of-00051.safetensors",
669
+ "model.layers.75.post_attention_layernorm.weight": "model-00044-of-00051.safetensors",
670
+ "model.layers.75.self_attn.k_proj.weight": "model-00044-of-00051.safetensors",
671
+ "model.layers.75.self_attn.o_proj.weight": "model-00044-of-00051.safetensors",
672
+ "model.layers.75.self_attn.q_proj.weight": "model-00044-of-00051.safetensors",
673
+ "model.layers.75.self_attn.v_proj.weight": "model-00044-of-00051.safetensors",
674
+ "model.layers.76.input_layernorm.weight": "model-00045-of-00051.safetensors",
675
+ "model.layers.76.mlp.down_proj.weight": "model-00045-of-00051.safetensors",
676
+ "model.layers.76.mlp.gate_proj.weight": "model-00044-of-00051.safetensors",
677
+ "model.layers.76.mlp.up_proj.weight": "model-00044-of-00051.safetensors",
678
+ "model.layers.76.post_attention_layernorm.weight": "model-00045-of-00051.safetensors",
679
+ "model.layers.76.self_attn.k_proj.weight": "model-00044-of-00051.safetensors",
680
+ "model.layers.76.self_attn.o_proj.weight": "model-00044-of-00051.safetensors",
681
+ "model.layers.76.self_attn.q_proj.weight": "model-00044-of-00051.safetensors",
682
+ "model.layers.76.self_attn.v_proj.weight": "model-00044-of-00051.safetensors",
683
+ "model.layers.77.input_layernorm.weight": "model-00045-of-00051.safetensors",
684
+ "model.layers.77.mlp.down_proj.weight": "model-00045-of-00051.safetensors",
685
+ "model.layers.77.mlp.gate_proj.weight": "model-00045-of-00051.safetensors",
686
+ "model.layers.77.mlp.up_proj.weight": "model-00045-of-00051.safetensors",
687
+ "model.layers.77.post_attention_layernorm.weight": "model-00045-of-00051.safetensors",
688
+ "model.layers.77.self_attn.k_proj.weight": "model-00045-of-00051.safetensors",
689
+ "model.layers.77.self_attn.o_proj.weight": "model-00045-of-00051.safetensors",
690
+ "model.layers.77.self_attn.q_proj.weight": "model-00045-of-00051.safetensors",
691
+ "model.layers.77.self_attn.v_proj.weight": "model-00045-of-00051.safetensors",
692
+ "model.layers.78.input_layernorm.weight": "model-00046-of-00051.safetensors",
693
+ "model.layers.78.mlp.down_proj.weight": "model-00046-of-00051.safetensors",
694
+ "model.layers.78.mlp.gate_proj.weight": "model-00045-of-00051.safetensors",
695
+ "model.layers.78.mlp.up_proj.weight": "model-00046-of-00051.safetensors",
696
+ "model.layers.78.post_attention_layernorm.weight": "model-00046-of-00051.safetensors",
697
+ "model.layers.78.self_attn.k_proj.weight": "model-00045-of-00051.safetensors",
698
+ "model.layers.78.self_attn.o_proj.weight": "model-00045-of-00051.safetensors",
699
+ "model.layers.78.self_attn.q_proj.weight": "model-00045-of-00051.safetensors",
700
+ "model.layers.78.self_attn.v_proj.weight": "model-00045-of-00051.safetensors",
701
+ "model.layers.79.input_layernorm.weight": "model-00046-of-00051.safetensors",
702
+ "model.layers.79.mlp.down_proj.weight": "model-00046-of-00051.safetensors",
703
+ "model.layers.79.mlp.gate_proj.weight": "model-00046-of-00051.safetensors",
704
+ "model.layers.79.mlp.up_proj.weight": "model-00046-of-00051.safetensors",
705
+ "model.layers.79.post_attention_layernorm.weight": "model-00046-of-00051.safetensors",
706
+ "model.layers.79.self_attn.k_proj.weight": "model-00046-of-00051.safetensors",
707
+ "model.layers.79.self_attn.o_proj.weight": "model-00046-of-00051.safetensors",
708
+ "model.layers.79.self_attn.q_proj.weight": "model-00046-of-00051.safetensors",
709
+ "model.layers.79.self_attn.v_proj.weight": "model-00046-of-00051.safetensors",
710
+ "model.layers.8.input_layernorm.weight": "model-00006-of-00051.safetensors",
711
+ "model.layers.8.mlp.down_proj.weight": "model-00006-of-00051.safetensors",
712
+ "model.layers.8.mlp.gate_proj.weight": "model-00005-of-00051.safetensors",
713
+ "model.layers.8.mlp.up_proj.weight": "model-00006-of-00051.safetensors",
714
+ "model.layers.8.post_attention_layernorm.weight": "model-00006-of-00051.safetensors",
715
+ "model.layers.8.self_attn.k_proj.weight": "model-00005-of-00051.safetensors",
716
+ "model.layers.8.self_attn.o_proj.weight": "model-00005-of-00051.safetensors",
717
+ "model.layers.8.self_attn.q_proj.weight": "model-00005-of-00051.safetensors",
718
+ "model.layers.8.self_attn.v_proj.weight": "model-00005-of-00051.safetensors",
719
+ "model.layers.80.input_layernorm.weight": "model-00047-of-00051.safetensors",
720
+ "model.layers.80.mlp.down_proj.weight": "model-00047-of-00051.safetensors",
721
+ "model.layers.80.mlp.gate_proj.weight": "model-00047-of-00051.safetensors",
722
+ "model.layers.80.mlp.up_proj.weight": "model-00047-of-00051.safetensors",
723
+ "model.layers.80.post_attention_layernorm.weight": "model-00047-of-00051.safetensors",
724
+ "model.layers.80.self_attn.k_proj.weight": "model-00046-of-00051.safetensors",
725
+ "model.layers.80.self_attn.o_proj.weight": "model-00046-of-00051.safetensors",
726
+ "model.layers.80.self_attn.q_proj.weight": "model-00046-of-00051.safetensors",
727
+ "model.layers.80.self_attn.v_proj.weight": "model-00046-of-00051.safetensors",
728
+ "model.layers.81.input_layernorm.weight": "model-00047-of-00051.safetensors",
729
+ "model.layers.81.mlp.down_proj.weight": "model-00047-of-00051.safetensors",
730
+ "model.layers.81.mlp.gate_proj.weight": "model-00047-of-00051.safetensors",
731
+ "model.layers.81.mlp.up_proj.weight": "model-00047-of-00051.safetensors",
732
+ "model.layers.81.post_attention_layernorm.weight": "model-00047-of-00051.safetensors",
733
+ "model.layers.81.self_attn.k_proj.weight": "model-00047-of-00051.safetensors",
734
+ "model.layers.81.self_attn.o_proj.weight": "model-00047-of-00051.safetensors",
735
+ "model.layers.81.self_attn.q_proj.weight": "model-00047-of-00051.safetensors",
736
+ "model.layers.81.self_attn.v_proj.weight": "model-00047-of-00051.safetensors",
737
+ "model.layers.82.input_layernorm.weight": "model-00048-of-00051.safetensors",
738
+ "model.layers.82.mlp.down_proj.weight": "model-00048-of-00051.safetensors",
739
+ "model.layers.82.mlp.gate_proj.weight": "model-00048-of-00051.safetensors",
740
+ "model.layers.82.mlp.up_proj.weight": "model-00048-of-00051.safetensors",
741
+ "model.layers.82.post_attention_layernorm.weight": "model-00048-of-00051.safetensors",
742
+ "model.layers.82.self_attn.k_proj.weight": "model-00048-of-00051.safetensors",
743
+ "model.layers.82.self_attn.o_proj.weight": "model-00048-of-00051.safetensors",
744
+ "model.layers.82.self_attn.q_proj.weight": "model-00048-of-00051.safetensors",
745
+ "model.layers.82.self_attn.v_proj.weight": "model-00048-of-00051.safetensors",
746
+ "model.layers.83.input_layernorm.weight": "model-00049-of-00051.safetensors",
747
+ "model.layers.83.mlp.down_proj.weight": "model-00049-of-00051.safetensors",
748
+ "model.layers.83.mlp.gate_proj.weight": "model-00048-of-00051.safetensors",
749
+ "model.layers.83.mlp.up_proj.weight": "model-00048-of-00051.safetensors",
750
+ "model.layers.83.post_attention_layernorm.weight": "model-00049-of-00051.safetensors",
751
+ "model.layers.83.self_attn.k_proj.weight": "model-00048-of-00051.safetensors",
752
+ "model.layers.83.self_attn.o_proj.weight": "model-00048-of-00051.safetensors",
753
+ "model.layers.83.self_attn.q_proj.weight": "model-00048-of-00051.safetensors",
754
+ "model.layers.83.self_attn.v_proj.weight": "model-00048-of-00051.safetensors",
755
+ "model.layers.84.input_layernorm.weight": "model-00049-of-00051.safetensors",
756
+ "model.layers.84.mlp.down_proj.weight": "model-00049-of-00051.safetensors",
757
+ "model.layers.84.mlp.gate_proj.weight": "model-00049-of-00051.safetensors",
758
+ "model.layers.84.mlp.up_proj.weight": "model-00049-of-00051.safetensors",
759
+ "model.layers.84.post_attention_layernorm.weight": "model-00049-of-00051.safetensors",
760
+ "model.layers.84.self_attn.k_proj.weight": "model-00049-of-00051.safetensors",
761
+ "model.layers.84.self_attn.o_proj.weight": "model-00049-of-00051.safetensors",
762
+ "model.layers.84.self_attn.q_proj.weight": "model-00049-of-00051.safetensors",
763
+ "model.layers.84.self_attn.v_proj.weight": "model-00049-of-00051.safetensors",
764
+ "model.layers.85.input_layernorm.weight": "model-00050-of-00051.safetensors",
765
+ "model.layers.85.mlp.down_proj.weight": "model-00050-of-00051.safetensors",
766
+ "model.layers.85.mlp.gate_proj.weight": "model-00049-of-00051.safetensors",
767
+ "model.layers.85.mlp.up_proj.weight": "model-00050-of-00051.safetensors",
768
+ "model.layers.85.post_attention_layernorm.weight": "model-00050-of-00051.safetensors",
769
+ "model.layers.85.self_attn.k_proj.weight": "model-00049-of-00051.safetensors",
770
+ "model.layers.85.self_attn.o_proj.weight": "model-00049-of-00051.safetensors",
771
+ "model.layers.85.self_attn.q_proj.weight": "model-00049-of-00051.safetensors",
772
+ "model.layers.85.self_attn.v_proj.weight": "model-00049-of-00051.safetensors",
773
+ "model.layers.86.input_layernorm.weight": "model-00050-of-00051.safetensors",
774
+ "model.layers.86.mlp.down_proj.weight": "model-00050-of-00051.safetensors",
775
+ "model.layers.86.mlp.gate_proj.weight": "model-00050-of-00051.safetensors",
776
+ "model.layers.86.mlp.up_proj.weight": "model-00050-of-00051.safetensors",
777
+ "model.layers.86.post_attention_layernorm.weight": "model-00050-of-00051.safetensors",
778
+ "model.layers.86.self_attn.k_proj.weight": "model-00050-of-00051.safetensors",
779
+ "model.layers.86.self_attn.o_proj.weight": "model-00050-of-00051.safetensors",
780
+ "model.layers.86.self_attn.q_proj.weight": "model-00050-of-00051.safetensors",
781
+ "model.layers.86.self_attn.v_proj.weight": "model-00050-of-00051.safetensors",
782
+ "model.layers.87.input_layernorm.weight": "model-00051-of-00051.safetensors",
783
+ "model.layers.87.mlp.down_proj.weight": "model-00051-of-00051.safetensors",
784
+ "model.layers.87.mlp.gate_proj.weight": "model-00051-of-00051.safetensors",
785
+ "model.layers.87.mlp.up_proj.weight": "model-00051-of-00051.safetensors",
786
+ "model.layers.87.post_attention_layernorm.weight": "model-00051-of-00051.safetensors",
787
+ "model.layers.87.self_attn.k_proj.weight": "model-00050-of-00051.safetensors",
788
+ "model.layers.87.self_attn.o_proj.weight": "model-00050-of-00051.safetensors",
789
+ "model.layers.87.self_attn.q_proj.weight": "model-00050-of-00051.safetensors",
790
+ "model.layers.87.self_attn.v_proj.weight": "model-00050-of-00051.safetensors",
791
+ "model.layers.9.input_layernorm.weight": "model-00006-of-00051.safetensors",
792
+ "model.layers.9.mlp.down_proj.weight": "model-00006-of-00051.safetensors",
793
+ "model.layers.9.mlp.gate_proj.weight": "model-00006-of-00051.safetensors",
794
+ "model.layers.9.mlp.up_proj.weight": "model-00006-of-00051.safetensors",
795
+ "model.layers.9.post_attention_layernorm.weight": "model-00006-of-00051.safetensors",
796
+ "model.layers.9.self_attn.k_proj.weight": "model-00006-of-00051.safetensors",
797
+ "model.layers.9.self_attn.o_proj.weight": "model-00006-of-00051.safetensors",
798
+ "model.layers.9.self_attn.q_proj.weight": "model-00006-of-00051.safetensors",
799
+ "model.layers.9.self_attn.v_proj.weight": "model-00006-of-00051.safetensors",
800
+ "model.norm.weight": "model-00051-of-00051.safetensors"
801
+ }
802
+ }
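The map above assigns every tensor to one of 51 shards, and a single layer is often split across two consecutive files. The sketch below inverts that mapping so you can see which tensors a given shard holds. It assumes the index uses the common top-level `weight_map` key, which is not visible in this excerpt, and `tensors_by_shard` is a hypothetical helper name. The three entries are copied from the map above.

```python
import json
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict[str, list[str]]:
    """Invert an index mapping: shard filename -> sorted list of tensor names."""
    shards = defaultdict(list)
    # "weight_map" is assumed to be the top-level key of the index file.
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


# Three entries copied from the map above; the real file has many more.
example_index = json.loads("""
{
  "weight_map": {
    "model.layers.87.mlp.up_proj.weight": "model-00051-of-00051.safetensors",
    "model.layers.87.self_attn.q_proj.weight": "model-00050-of-00051.safetensors",
    "model.norm.weight": "model-00051-of-00051.safetensors"
  }
}
""")

grouped = tensors_by_shard(example_index)
for shard, names in grouped.items():
    print(shard, len(names))
```

Grouping by shard is useful when only a few layers are needed, since it tells you which files to fetch before opening any of them.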
output-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c5f570ca1405aa967b3ab21d627b78faa5d168e95f3649f52bad001c2e47400
+ size 8580884406
output-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ccb6aebe39594961896070c9aae89d3799fa8bc64ec03f95e4f51af99ae3616
+ size 8472372554
output-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:09c0536f64e6ef573a7e9767470631104c75ee28c371ba133e8df2faa9d60123
+ size 8534428400
output-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:911b6263a73a247d3b73e2e089d3c04e191fe625cc23fbd2646b80f3bb808f27
+ size 8554711738
output-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:edb9d0cba857adc6ca5a4961a274da8a59b1f650febdfb7a4279b51722d9b082
+ size 8575425796
output-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a9ccfc8935c516110de49c33b7e03718f84983974b7175ff17dbfbcec49c11f
+ size 4085480808
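Each `output-*.safetensors` entry above is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. The sketch below parses one pointer and totals the six sizes listed above. `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first space only; values such as URLs contain no spaces.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer for output-00006-of-00006.safetensors, copied from the diff above.
pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:8a9ccfc8935c516110de49c33b7e03718f84983974b7175ff17dbfbcec49c11f
size 4085480808
""")

# Sizes of shards 1 to 5 as listed above, plus the parsed size of shard 6.
shard_sizes = [
    8580884406,
    8472372554,
    8534428400,
    8554711738,
    8575425796,
    int(pointer["size"]),
]
total_bytes = sum(shard_sizes)
print(f"{total_bytes / 1e9:.1f} GB across {len(shard_sizes)} shards")
```

The total gives the download size for this branch. The `oid` field can be compared against a locally computed SHA-256 digest to confirm that a shard downloaded intact.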
params.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "dim": 12288,
+   "n_layers": 88,
+   "head_dim": 128,
+   "hidden_dim": 28672,
+   "n_heads": 96,
+   "n_kv_heads": 8,
+   "norm_eps": 1e-05,
+   "vocab_size": 32768,
+   "rope_theta": 1000000.0
+ }
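The values in `params.json` are enough to estimate the model's parameter count. The sketch below is a rough estimate under stated assumptions: each layer holds the nine bias-free tensors named in the weight map above (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, and two layer norms), and the input embedding and output head are separate matrices. That last point is an assumption, since neither tensor appears in this excerpt. `estimate_parameters` is a hypothetical helper name.

```python
# Values copied from params.json above (norm_eps and rope_theta do not affect the count).
params = {
    "dim": 12288,
    "n_layers": 88,
    "head_dim": 128,
    "hidden_dim": 28672,
    "n_heads": 96,
    "n_kv_heads": 8,
    "vocab_size": 32768,
}


def estimate_parameters(p: dict) -> int:
    """Estimate the total weight count of a decoder-only transformer."""
    q_out = p["n_heads"] * p["head_dim"]      # width of the query projection
    kv_out = p["n_kv_heads"] * p["head_dim"]  # width of each key/value projection
    # q_proj + k_proj + v_proj + o_proj, all without biases
    attention = p["dim"] * q_out + 2 * p["dim"] * kv_out + q_out * p["dim"]
    # gate_proj + up_proj + down_proj
    mlp = 3 * p["dim"] * p["hidden_dim"]
    # input_layernorm + post_attention_layernorm
    norms = 2 * p["dim"]
    per_layer = attention + mlp + norms
    # Assumption: untied input embedding and output head, one matrix each.
    embeddings = 2 * p["vocab_size"] * p["dim"]
    final_norm = p["dim"]
    return p["n_layers"] * per_layer + embeddings + final_norm


total = estimate_parameters(params)
print(f"{total / 1e9:.1f}B parameters")
```

Because `n_kv_heads` is much smaller than `n_heads`, the key and value projections are far narrower than the query projection, so the MLP matrices dominate each layer.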
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b968b8dc352f42192367337c78ccc61e1eaddc6d641a579372d4f20694beb7a
+ size 587562
tokenizer.model.v7 ADDED
Binary file (588 kB).
 
tokenizer_config.json ADDED
The diff for this file is too large to render.