bartowski commited on
Commit
1fc93df
1 Parent(s): bac8c29

Quant for 8.0

Browse files
README.md CHANGED
@@ -20,62 +20,316 @@ language:
20
  - en
21
  library_name: transformers
22
  pipeline_tag: text-generation
23
- quantized_by: bartowski
24
  ---
 
 
25
 
26
- ## Exllama v2 Quantizations of AlphaMonarch-laser
27
 
28
- Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.14">turboderp's ExLlamaV2 v0.0.14</a> for quantization.
29
 
30
- <b>The "main" branch only contains the measurement.json; download one of the other branches for the model (see below).</b>
31
 
32
- Each branch contains an individual bits-per-weight quantization, with the main branch containing only the measurement.json for further conversions.
 
33
 
34
- Original model: https://huggingface.co/abideen/AlphaMonarch-laser
35
 
36
- | Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
37
- | ----- | ---- | ------- | ------ | ------ | ------ | ------------ |
38
- | [8_0](https://huggingface.co/bartowski/AlphaMonarch-laser-exl2/tree/8_0) | 8.0 | 8.0 | 8.4 GB | 9.8 GB | 11.8 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
39
- | [6_5](https://huggingface.co/bartowski/AlphaMonarch-laser-exl2/tree/6_5) | 6.5 | 8.0 | 7.2 GB | 8.6 GB | 10.6 GB | Very similar to 8.0, good tradeoff of size vs performance, **recommended**. |
40
- | [5_0](https://huggingface.co/bartowski/AlphaMonarch-laser-exl2/tree/5_0) | 5.0 | 6.0 | 6.0 GB | 7.4 GB | 9.4 GB | Slightly lower quality vs 6.5, but usable on 8GB cards. |
41
- | [4_25](https://huggingface.co/bartowski/AlphaMonarch-laser-exl2/tree/4_25) | 4.25 | 6.0 | 5.3 GB | 6.7 GB | 8.7 GB | GPTQ equivalent bits per weight, slightly higher quality. |
42
- | [3_5](https://huggingface.co/bartowski/AlphaMonarch-laser-exl2/tree/3_5) | 3.5 | 6.0 | 4.7 GB | 6.1 GB | 8.1 GB | Lower quality, only use if you have to. |
43
 
44
- ## Download instructions
45
 
46
- With git:
47
 
48
- ```shell
49
- git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/AlphaMonarch-laser-exl2 AlphaMonarch-laser-exl2-6_5
50
- ```
51
 
52
- With huggingface hub (credit to TheBloke for instructions):
53
 
54
- ```shell
55
- pip3 install huggingface-hub
56
- ```
57
 
58
- To download the `main` branch (only useful if you just need the measurement.json) to a folder called `AlphaMonarch-laser-exl2`:
59
 
60
- ```shell
61
- mkdir AlphaMonarch-laser-exl2
62
- huggingface-cli download bartowski/AlphaMonarch-laser-exl2 --local-dir AlphaMonarch-laser-exl2 --local-dir-use-symlinks False
63
- ```
64
 
65
- To download from a different branch, add the `--revision` parameter:
66
 
67
- Linux:
68
 
69
- ```shell
70
- mkdir AlphaMonarch-laser-exl2-6_5
71
- huggingface-cli download bartowski/AlphaMonarch-laser-exl2 --revision 6_5 --local-dir AlphaMonarch-laser-exl2-6_5 --local-dir-use-symlinks False
72
- ```
73
 
74
- Windows (which apparently doesn't like _ in folders sometimes?):
75
 
76
- ```shell
77
- mkdir AlphaMonarch-laser-exl2-6.5
78
- huggingface-cli download bartowski/AlphaMonarch-laser-exl2 --revision 6_5 --local-dir AlphaMonarch-laser-exl2-6.5 --local-dir-use-symlinks False
79
  ```
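If you prefer Python over the CLI, the same download can be done with the `huggingface_hub` library. A minimal sketch (the target folder name is just an example):

```python
from huggingface_hub import snapshot_download

# Download the 6.5 bpw branch into a local folder without symlinks.
snapshot_download(
    repo_id="bartowski/AlphaMonarch-laser-exl2",
    revision="6_5",
    local_dir="AlphaMonarch-laser-exl2-6_5",
    local_dir_use_symlinks=False,
)
```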
80
 
81
- Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
20
  - en
21
  library_name: transformers
22
  pipeline_tag: text-generation
 
23
  ---
24
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
25
+ should probably proofread and complete it, then remove this comment. -->
26
 
27
+ # AlphaMonarch-laser
28
 
29
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/62S_ExHO6NKCM3NhPDrds.jpeg)
30
 
31
+ AlphaMonarch-laser is a DPO fine-tune of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/) on the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset, trained with LaserQLoRA. Even though only half of the projection modules were fine-tuned, it achieves better results than [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B/), the version released by Maxime Labonne. The model was trained for 1080 steps.
32
 
33
+ AlphaMonarch-laser is ranked #1 on YALL - [Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
34
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/Jgxw1FZRx7nNAdSh7nYt1.png)
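Below is a minimal Transformers inference sketch for trying the model; the prompt and generation settings are illustrative only, and `accelerate` is assumed for `device_map="auto"`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abideen/AlphaMonarch-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Format the conversation with the tokenizer's bundled chat template.
messages = [{"role": "user", "content": "Explain DPO fine-tuning in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```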
35
 
36
+ ## 🏆 Evaluation results
37
 
38
+ ### Nous Benchmark
39
 
40
+ #### AGIEVAL
41
 
42
+ | Task | Version | Metric | Value | StdErr |
43
+ |---------------------------------|---------|--------------|--------|--------|
44
+ | agieval_aqua_rat | 0 | acc | 28.35% | 2.83% |
45
+ | agieval_aqua_rat | 0 | acc_norm | 26.38% | 2.77% |
46
+ | agieval_logiqa_en | 0 | acc | 38.25% | 1.91% |
47
+ | agieval_logiqa_en | 0 | acc_norm | 38.10% | 1.90% |
48
+ | agieval_lsat_ar | 0 | acc | 23.91% | 2.82% |
49
+ | agieval_lsat_ar | 0 | acc_norm | 23.48% | 2.80% |
50
+ | agieval_lsat_lr | 0 | acc | 52.75% | 2.21% |
51
+ | agieval_lsat_lr | 0 | acc_norm | 53.92% | 2.21% |
52
+ | agieval_lsat_rc | 0 | acc | 66.91% | 2.87% |
53
+ | agieval_lsat_rc | 0 | acc_norm | 67.29% | 2.87% |
54
+ | agieval_sat_en | 0 | acc | 78.64% | 2.86% |
55
+ | agieval_sat_en | 0 | acc_norm | 78.64% | 2.86% |
56
+ | agieval_sat_en_without_passage | 0 | acc | 45.15% | 3.48% |
57
+ | agieval_sat_en_without_passage | 0 | acc_norm | 44.17% | 3.47% |
58
+ | agieval_sat_math | 0 | acc | 33.18% | 3.18% |
59
+ | agieval_sat_math | 0 | acc_norm | 31.36% | 3.14% |
60
+ Average: 28.41%
61
 
62
+ #### GPT4ALL
 
 
63
 
64
+ | Task | Version | Metric | Value | StdErr |
65
+ |--------------|---------|----------|-------|--------|
66
+ | arc_challenge| 0 | acc | 66.30%| ± 1.38%|
67
+ | | | acc_norm | 68.26%| ± 1.36%|
68
+ | arc_easy | 0 | acc | 86.57%| ± 0.70%|
69
+ | | | acc_norm | 80.81%| ± 0.81%|
70
+ | boolq | 1 | acc | 87.16%| ± 0.59%|
71
+ | hellaswag | 0 | acc | 69.60%| ± 0.46%|
72
+ | | | acc_norm | 87.45%| ± 0.33%|
73
+ | openbookqa | 0 | acc | 39.20%| ± 2.19%|
74
+ | | | acc_norm | 49.60%| ± 2.24%|
75
+ | piqa | 0 | acc | 83.03%| ± 0.88%|
76
+ | | | acc_norm | 84.87%| ± 0.84%|
77
+ | winogrande | 0 | acc | 81.06%| ± 1.10%|
78
+ Average: 76.98%
79
 
80
+ #### TRUTHFUL-QA
 
 
81
 
82
+ | Task | Version | Metric | Value | StdErr |
83
+ |---------------|---------|--------|-------|--------|
84
+ | truthfulqa_mc | 1 | mc1 | 63.04%| ± 1.69%|
85
+ | truthfulqa_mc | 1 | mc2 | 78.39%| ± 1.37%|
86
+ Average: 70.71%
87
 
88
+ #### BIGBENCH
 
 
 
89
 
90
+ | Task | Version | Metric | Value | StdErr |
91
+ |------------------------------------------------|---------|-----------------------|-------|--------------------|
92
+ | bigbench_causal_judgement | 0 | multiple_choice_grade| 60.00%| ± 3.56% |
93
+ | bigbench_date_understanding | 0 | multiple_choice_grade| 62.06%| ± 2.53% |
94
+ | bigbench_disambiguation_qa | 0 | multiple_choice_grade| 54.26%| ± 3.11% |
95
+ | bigbench_geometric_shapes | 0 | multiple_choice_grade| 23.96%| ± 2.26% |
96
+ | | | exact_str_match | 0.00% | ± 0.00% |
97
+ | bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade| 32.80%| ± 2.10% |
98
+ | bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade| 23.86%| ± 1.61% |
99
+ | bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade| 59.33%| ± 2.84% |
100
+ | bigbench_movie_recommendation | 0 | multiple_choice_grade| 58.00%| ± 2.21% |
101
+ | bigbench_navigate | 0 | multiple_choice_grade| 56.00%| ± 1.57% |
102
+ | bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade| 69.20%| ± 1.03% |
103
+ | bigbench_ruin_names | 0 | multiple_choice_grade| 55.36%| ± 2.35% |
104
+ | bigbench_salient_translation_error_detection | 0 | multiple_choice_grade| 41.48%| ± 1.56% |
105
+ | bigbench_snarks | 0 | multiple_choice_grade| 73.48%| ± 3.29% |
106
+ | bigbench_sports_understanding | 0 | multiple_choice_grade| 76.06%| ± 1.36% |
107
+ | bigbench_temporal_sequences | 0 | multiple_choice_grade| 55.50%| ± 1.57% |
108
+ | bigbench_tracking_shuffled_objects_five_objects| 0 | multiple_choice_grade| 23.28%| ± 1.20% |
109
+ | bigbench_tracking_shuffled_objects_seven_objects| 0 | multiple_choice_grade| 19.37%| ± 0.94% |
110
+ | bigbench_tracking_shuffled_objects_three_objects| 0 | multiple_choice_grade| 59.33%| ± 2.84% |
111
+ Average: 55.37%
112
 
113
+ ### OpenLLM Benchmark
114
 
115
+ | Task |Version| Metric |Value| |Stderr|
116
+ |-------------|------:|--------|----:|---|-----:|
117
+ |arc_challenge| 0|acc |70.12|± | 1.30|
118
+ | | |acc_norm|73.27|± | 1.29|
119
+ |hellaswag | 0|acc |71.80|± | 0.44|
120
+ | | |acc_norm|89.20|± | 0.30|
121
+ |gsm8k | 0|acc |66.77|± | 1.2 |
122
+ |winogrande | 0|acc |84.6 |± | 1.0 |
123
+
124
+ Average: 73.5%
125
+
126
+ #### TruthfulQA
127
+ | Task |Version|Metric|Value| |Stderr|
128
+ |-------------|------:|------|----:|---|-----:|
129
+ |truthfulqa_mc| 1|mc1 |62.79|± | 1.69|
130
+ | | |mc2 |77.90|± | 1.37|
131
+
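The tables above look like lm-evaluation-harness style runs. As a rough pointer only (not the exact harness version or settings used here, and task names differ across harness releases), a single task could be scored with the harness's Python API:

```python
import lm_eval  # pip install lm-eval

# Sketch: evaluate one benchmark task for the original model with default settings.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=abideen/AlphaMonarch-laser,dtype=bfloat16",
    tasks=["arc_challenge"],
)
print(results["results"]["arc_challenge"])
```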
132
+ ### Training hyperparameters
133
+
134
+ The following hyperparameters were used during training:
135
+ - learning_rate: 5e-07
136
+ - train_batch_size: 1
137
+ - eval_batch_size: 8
138
+ - seed: 42
139
+ - gradient_accumulation_steps: 8
140
+ - total_train_batch_size: 8
141
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
142
+ - lr_scheduler_type: cosine
143
+ - lr_scheduler_warmup_steps: 100
144
+ - training_steps: 1080
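The actual run used Axolotl (full configuration below), but as a rough, hypothetical TRL/PEFT equivalent of the listed settings, the sketch below mirrors the hyperparameters; the toy preference examples and the omitted per-layer `target_modules` list are placeholders:

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mlabonne/NeuralMonarch-7B"
tokenizer = AutoTokenizer.from_pretrained(base)

# Tiny stand-in preference set (hypothetical); the real run used the
# dataset listed in the Axolotl config below.
train_dataset = Dataset.from_dict({
    "prompt": ["What does DPO optimize?"],
    "chosen": ["A preference objective over chosen/rejected completion pairs."],
    "rejected": ["Nothing in particular."],
})

# LoRA settings from the card; the hand-picked per-layer projection list
# (LaserQLoRA) is omitted here, see lora_target_modules in the config below.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# Mirrors the listed hyperparameters: effective batch size 1 x 8 = 8.
training_args = TrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1080,
    optim="paged_adamw_32bit",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)

trainer = DPOTrainer(
    model=base,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```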
145
 
 
146
 
147
+
148
+ ### 📝 Axolotl Configuration
149
+
150
+ ```yaml
151
+ base_model: mlabonne/NeuralMonarch-7B
152
+ model_type: MistralForCausalLM
153
+ tokenizer_type: LlamaTokenizer
154
+ is_mistral_derived_model: true
155
+ load_in_8bit: false
156
+ load_in_4bit: true
157
+ strict: false
158
+ rl: dpo
159
+ chat_template: chatml
160
+ datasets:
161
+ - path: mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
162
+ split: train
163
+ type: chatml.intel
164
+ dataset_prepared_path:
165
+ val_set_size: 0.01
166
+ output_dir: ./out
167
+ adapter: qlora
168
+ lora_model_dir:
169
+ sequence_len: 1800
170
+ sample_packing: false
171
+ pad_to_sequence_len: false
172
+ lora_r: 16
173
+ lora_alpha: 16
174
+ lora_dropout: 0.05
175
+ lora_target_linear: true
176
+ lora_fan_in_fan_out:
177
+ lora_target_modules:
178
+ - layers.1.self_attn.q_proj
179
+ - layers.0.self_attn.q_proj
180
+ - layers.15.self_attn.q_proj
181
+ - layers.12.self_attn.q_proj
182
+ - layers.11.self_attn.q_proj
183
+ - layers.14.self_attn.q_proj
184
+ - layers.9.self_attn.q_proj
185
+ - layers.16.self_attn.q_proj
186
+ - layers.30.self_attn.q_proj
187
+ - layers.18.self_attn.q_proj
188
+ - layers.13.self_attn.q_proj
189
+ - layers.10.self_attn.q_proj
190
+ - layers.7.self_attn.q_proj
191
+ - layers.8.self_attn.q_proj
192
+ - layers.4.self_attn.q_proj
193
+ - layers.19.self_attn.q_proj
194
+ - layers.27.self_attn.k_proj
195
+ - layers.24.self_attn.k_proj
196
+ - layers.25.self_attn.k_proj
197
+ - layers.22.self_attn.k_proj
198
+ - layers.26.self_attn.k_proj
199
+ - layers.29.self_attn.k_proj
200
+ - layers.23.self_attn.k_proj
201
+ - layers.28.self_attn.k_proj
202
+ - layers.21.self_attn.k_proj
203
+ - layers.31.self_attn.k_proj
204
+ - layers.30.self_attn.k_proj
205
+ - layers.20.self_attn.k_proj
206
+ - layers.5.self_attn.k_proj
207
+ - layers.19.self_attn.k_proj
208
+ - layers.17.self_attn.k_proj
209
+ - layers.18.self_attn.k_proj
210
+ - layers.19.self_attn.v_proj
211
+ - layers.24.self_attn.v_proj
212
+ - layers.18.self_attn.v_proj
213
+ - layers.5.self_attn.v_proj
214
+ - layers.3.self_attn.v_proj
215
+ - layers.16.self_attn.v_proj
216
+ - layers.23.self_attn.v_proj
217
+ - layers.27.self_attn.v_proj
218
+ - layers.25.self_attn.v_proj
219
+ - layers.26.self_attn.v_proj
220
+ - layers.20.self_attn.v_proj
221
+ - layers.6.self_attn.v_proj
222
+ - layers.15.self_attn.v_proj
223
+ - layers.17.self_attn.v_proj
224
+ - layers.29.self_attn.v_proj
225
+ - layers.22.self_attn.v_proj
226
+ - layers.12.self_attn.o_proj
227
+ - layers.9.self_attn.o_proj
228
+ - layers.14.self_attn.o_proj
229
+ - layers.0.self_attn.o_proj
230
+ - layers.6.self_attn.o_proj
231
+ - layers.8.self_attn.o_proj
232
+ - layers.10.self_attn.o_proj
233
+ - layers.11.self_attn.o_proj
234
+ - layers.13.self_attn.o_proj
235
+ - layers.24.self_attn.o_proj
236
+ - layers.7.self_attn.o_proj
237
+ - layers.15.self_attn.o_proj
238
+ - layers.5.self_attn.o_proj
239
+ - layers.17.self_attn.o_proj
240
+ - layers.25.self_attn.o_proj
241
+ - layers.4.self_attn.o_proj
242
+ - layers.31.mlp.gate_proj
243
+ - layers.30.mlp.gate_proj
244
+ - layers.4.mlp.gate_proj
245
+ - layers.3.mlp.gate_proj
246
+ - layers.29.mlp.gate_proj
247
+ - layers.28.mlp.gate_proj
248
+ - layers.6.mlp.gate_proj
249
+ - layers.27.mlp.gate_proj
250
+ - layers.5.mlp.gate_proj
251
+ - layers.26.mlp.gate_proj
252
+ - layers.25.mlp.gate_proj
253
+ - layers.7.mlp.gate_proj
254
+ - layers.2.mlp.gate_proj
255
+ - layers.24.mlp.gate_proj
256
+ - layers.23.mlp.gate_proj
257
+ - layers.10.mlp.gate_proj
258
+ - layers.6.mlp.up_proj
259
+ - layers.4.mlp.up_proj
260
+ - layers.5.mlp.up_proj
261
+ - layers.27.mlp.up_proj
262
+ - layers.25.mlp.up_proj
263
+ - layers.26.mlp.up_proj
264
+ - layers.17.mlp.up_proj
265
+ - layers.24.mlp.up_proj
266
+ - layers.7.mlp.up_proj
267
+ - layers.10.mlp.up_proj
268
+ - layers.3.mlp.up_proj
269
+ - layers.11.mlp.up_proj
270
+ - layers.23.mlp.up_proj
271
+ - layers.9.mlp.up_proj
272
+ - layers.14.mlp.up_proj
273
+ - layers.18.mlp.up_proj
274
+ - layers.19.mlp.down_proj
275
+ - layers.20.mlp.down_proj
276
+ - layers.18.mlp.down_proj
277
+ - layers.21.mlp.down_proj
278
+ - layers.29.mlp.down_proj
279
+ - layers.1.mlp.down_proj
280
+ - layers.22.mlp.down_proj
281
+ - layers.28.mlp.down_proj
282
+ - layers.23.mlp.down_proj
283
+ - layers.30.mlp.down_proj
284
+ - layers.17.mlp.down_proj
285
+ - layers.4.mlp.down_proj
286
+ - layers.2.mlp.down_proj
287
+ - layers.15.mlp.down_proj
288
+ - layers.5.mlp.down_proj
289
+ wandb_project: axolotl
290
+ wandb_entity:
291
+ wandb_watch:
292
+ wandb_name:
293
+ wandb_log_model:
294
+ gradient_accumulation_steps: 8
295
+ micro_batch_size: 1
296
+ num_epochs: 1
297
+ optimizer: paged_adamw_32bit
298
+ lr_scheduler: cosine
299
+ learning_rate: 5e-7
300
+ train_on_inputs: false
301
+ group_by_length: false
302
+ bf16: true
303
+ fp16: false
304
+ tf32: true
305
+ gradient_checkpointing: true
306
+ early_stopping_patience:
307
+ resume_from_checkpoint:
308
+ local_rank:
309
+ logging_steps: 1
310
+ xformers_attention:
311
+ flash_attention: true
312
+ warmup_steps: 100
313
+ evals_per_epoch: 1
314
+ eval_table_size:
315
+ eval_table_max_new_tokens: 128
316
+ save_steps: 1080
317
+ max_steps: 1080
318
+ debug:
319
+ deepspeed:
320
+ weight_decay: 0.0
321
+ fsdp:
322
+ fsdp_config:
323
+ special_tokens:
324
  ```
325
 
326
+
327
+ ### Framework versions
328
+
329
+ - Transformers 4.38.0.dev0
330
+ - Pytorch 2.1.2+cu118
331
+ - Datasets 2.17.0
332
+ - Tokenizers 0.15.0
333
+ - axolotl: 0.4.0
334
+
335
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
config.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "_name_or_path": "mlabonne/NeuralMonarch-7B",
3
+ "architectures": [
4
+ "MistralForCausalLM"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 1,
8
+ "eos_token_id": 2,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 4096,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 14336,
13
+ "max_position_embeddings": 32768,
14
+ "model_type": "mistral",
15
+ "num_attention_heads": 32,
16
+ "num_hidden_layers": 32,
17
+ "num_key_value_heads": 8,
18
+ "rms_norm_eps": 1e-05,
19
+ "rope_theta": 10000.0,
20
+ "sliding_window": 4096,
21
+ "tie_word_embeddings": false,
22
+ "torch_dtype": "float16",
23
+ "transformers_version": "4.38.0.dev0",
24
+ "use_cache": true,
25
+ "vocab_size": 32000
26
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.38.0.dev0"
6
+ }
model.safetensors.index.json ADDED
@@ -0,0 +1,298 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 14483464192
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00003-of-00003.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
87
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
88
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
89
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
90
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
91
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
92
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
93
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
94
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
95
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
96
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
97
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
98
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
99
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
100
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
101
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
102
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
103
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
104
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
105
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
106
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
107
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
108
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
109
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
110
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
111
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
112
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
113
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
114
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
115
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
116
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
117
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
118
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
119
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
120
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
121
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
122
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
123
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
124
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
125
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
126
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
127
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
128
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
129
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
130
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
131
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
132
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
133
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
134
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
135
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
136
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
137
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
138
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
139
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
140
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
141
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
142
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
143
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
144
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
145
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
146
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
147
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
148
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
149
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
150
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
151
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
152
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
153
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
154
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
155
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
156
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
157
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
158
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
159
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
160
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
161
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
162
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
163
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
164
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
165
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
166
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
167
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
168
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
169
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
170
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
171
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
172
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
173
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
174
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
175
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
176
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
177
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
178
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
179
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
180
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
181
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
182
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
183
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
184
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
185
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
186
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
187
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
188
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
189
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
190
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
191
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
192
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
193
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
194
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
195
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
196
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
197
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
198
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
199
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
200
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
201
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
202
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
203
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
204
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
205
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
206
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
207
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
208
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
209
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
210
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
211
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
212
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
213
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
214
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
215
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
216
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
217
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
218
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
219
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
220
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
221
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
222
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
223
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
224
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
225
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
226
+ "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
227
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
228
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
229
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
230
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
231
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
232
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
233
+ "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
234
+ "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
235
+ "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
236
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
237
+ "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
238
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
239
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
240
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
241
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
242
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
243
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
244
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
245
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
246
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
247
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
248
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
249
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
250
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
251
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
252
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
253
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
254
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
255
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
256
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
257
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
258
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
259
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
260
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
261
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
262
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
263
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
264
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
265
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
266
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
267
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
268
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
269
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
270
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
271
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
272
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
273
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
274
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
275
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
276
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
277
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
278
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
279
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
280
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
281
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
282
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
283
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
284
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
285
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
286
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
287
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
288
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
289
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
290
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
291
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
292
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
293
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
294
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
295
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
296
+ "model.norm.weight": "model-00003-of-00003.safetensors"
297
+ }
298
+ }
original_repo_url.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ https://huggingface.co/abideen/AlphaMonarch-laser
output.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3d0e7491a3607c5499fa766f030c5e6419e3a4c4a6ce0da1b7f0e614beb1a098
3
+ size 7370262600
special_tokens_map.json ADDED
@@ -0,0 +1,35 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<unk>",
4
+ "<s>",
5
+ "</s>"
6
+ ],
7
+ "bos_token": {
8
+ "content": "<s>",
9
+ "lstrip": false,
10
+ "normalized": false,
11
+ "rstrip": false,
12
+ "single_word": false
13
+ },
14
+ "eos_token": {
15
+ "content": "</s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false
20
+ },
21
+ "pad_token": {
22
+ "content": "</s>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false
27
+ },
28
+ "unk_token": {
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false
34
+ }
35
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
3
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,49 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<unk>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<s>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "</s>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ }
29
+ },
30
+ "additional_special_tokens": [
31
+ "<unk>",
32
+ "<s>",
33
+ "</s>"
34
+ ],
35
+ "bos_token": "<s>",
36
+ "chat_template": "{% for message in messages %}{{bos_token + message['role'] + '\n' + message['content'] + eos_token + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ bos_token + 'assistant\n' }}{% endif %}",
37
+ "clean_up_tokenization_spaces": false,
38
+ "eos_token": "</s>",
39
+ "legacy": true,
40
+ "model_max_length": 8192,
41
+ "pad_token": "</s>",
42
+ "padding_side": "left",
43
+ "sp_model_kwargs": {},
44
+ "spaces_between_special_tokens": false,
45
+ "split_special_tokens": false,
46
+ "tokenizer_class": "LlamaTokenizer",
47
+ "unk_token": "<unk>",
48
+ "use_default_system_prompt": true
49
+ }
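The `chat_template` in the tokenizer config above determines how conversations are serialized. A small usage sketch (message contents are made up):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("abideen/AlphaMonarch-laser")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Per the template, each message renders as "<s>{role}\n{content}</s>\n",
# and add_generation_prompt=True appends a trailing "<s>assistant\n".
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```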