root committed on
Commit · ea26a6e · 0 Parent(s)
Add MXFP4 quantized model
This view is limited to 50 files because it contains too many changes. See raw diff.
- .gitattributes +37 -0
- README.md +127 -0
- chat_template.jinja +117 -0
- config.json +834 -0
- generation_config.json +12 -0
- model-00001-of-00282.safetensors +3 -0
- model-00002-of-00282.safetensors +3 -0
- model-00003-of-00282.safetensors +3 -0
- model-00004-of-00282.safetensors +3 -0
- model-00005-of-00282.safetensors +3 -0
- model-00006-of-00282.safetensors +3 -0
- model-00007-of-00282.safetensors +3 -0
- model-00008-of-00282.safetensors +3 -0
- model-00009-of-00282.safetensors +3 -0
- model-00010-of-00282.safetensors +3 -0
- model-00011-of-00282.safetensors +3 -0
- model-00012-of-00282.safetensors +3 -0
- model-00013-of-00282.safetensors +3 -0
- model-00014-of-00282.safetensors +3 -0
- model-00015-of-00282.safetensors +3 -0
- model-00016-of-00282.safetensors +3 -0
- model-00017-of-00282.safetensors +3 -0
- model-00018-of-00282.safetensors +3 -0
- model-00019-of-00282.safetensors +3 -0
- model-00020-of-00282.safetensors +3 -0
- model-00021-of-00282.safetensors +3 -0
- model-00022-of-00282.safetensors +3 -0
- model-00023-of-00282.safetensors +3 -0
- model-00024-of-00282.safetensors +3 -0
- model-00025-of-00282.safetensors +3 -0
- model-00026-of-00282.safetensors +3 -0
- model-00027-of-00282.safetensors +3 -0
- model-00028-of-00282.safetensors +3 -0
- model-00029-of-00282.safetensors +3 -0
- model-00030-of-00282.safetensors +3 -0
- model-00031-of-00282.safetensors +3 -0
- model-00032-of-00282.safetensors +3 -0
- model-00033-of-00282.safetensors +3 -0
- model-00034-of-00282.safetensors +3 -0
- model-00035-of-00282.safetensors +3 -0
- model-00036-of-00282.safetensors +3 -0
- model-00037-of-00282.safetensors +3 -0
- model-00038-of-00282.safetensors +3 -0
- model-00039-of-00282.safetensors +3 -0
- model-00040-of-00282.safetensors +3 -0
- model-00041-of-00282.safetensors +3 -0
- model-00042-of-00282.safetensors +3 -0
- model-00043-of-00282.safetensors +3 -0
- model-00044-of-00282.safetensors +3 -0
- model-00045-of-00282.safetensors +3 -0
.gitattributes
ADDED
@@ -0,0 +1,37 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+*.index.json filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,127 @@
+---
+license: mit
+base_model:
+- zai-org/GLM-5.1
+---
+# Model Overview
+
+- **Model Architecture:** GLM-5.1
+- **Input:** Text
+- **Output:** Text
+- **Supported Hardware Microarchitecture:** AMD MI350/MI355
+- **ROCm:** 7.0.0
+- **PyTorch:** 2.10.0
+- **Transformers:** 4.57.6
+- **Operating System(s):** Linux
+- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
+- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
+- **Weight quantization:** MOE-only (shared experts quantized), OCP MXFP4, Static
+- **Activation quantization:** MOE-only, OCP MXFP4, Dynamic
+- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
+
+This model was built from the GLM-5.1 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.
+
+# Model Quantization
+
+The model was quantized from [zai-org/GLM-5.1](https://huggingface.co/zai-org/GLM-5.1) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). The weights and activations are quantized to MXFP4.
+
+**Quantization script:**
+
+```python
+from quark.torch import LLMTemplate, ModelQuantizer
+# --- Register template ---
+GLM5_template = LLMTemplate(
+    model_type="glm_moe_dsa",
+    kv_layers_name=["*kv_a_proj_with_mqa", "*kv_b_proj"],
+    q_layer_name="*q_a_proj",
+    exclude_layers_name=["lm_head"],
+)
+LLMTemplate.register_template(GLM5_template)
+print(f"[INFO]: Registered template '{GLM5_template.model_type}'")
+# --- Configuration ---
+model_dir = "zai-org/GLM-5.1"
+output_dir = "amd/GLM-5.1-MXFP4"
+quant_scheme = "mxfp4"
+exclude_layers = [
+    "*self_attn*",
+    "*mlp.gate",
+    "*lm_head",
+    "*mlp.gate_proj",
+    "*mlp.up_proj",
+    "*mlp.down_proj",
+]
+# --- Build quant config from template ---
+template = LLMTemplate.get("glm_moe_dsa")
+quant_config = template.get_config(scheme=quant_scheme, exclude_layers=exclude_layers)
+# --- File-to-file quantization (memory-efficient, no full model loading) ---
+quantizer = ModelQuantizer(quant_config)
+quantizer.direct_quantize_checkpoint(
+    pretrained_model_path=model_dir,
+    save_path=output_dir,
+)
+print(f"[INFO]: Quantization complete. Output saved to {output_dir}")
+```
+
+# Deployment
+### Use with vLLM
+
+This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.
+
+## Evaluation
+The model was evaluated on the GSM8K benchmark.
+
+### Accuracy
+
+<table>
+  <tr>
+    <td><strong>Benchmark</strong></td>
+    <td><strong>GLM-5.1</strong></td>
+    <td><strong>GLM-5.1-MXFP4 (this model)</strong></td>
+    <td><strong>Recovery</strong></td>
+  </tr>
+  <tr>
+    <td>GSM8K (flexible-extract)</td>
+    <td>TBD</td>
+    <td>TBD</td>
+    <td>TBD</td>
+  </tr>
+</table>
+
+### Reproduction
+
+The GSM8K results were obtained with the `lm-evaluation-harness` framework, using the Docker image `rocm/pytorch-private:vllm_glm5_0225`, with vLLM and lm-eval compiled and installed from source inside the image.
+The Docker image contains the necessary vLLM code modifications to support this model.
+
+#### Launching the server
+```
+export VLLM_ROCM_USE_AITER=1
+export VLLM_ROCM_USE_AITER_FP8BMM=0
+export VLLM_ROCM_USE_AITER_FP4BMM=0
+vllm serve amd/GLM-5.1-MXFP4 \
+    -tp 8 \
+    --block-size 1 \
+    --trust-remote-code \
+    --max-model-len 4096
+```
+
+#### Evaluating the model in a new terminal
+```
+lm_eval \
+    --model local-completions \
+    --model_args '{"model": "amd/GLM-5.1-MXFP4", "base_url": "http://localhost:8000/v1/completions", "num_concurrent": 32, "max_retries": 10, "max_gen_toks": 2048, "tokenizer_backend": "None", "tokenized_requests": "False"}' \
+    --tasks gsm8k \
+    --batch_size auto \
+    --num_fewshot 5 \
+    --trust_remote_code
+```
+
+# License
+Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
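As a usage sketch to go with the README above: once `vllm serve` is running, the model can also be queried directly over the same OpenAI-compatible `/v1/completions` endpoint that the `lm_eval` command targets. The port, model name, and prompt below are assumptions carried over from the commands above, not part of this commit.

```python
# Minimal sketch: query the vLLM server launched per the README above.
# Port, model name, and prompt are assumptions; adjust to your deployment.
import requests

payload = {
    "model": "amd/GLM-5.1-MXFP4",
    "prompt": "Question: What is 12 * 7?\nAnswer:",
    "max_tokens": 128,
    "temperature": 0.0,
}
resp = requests.post("http://localhost:8000/v1/completions", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```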
chat_template.jinja
ADDED
@@ -0,0 +1,117 @@
+[gMASK]<sop>
+{%- if tools -%}
+{%- macro tool_to_json(tool) -%}
+{%- set ns_tool = namespace(first=true) -%}
+{{ '{' -}}
+{%- for k, v in tool.items() -%}
+{%- if k != 'defer_loading' and k != 'strict' -%}
+{%- if not ns_tool.first -%}{{- ', ' -}}{%- endif -%}
+{%- set ns_tool.first = false -%}
+"{{ k }}": {{ v | tojson(ensure_ascii=False) }}
+{%- endif -%}
+{%- endfor -%}
+{{- '}' -}}
+{%- endmacro -%}
+<|system|>
+# Tools
+
+You may call one or more functions to assist with the user query.
+
+You are provided with function signatures within <tools></tools> XML tags:
+<tools>
+{% for tool in tools %}
+{%- if 'function' in tool -%}
+{%- set tool = tool['function'] -%}
+{%- endif -%}
+{% if tool.defer_loading is not defined or not tool.defer_loading %}
+{{ tool_to_json(tool) }}
+{% endif %}
+{% endfor %}
+</tools>
+
+For each function call, output the function name and arguments within the following XML format:
+<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
+{%- macro visible_text(content) -%}
+{%- if content is string -%}
+{{- content }}
+{%- elif content is iterable and content is not mapping -%}
+{%- for item in content -%}
+{%- if item is mapping and item.type == 'text' -%}
+{{- item.text }}
+{%- elif item is string -%}
+{{- item }}
+{%- endif -%}
+{%- endfor -%}
+{%- else -%}
+{{- content }}
+{%- endif -%}
+{%- endmacro -%}
+{%- set ns = namespace(last_user_index=-1, thinking_indices='') -%}
+{%- for m in messages %}
+{%- if m.role == 'user' %}
+{%- set ns.last_user_index = loop.index0 -%}
+{%- elif m.role == 'assistant' %}
+{%- if m.reasoning_content is string %}
+{%- set ns.thinking_indices = ns.thinking_indices ~ ',' ~ ns.last_user_index ~ ',' -%}
+{%- endif %}
+{%- endif %}
+{%- endfor %}
+{%- set ns.has_thinking = false -%}
+{%- for m in messages -%}
+{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}{% set ns.has_thinking = (',' ~ loop.index0 ~ ',') in ns.thinking_indices -%}
+{%- elif m.role == 'assistant' -%}
+<|assistant|>
+{%- set content = visible_text(m.content) %}
+{%- if m.reasoning_content is string %}
+{%- set reasoning_content = m.reasoning_content %}
+{%- elif '</think>' in content %}
+{%- set reasoning_content = content.split('</think>')[0].split('<think>')[-1] %}
+{%- set content = content.split('</think>')[-1] %}
+{%- elif loop.index0 > ns.last_user_index and not (enable_thinking is defined and not enable_thinking) %}
+{%- set reasoning_content = '' %}
+{%- elif loop.index0 < ns.last_user_index and ns.has_thinking %}
+{%- set reasoning_content = '' %}
+{%- endif %}
+{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content is defined -%}
+{{ '<think>' + reasoning_content + '</think>'}}
+{%- else -%}
+{{ '</think>' }}
+{%- endif -%}
+{%- if content.strip() -%}
+{{ content.strip() }}
+{%- endif -%}
+{% if m.tool_calls %}
+{% for tc in m.tool_calls %}
+{%- if tc.function %}
+{%- set tc = tc.function %}
+{%- endif %}
+{{- '<tool_call>' + tc.name -}}
+{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
+{% endif %}
+{%- elif m.role == 'tool' -%}
+{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+{{- '<|observation|>' -}}
+{%- endif %}
+{%- if m.content is string -%}
+{{- '<tool_response>' + m.content + '</tool_response>' -}}
+{%- else -%}
+{{- '<tool_response><tools>\n' -}}
+{% for tr in m.content %}
+{%- for tool in tools -%}
+{%- if 'function' in tool -%}
+{%- set tool = tool['function'] -%}
+{%- endif -%}
+{%- if tool.name == tr.name -%}
+{{- tool_to_json(tool) + '\n' -}}
+{%- endif -%}
+{%- endfor -%}
+{%- endfor -%}
+{{- '</tools></tool_response>' -}}
+{% endif -%}
+{%- elif m.role == 'system' -%}
+<|system|>{{ visible_text(m.content) }}
+{%- endif -%}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+<|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
+{%- endif -%}
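To see what the template above produces, it can be exercised through the standard `transformers` chat-template machinery. The sketch below is illustrative only: the repo id and example messages are assumptions, not part of this commit, and `apply_chat_template` with `tokenize`/`add_generation_prompt` is the generic `transformers` API rather than anything model-specific.

```python
# Minimal sketch: render the chat template added above via transformers.
# The repo id and example messages are illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("amd/GLM-5.1-MXFP4", trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize MXFP4 quantization in one sentence."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the rendered prompt string
    add_generation_prompt=True,  # append the <|assistant|>/<think> suffix from the template
)
print(prompt)
```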
config.json
ADDED
@@ -0,0 +1,834 @@
+{
+  "architectures": [
+    "GlmMoeDsaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "dtype": "bfloat16",
+  "eos_token_id": [
+    154820,
+    154827,
+    154829
+  ],
+  "ep_size": 1,
+  "first_k_dense_replace": 3,
+  "hidden_act": "silu",
+  "head_dim": 64,
+  "hidden_size": 6144,
+  "index_head_dim": 128,
+  "index_n_heads": 32,
+  "index_topk": 2048,
+  "indexer_rope_interleave": true,
+  "initializer_range": 0.02,
+  "intermediate_size": 12288,
+  "kv_lora_rank": 512,
+  "max_position_embeddings": 202752,
+  "moe_intermediate_size": 2048,
+  "moe_layer_freq": 1,
+  "model_type": "glm_moe_dsa",
+  "n_group": 1,
+  "n_routed_experts": 256,
+  "n_shared_experts": 1,
+  "norm_topk_prob": true,
+  "num_attention_heads": 64,
+  "num_experts_per_tok": 8,
+  "num_hidden_layers": 78,
+  "num_key_value_heads": 64,
+  "num_nextn_predict_layers": 1,
+  "pad_token_id": 154820,
+  "pretraining_tp": 1,
+  "q_lora_rank": 2048,
+  "qk_head_dim": 256,
+  "qk_nope_head_dim": 192,
+  "qk_rope_head_dim": 64,
+  "rms_norm_eps": 1e-05,
+  "rope_interleave": true,
+  "rope_parameters": {
+    "rope_theta": 1000000,
+    "rope_type": "default"
+  },
+  "routed_scaling_factor": 2.5,
+  "scoring_func": "sigmoid",
+  "tie_word_embeddings": false,
+  "topk_group": 1,
+  "topk_method": "noaux_tc",
+  "transformers_version": "5.4.0",
+  "use_cache": true,
+  "v_head_dim": 256,
+  "vocab_size": 154880,
+  "quantization_config": {
+    "global_quant_config": {
+      "input_tensors": {
+        "dtype": "fp4",
+        "is_dynamic": true,
+        "qscheme": "per_group",
+        "ch_axis": -1,
+        "group_size": 32,
+        "block_size": null,
+        "symmetric": null,
+        "round_method": "half_even",
+        "scale_type": "float",
+        "scale_format": "e8m0",
+        "scale_calculation_mode": "even",
+        "mx_element_dtype": null,
+        "observer_cls": "PerBlockMXObserver",
+        "is_scale_quant": false
+      },
+      "output_tensors": null,
+      "weight": {
+        "dtype": "fp4",
+        "is_dynamic": false,
+        "qscheme": "per_group",
+        "ch_axis": -1,
+        "group_size": 32,
+        "block_size": null,
+        "symmetric": null,
+        "round_method": "half_even",
+        "scale_type": "float",
+        "scale_format": "e8m0",
+        "scale_calculation_mode": "even",
+        "mx_element_dtype": null,
+        "observer_cls": "PerBlockMXObserver",
+        "is_scale_quant": false
+      },
+      "bias": null,
+      "target_device": null
+    },
| 97 |
+
"exclude": [
|
| 98 |
+
"lm_head",
|
| 99 |
+
"model.layers.0.mlp.down_proj",
|
| 100 |
+
"model.layers.0.mlp.gate_proj",
|
| 101 |
+
"model.layers.0.mlp.up_proj",
|
| 102 |
+
"model.layers.0.self_attn.indexer.weights_proj",
|
| 103 |
+
"model.layers.0.self_attn.indexer.wk",
|
| 104 |
+
"model.layers.0.self_attn.indexer.wq_b",
|
| 105 |
+
"model.layers.0.self_attn.kv_a_proj_with_mqa",
|
| 106 |
+
"model.layers.0.self_attn.kv_b_proj",
|
| 107 |
+
"model.layers.0.self_attn.o_proj",
|
| 108 |
+
"model.layers.0.self_attn.q_a_proj",
|
| 109 |
+
"model.layers.0.self_attn.q_b_proj",
|
| 110 |
+
"model.layers.1.mlp.down_proj",
|
| 111 |
+
"model.layers.1.mlp.gate_proj",
|
| 112 |
+
"model.layers.1.mlp.up_proj",
|
| 113 |
+
"model.layers.1.self_attn.indexer.weights_proj",
|
| 114 |
+
"model.layers.1.self_attn.indexer.wk",
|
| 115 |
+
"model.layers.1.self_attn.indexer.wq_b",
|
| 116 |
+
"model.layers.1.self_attn.kv_a_proj_with_mqa",
|
| 117 |
+
"model.layers.1.self_attn.kv_b_proj",
|
| 118 |
+
"model.layers.1.self_attn.o_proj",
|
| 119 |
+
"model.layers.1.self_attn.q_a_proj",
|
| 120 |
+
"model.layers.1.self_attn.q_b_proj",
|
| 121 |
+
"model.layers.10.mlp.gate",
|
| 122 |
+
"model.layers.10.self_attn.indexer.weights_proj",
|
| 123 |
+
"model.layers.10.self_attn.indexer.wk",
|
| 124 |
+
"model.layers.10.self_attn.indexer.wq_b",
|
| 125 |
+
"model.layers.10.self_attn.kv_a_proj_with_mqa",
|
| 126 |
+
"model.layers.10.self_attn.kv_b_proj",
|
| 127 |
+
"model.layers.10.self_attn.o_proj",
|
| 128 |
+
"model.layers.10.self_attn.q_a_proj",
|
| 129 |
+
"model.layers.10.self_attn.q_b_proj",
|
| 130 |
+
"model.layers.11.mlp.gate",
|
| 131 |
+
"model.layers.11.self_attn.indexer.weights_proj",
|
| 132 |
+
"model.layers.11.self_attn.indexer.wk",
|
| 133 |
+
"model.layers.11.self_attn.indexer.wq_b",
|
| 134 |
+
"model.layers.11.self_attn.kv_a_proj_with_mqa",
|
| 135 |
+
"model.layers.11.self_attn.kv_b_proj",
|
| 136 |
+
"model.layers.11.self_attn.o_proj",
|
| 137 |
+
"model.layers.11.self_attn.q_a_proj",
|
| 138 |
+
"model.layers.11.self_attn.q_b_proj",
|
| 139 |
+
"model.layers.12.mlp.gate",
|
| 140 |
+
"model.layers.12.self_attn.indexer.weights_proj",
|
| 141 |
+
"model.layers.12.self_attn.indexer.wk",
|
| 142 |
+
"model.layers.12.self_attn.indexer.wq_b",
|
| 143 |
+
"model.layers.12.self_attn.kv_a_proj_with_mqa",
|
| 144 |
+
"model.layers.12.self_attn.kv_b_proj",
|
| 145 |
+
"model.layers.12.self_attn.o_proj",
|
| 146 |
+
"model.layers.12.self_attn.q_a_proj",
|
| 147 |
+
"model.layers.12.self_attn.q_b_proj",
|
| 148 |
+
"model.layers.13.mlp.gate",
|
| 149 |
+
"model.layers.13.self_attn.indexer.weights_proj",
|
| 150 |
+
"model.layers.13.self_attn.indexer.wk",
|
| 151 |
+
"model.layers.13.self_attn.indexer.wq_b",
|
| 152 |
+
"model.layers.13.self_attn.kv_a_proj_with_mqa",
|
| 153 |
+
"model.layers.13.self_attn.kv_b_proj",
|
| 154 |
+
"model.layers.13.self_attn.o_proj",
|
| 155 |
+
"model.layers.13.self_attn.q_a_proj",
|
| 156 |
+
"model.layers.13.self_attn.q_b_proj",
|
| 157 |
+
"model.layers.14.mlp.gate",
|
| 158 |
+
"model.layers.14.self_attn.indexer.weights_proj",
|
| 159 |
+
"model.layers.14.self_attn.indexer.wk",
|
| 160 |
+
"model.layers.14.self_attn.indexer.wq_b",
|
| 161 |
+
"model.layers.14.self_attn.kv_a_proj_with_mqa",
|
| 162 |
+
"model.layers.14.self_attn.kv_b_proj",
|
| 163 |
+
"model.layers.14.self_attn.o_proj",
|
| 164 |
+
"model.layers.14.self_attn.q_a_proj",
|
| 165 |
+
"model.layers.14.self_attn.q_b_proj",
|
| 166 |
+
"model.layers.15.mlp.gate",
|
| 167 |
+
"model.layers.15.self_attn.indexer.weights_proj",
|
| 168 |
+
"model.layers.15.self_attn.indexer.wk",
|
| 169 |
+
"model.layers.15.self_attn.indexer.wq_b",
|
| 170 |
+
"model.layers.15.self_attn.kv_a_proj_with_mqa",
|
| 171 |
+
"model.layers.15.self_attn.kv_b_proj",
|
| 172 |
+
"model.layers.15.self_attn.o_proj",
|
| 173 |
+
"model.layers.15.self_attn.q_a_proj",
|
| 174 |
+
"model.layers.15.self_attn.q_b_proj",
|
| 175 |
+
"model.layers.16.mlp.gate",
|
| 176 |
+
"model.layers.16.self_attn.indexer.weights_proj",
|
| 177 |
+
"model.layers.16.self_attn.indexer.wk",
|
| 178 |
+
"model.layers.16.self_attn.indexer.wq_b",
|
| 179 |
+
"model.layers.16.self_attn.kv_a_proj_with_mqa",
|
| 180 |
+
"model.layers.16.self_attn.kv_b_proj",
|
| 181 |
+
"model.layers.16.self_attn.o_proj",
|
| 182 |
+
"model.layers.16.self_attn.q_a_proj",
|
| 183 |
+
"model.layers.16.self_attn.q_b_proj",
|
| 184 |
+
"model.layers.17.mlp.gate",
|
| 185 |
+
"model.layers.17.self_attn.indexer.weights_proj",
|
| 186 |
+
"model.layers.17.self_attn.indexer.wk",
|
| 187 |
+
"model.layers.17.self_attn.indexer.wq_b",
|
| 188 |
+
"model.layers.17.self_attn.kv_a_proj_with_mqa",
|
| 189 |
+
"model.layers.17.self_attn.kv_b_proj",
|
| 190 |
+
"model.layers.17.self_attn.o_proj",
|
| 191 |
+
"model.layers.17.self_attn.q_a_proj",
|
| 192 |
+
"model.layers.17.self_attn.q_b_proj",
|
| 193 |
+
"model.layers.18.mlp.gate",
|
| 194 |
+
"model.layers.18.self_attn.indexer.weights_proj",
|
| 195 |
+
"model.layers.18.self_attn.indexer.wk",
|
| 196 |
+
"model.layers.18.self_attn.indexer.wq_b",
|
| 197 |
+
"model.layers.18.self_attn.kv_a_proj_with_mqa",
|
| 198 |
+
"model.layers.18.self_attn.kv_b_proj",
|
| 199 |
+
"model.layers.18.self_attn.o_proj",
|
| 200 |
+
"model.layers.18.self_attn.q_a_proj",
|
| 201 |
+
"model.layers.18.self_attn.q_b_proj",
|
| 202 |
+
"model.layers.19.mlp.gate",
|
| 203 |
+
"model.layers.19.self_attn.indexer.weights_proj",
|
| 204 |
+
"model.layers.19.self_attn.indexer.wk",
|
| 205 |
+
"model.layers.19.self_attn.indexer.wq_b",
|
| 206 |
+
"model.layers.19.self_attn.kv_a_proj_with_mqa",
|
| 207 |
+
"model.layers.19.self_attn.kv_b_proj",
|
| 208 |
+
"model.layers.19.self_attn.o_proj",
|
| 209 |
+
"model.layers.19.self_attn.q_a_proj",
|
| 210 |
+
"model.layers.19.self_attn.q_b_proj",
|
| 211 |
+
"model.layers.2.mlp.down_proj",
|
| 212 |
+
"model.layers.2.mlp.gate_proj",
|
| 213 |
+
"model.layers.2.mlp.up_proj",
|
| 214 |
+
"model.layers.2.self_attn.indexer.weights_proj",
|
| 215 |
+
"model.layers.2.self_attn.indexer.wk",
|
| 216 |
+
"model.layers.2.self_attn.indexer.wq_b",
|
| 217 |
+
"model.layers.2.self_attn.kv_a_proj_with_mqa",
|
| 218 |
+
"model.layers.2.self_attn.kv_b_proj",
|
| 219 |
+
"model.layers.2.self_attn.o_proj",
|
| 220 |
+
"model.layers.2.self_attn.q_a_proj",
|
| 221 |
+
"model.layers.2.self_attn.q_b_proj",
|
| 222 |
+
"model.layers.20.mlp.gate",
|
| 223 |
+
"model.layers.20.self_attn.indexer.weights_proj",
|
| 224 |
+
"model.layers.20.self_attn.indexer.wk",
|
| 225 |
+
"model.layers.20.self_attn.indexer.wq_b",
|
| 226 |
+
"model.layers.20.self_attn.kv_a_proj_with_mqa",
|
| 227 |
+
"model.layers.20.self_attn.kv_b_proj",
|
| 228 |
+
"model.layers.20.self_attn.o_proj",
|
| 229 |
+
"model.layers.20.self_attn.q_a_proj",
|
| 230 |
+
"model.layers.20.self_attn.q_b_proj",
|
| 231 |
+
"model.layers.21.mlp.gate",
|
| 232 |
+
"model.layers.21.self_attn.indexer.weights_proj",
|
| 233 |
+
"model.layers.21.self_attn.indexer.wk",
|
| 234 |
+
"model.layers.21.self_attn.indexer.wq_b",
|
| 235 |
+
"model.layers.21.self_attn.kv_a_proj_with_mqa",
|
| 236 |
+
"model.layers.21.self_attn.kv_b_proj",
|
| 237 |
+
"model.layers.21.self_attn.o_proj",
|
| 238 |
+
"model.layers.21.self_attn.q_a_proj",
|
| 239 |
+
"model.layers.21.self_attn.q_b_proj",
|
| 240 |
+
"model.layers.22.mlp.gate",
|
| 241 |
+
"model.layers.22.self_attn.indexer.weights_proj",
|
| 242 |
+
"model.layers.22.self_attn.indexer.wk",
|
| 243 |
+
"model.layers.22.self_attn.indexer.wq_b",
|
| 244 |
+
"model.layers.22.self_attn.kv_a_proj_with_mqa",
|
| 245 |
+
"model.layers.22.self_attn.kv_b_proj",
|
| 246 |
+
"model.layers.22.self_attn.o_proj",
|
| 247 |
+
"model.layers.22.self_attn.q_a_proj",
|
| 248 |
+
"model.layers.22.self_attn.q_b_proj",
|
| 249 |
+
"model.layers.23.mlp.gate",
|
| 250 |
+
"model.layers.23.self_attn.indexer.weights_proj",
|
| 251 |
+
"model.layers.23.self_attn.indexer.wk",
|
| 252 |
+
"model.layers.23.self_attn.indexer.wq_b",
|
| 253 |
+
"model.layers.23.self_attn.kv_a_proj_with_mqa",
|
| 254 |
+
"model.layers.23.self_attn.kv_b_proj",
|
| 255 |
+
"model.layers.23.self_attn.o_proj",
|
| 256 |
+
"model.layers.23.self_attn.q_a_proj",
|
| 257 |
+
"model.layers.23.self_attn.q_b_proj",
|
| 258 |
+
"model.layers.24.mlp.gate",
|
| 259 |
+
"model.layers.24.self_attn.indexer.weights_proj",
|
| 260 |
+
"model.layers.24.self_attn.indexer.wk",
|
| 261 |
+
"model.layers.24.self_attn.indexer.wq_b",
|
| 262 |
+
"model.layers.24.self_attn.kv_a_proj_with_mqa",
|
| 263 |
+
"model.layers.24.self_attn.kv_b_proj",
|
| 264 |
+
"model.layers.24.self_attn.o_proj",
|
| 265 |
+
"model.layers.24.self_attn.q_a_proj",
|
| 266 |
+
"model.layers.24.self_attn.q_b_proj",
|
| 267 |
+
"model.layers.25.mlp.gate",
|
| 268 |
+
"model.layers.25.self_attn.indexer.weights_proj",
|
| 269 |
+
"model.layers.25.self_attn.indexer.wk",
|
| 270 |
+
"model.layers.25.self_attn.indexer.wq_b",
|
| 271 |
+
"model.layers.25.self_attn.kv_a_proj_with_mqa",
|
| 272 |
+
"model.layers.25.self_attn.kv_b_proj",
|
| 273 |
+
"model.layers.25.self_attn.o_proj",
|
| 274 |
+
"model.layers.25.self_attn.q_a_proj",
|
| 275 |
+
"model.layers.25.self_attn.q_b_proj",
|
| 276 |
+
"model.layers.26.mlp.gate",
|
| 277 |
+
"model.layers.26.self_attn.indexer.weights_proj",
|
| 278 |
+
"model.layers.26.self_attn.indexer.wk",
|
| 279 |
+
"model.layers.26.self_attn.indexer.wq_b",
|
| 280 |
+
"model.layers.26.self_attn.kv_a_proj_with_mqa",
|
| 281 |
+
"model.layers.26.self_attn.kv_b_proj",
|
| 282 |
+
"model.layers.26.self_attn.o_proj",
|
| 283 |
+
"model.layers.26.self_attn.q_a_proj",
|
| 284 |
+
"model.layers.26.self_attn.q_b_proj",
|
| 285 |
+
"model.layers.27.mlp.gate",
|
| 286 |
+
"model.layers.27.self_attn.indexer.weights_proj",
|
| 287 |
+
"model.layers.27.self_attn.indexer.wk",
|
| 288 |
+
"model.layers.27.self_attn.indexer.wq_b",
|
| 289 |
+
"model.layers.27.self_attn.kv_a_proj_with_mqa",
|
| 290 |
+
"model.layers.27.self_attn.kv_b_proj",
|
| 291 |
+
"model.layers.27.self_attn.o_proj",
|
| 292 |
+
"model.layers.27.self_attn.q_a_proj",
|
| 293 |
+
"model.layers.27.self_attn.q_b_proj",
|
| 294 |
+
"model.layers.28.mlp.gate",
|
| 295 |
+
"model.layers.28.self_attn.indexer.weights_proj",
|
| 296 |
+
"model.layers.28.self_attn.indexer.wk",
|
| 297 |
+
"model.layers.28.self_attn.indexer.wq_b",
|
| 298 |
+
"model.layers.28.self_attn.kv_a_proj_with_mqa",
|
| 299 |
+
"model.layers.28.self_attn.kv_b_proj",
|
| 300 |
+
"model.layers.28.self_attn.o_proj",
|
| 301 |
+
"model.layers.28.self_attn.q_a_proj",
|
| 302 |
+
"model.layers.28.self_attn.q_b_proj",
|
| 303 |
+
"model.layers.29.mlp.gate",
|
| 304 |
+
"model.layers.29.self_attn.indexer.weights_proj",
|
| 305 |
+
"model.layers.29.self_attn.indexer.wk",
|
| 306 |
+
"model.layers.29.self_attn.indexer.wq_b",
|
| 307 |
+
"model.layers.29.self_attn.kv_a_proj_with_mqa",
|
| 308 |
+
"model.layers.29.self_attn.kv_b_proj",
|
| 309 |
+
"model.layers.29.self_attn.o_proj",
|
| 310 |
+
"model.layers.29.self_attn.q_a_proj",
|
| 311 |
+
"model.layers.29.self_attn.q_b_proj",
|
| 312 |
+
"model.layers.3.mlp.gate",
|
| 313 |
+
"model.layers.3.self_attn.indexer.weights_proj",
|
| 314 |
+
"model.layers.3.self_attn.indexer.wk",
|
| 315 |
+
"model.layers.3.self_attn.indexer.wq_b",
|
| 316 |
+
"model.layers.3.self_attn.kv_a_proj_with_mqa",
|
| 317 |
+
"model.layers.3.self_attn.kv_b_proj",
|
| 318 |
+
"model.layers.3.self_attn.o_proj",
|
| 319 |
+
"model.layers.3.self_attn.q_a_proj",
|
| 320 |
+
"model.layers.3.self_attn.q_b_proj",
|
| 321 |
+
"model.layers.30.mlp.gate",
|
| 322 |
+
"model.layers.30.self_attn.indexer.weights_proj",
|
| 323 |
+
"model.layers.30.self_attn.indexer.wk",
|
| 324 |
+
"model.layers.30.self_attn.indexer.wq_b",
|
| 325 |
+
"model.layers.30.self_attn.kv_a_proj_with_mqa",
|
| 326 |
+
"model.layers.30.self_attn.kv_b_proj",
|
| 327 |
+
"model.layers.30.self_attn.o_proj",
|
| 328 |
+
"model.layers.30.self_attn.q_a_proj",
|
| 329 |
+
"model.layers.30.self_attn.q_b_proj",
|
| 330 |
+
"model.layers.31.mlp.gate",
|
| 331 |
+
"model.layers.31.self_attn.indexer.weights_proj",
|
| 332 |
+
"model.layers.31.self_attn.indexer.wk",
|
| 333 |
+
"model.layers.31.self_attn.indexer.wq_b",
|
| 334 |
+
"model.layers.31.self_attn.kv_a_proj_with_mqa",
|
| 335 |
+
"model.layers.31.self_attn.kv_b_proj",
|
| 336 |
+
"model.layers.31.self_attn.o_proj",
|
| 337 |
+
"model.layers.31.self_attn.q_a_proj",
|
| 338 |
+
"model.layers.31.self_attn.q_b_proj",
|
| 339 |
+
"model.layers.32.mlp.gate",
|
| 340 |
+
"model.layers.32.self_attn.indexer.weights_proj",
|
| 341 |
+
"model.layers.32.self_attn.indexer.wk",
|
| 342 |
+
"model.layers.32.self_attn.indexer.wq_b",
|
| 343 |
+
"model.layers.32.self_attn.kv_a_proj_with_mqa",
|
| 344 |
+
"model.layers.32.self_attn.kv_b_proj",
|
| 345 |
+
"model.layers.32.self_attn.o_proj",
|
| 346 |
+
"model.layers.32.self_attn.q_a_proj",
|
| 347 |
+
"model.layers.32.self_attn.q_b_proj",
|
| 348 |
+
"model.layers.33.mlp.gate",
|
| 349 |
+
"model.layers.33.self_attn.indexer.weights_proj",
|
| 350 |
+
"model.layers.33.self_attn.indexer.wk",
|
| 351 |
+
"model.layers.33.self_attn.indexer.wq_b",
|
| 352 |
+
"model.layers.33.self_attn.kv_a_proj_with_mqa",
|
| 353 |
+
"model.layers.33.self_attn.kv_b_proj",
|
| 354 |
+
"model.layers.33.self_attn.o_proj",
|
| 355 |
+
"model.layers.33.self_attn.q_a_proj",
|
| 356 |
+
"model.layers.33.self_attn.q_b_proj",
|
| 357 |
+
"model.layers.34.mlp.gate",
|
| 358 |
+
"model.layers.34.self_attn.indexer.weights_proj",
|
| 359 |
+
"model.layers.34.self_attn.indexer.wk",
|
| 360 |
+
"model.layers.34.self_attn.indexer.wq_b",
|
| 361 |
+
"model.layers.34.self_attn.kv_a_proj_with_mqa",
|
| 362 |
+
"model.layers.34.self_attn.kv_b_proj",
|
| 363 |
+
"model.layers.34.self_attn.o_proj",
|
| 364 |
+
"model.layers.34.self_attn.q_a_proj",
|
| 365 |
+
"model.layers.34.self_attn.q_b_proj",
|
| 366 |
+
"model.layers.35.mlp.gate",
|
| 367 |
+
"model.layers.35.self_attn.indexer.weights_proj",
|
| 368 |
+
"model.layers.35.self_attn.indexer.wk",
|
| 369 |
+
"model.layers.35.self_attn.indexer.wq_b",
|
| 370 |
+
"model.layers.35.self_attn.kv_a_proj_with_mqa",
|
| 371 |
+
"model.layers.35.self_attn.kv_b_proj",
|
| 372 |
+
"model.layers.35.self_attn.o_proj",
|
| 373 |
+
"model.layers.35.self_attn.q_a_proj",
|
| 374 |
+
"model.layers.35.self_attn.q_b_proj",
|
| 375 |
+
"model.layers.36.mlp.gate",
|
| 376 |
+
"model.layers.36.self_attn.indexer.weights_proj",
|
| 377 |
+
"model.layers.36.self_attn.indexer.wk",
|
| 378 |
+
"model.layers.36.self_attn.indexer.wq_b",
|
| 379 |
+
"model.layers.36.self_attn.kv_a_proj_with_mqa",
|
| 380 |
+
"model.layers.36.self_attn.kv_b_proj",
|
| 381 |
+
"model.layers.36.self_attn.o_proj",
|
| 382 |
+
"model.layers.36.self_attn.q_a_proj",
|
| 383 |
+
"model.layers.36.self_attn.q_b_proj",
|
| 384 |
+
"model.layers.37.mlp.gate",
|
| 385 |
+
"model.layers.37.self_attn.indexer.weights_proj",
|
| 386 |
+
"model.layers.37.self_attn.indexer.wk",
|
| 387 |
+
"model.layers.37.self_attn.indexer.wq_b",
|
| 388 |
+
"model.layers.37.self_attn.kv_a_proj_with_mqa",
|
| 389 |
+
"model.layers.37.self_attn.kv_b_proj",
|
| 390 |
+
"model.layers.37.self_attn.o_proj",
|
| 391 |
+
"model.layers.37.self_attn.q_a_proj",
|
| 392 |
+
"model.layers.37.self_attn.q_b_proj",
|
| 393 |
+
"model.layers.38.mlp.gate",
|
| 394 |
+
"model.layers.38.self_attn.indexer.weights_proj",
|
| 395 |
+
"model.layers.38.self_attn.indexer.wk",
|
| 396 |
+
"model.layers.38.self_attn.indexer.wq_b",
|
| 397 |
+
"model.layers.38.self_attn.kv_a_proj_with_mqa",
|
| 398 |
+
"model.layers.38.self_attn.kv_b_proj",
|
| 399 |
+
"model.layers.38.self_attn.o_proj",
|
| 400 |
+
"model.layers.38.self_attn.q_a_proj",
|
| 401 |
+
"model.layers.38.self_attn.q_b_proj",
|
| 402 |
+
"model.layers.39.mlp.gate",
|
| 403 |
+
"model.layers.39.self_attn.indexer.weights_proj",
|
| 404 |
+
"model.layers.39.self_attn.indexer.wk",
|
| 405 |
+
"model.layers.39.self_attn.indexer.wq_b",
|
| 406 |
+
"model.layers.39.self_attn.kv_a_proj_with_mqa",
|
| 407 |
+
"model.layers.39.self_attn.kv_b_proj",
|
| 408 |
+
"model.layers.39.self_attn.o_proj",
|
| 409 |
+
"model.layers.39.self_attn.q_a_proj",
|
| 410 |
+
"model.layers.39.self_attn.q_b_proj",
|
| 411 |
+
"model.layers.4.mlp.gate",
|
| 412 |
+
"model.layers.4.self_attn.indexer.weights_proj",
|
| 413 |
+
"model.layers.4.self_attn.indexer.wk",
|
| 414 |
+
"model.layers.4.self_attn.indexer.wq_b",
|
| 415 |
+
"model.layers.4.self_attn.kv_a_proj_with_mqa",
|
| 416 |
+
"model.layers.4.self_attn.kv_b_proj",
|
| 417 |
+
"model.layers.4.self_attn.o_proj",
|
| 418 |
+
"model.layers.4.self_attn.q_a_proj",
|
| 419 |
+
"model.layers.4.self_attn.q_b_proj",
|
| 420 |
+
"model.layers.40.mlp.gate",
|
| 421 |
+
"model.layers.40.self_attn.indexer.weights_proj",
|
| 422 |
+
"model.layers.40.self_attn.indexer.wk",
|
| 423 |
+
"model.layers.40.self_attn.indexer.wq_b",
|
| 424 |
+
"model.layers.40.self_attn.kv_a_proj_with_mqa",
|
| 425 |
+
"model.layers.40.self_attn.kv_b_proj",
|
| 426 |
+
"model.layers.40.self_attn.o_proj",
|
| 427 |
+
"model.layers.40.self_attn.q_a_proj",
|
| 428 |
+
"model.layers.40.self_attn.q_b_proj",
|
| 429 |
+
"model.layers.41.mlp.gate",
|
| 430 |
+
"model.layers.41.self_attn.indexer.weights_proj",
|
| 431 |
+
"model.layers.41.self_attn.indexer.wk",
|
| 432 |
+
"model.layers.41.self_attn.indexer.wq_b",
|
| 433 |
+
"model.layers.41.self_attn.kv_a_proj_with_mqa",
|
| 434 |
+
"model.layers.41.self_attn.kv_b_proj",
|
| 435 |
+
"model.layers.41.self_attn.o_proj",
|
| 436 |
+
"model.layers.41.self_attn.q_a_proj",
|
| 437 |
+
"model.layers.41.self_attn.q_b_proj",
|
| 438 |
+
"model.layers.42.mlp.gate",
|
| 439 |
+
"model.layers.42.self_attn.indexer.weights_proj",
|
| 440 |
+
"model.layers.42.self_attn.indexer.wk",
|
| 441 |
+
"model.layers.42.self_attn.indexer.wq_b",
|
| 442 |
+
"model.layers.42.self_attn.kv_a_proj_with_mqa",
|
| 443 |
+
"model.layers.42.self_attn.kv_b_proj",
|
| 444 |
+
"model.layers.42.self_attn.o_proj",
|
| 445 |
+
"model.layers.42.self_attn.q_a_proj",
|
| 446 |
+
"model.layers.42.self_attn.q_b_proj",
|
| 447 |
+
"model.layers.43.mlp.gate",
|
| 448 |
+
"model.layers.43.self_attn.indexer.weights_proj",
|
| 449 |
+
"model.layers.43.self_attn.indexer.wk",
|
| 450 |
+
"model.layers.43.self_attn.indexer.wq_b",
|
| 451 |
+
"model.layers.43.self_attn.kv_a_proj_with_mqa",
|
| 452 |
+
"model.layers.43.self_attn.kv_b_proj",
|
| 453 |
+
"model.layers.43.self_attn.o_proj",
|
| 454 |
+
"model.layers.43.self_attn.q_a_proj",
|
| 455 |
+
"model.layers.43.self_attn.q_b_proj",
|
| 456 |
+
"model.layers.44.mlp.gate",
|
| 457 |
+
"model.layers.44.self_attn.indexer.weights_proj",
|
| 458 |
+
"model.layers.44.self_attn.indexer.wk",
|
| 459 |
+
"model.layers.44.self_attn.indexer.wq_b",
|
| 460 |
+
"model.layers.44.self_attn.kv_a_proj_with_mqa",
|
| 461 |
+
"model.layers.44.self_attn.kv_b_proj",
|
| 462 |
+
"model.layers.44.self_attn.o_proj",
|
| 463 |
+
"model.layers.44.self_attn.q_a_proj",
|
| 464 |
+
"model.layers.44.self_attn.q_b_proj",
|
| 465 |
+
"model.layers.45.mlp.gate",
|
| 466 |
+
"model.layers.45.self_attn.indexer.weights_proj",
|
| 467 |
+
"model.layers.45.self_attn.indexer.wk",
|
| 468 |
+
"model.layers.45.self_attn.indexer.wq_b",
|
| 469 |
+
"model.layers.45.self_attn.kv_a_proj_with_mqa",
|
| 470 |
+
"model.layers.45.self_attn.kv_b_proj",
|
| 471 |
+
"model.layers.45.self_attn.o_proj",
|
| 472 |
+
"model.layers.45.self_attn.q_a_proj",
|
| 473 |
+
"model.layers.45.self_attn.q_b_proj",
|
| 474 |
+
"model.layers.46.mlp.gate",
|
| 475 |
+
"model.layers.46.self_attn.indexer.weights_proj",
|
| 476 |
+
"model.layers.46.self_attn.indexer.wk",
|
| 477 |
+
"model.layers.46.self_attn.indexer.wq_b",
|
| 478 |
+
"model.layers.46.self_attn.kv_a_proj_with_mqa",
|
| 479 |
+
"model.layers.46.self_attn.kv_b_proj",
|
| 480 |
+
"model.layers.46.self_attn.o_proj",
|
| 481 |
+
"model.layers.46.self_attn.q_a_proj",
|
| 482 |
+
"model.layers.46.self_attn.q_b_proj",
|
| 483 |
+
"model.layers.47.mlp.gate",
|
| 484 |
+
"model.layers.47.self_attn.indexer.weights_proj",
|
| 485 |
+
"model.layers.47.self_attn.indexer.wk",
|
| 486 |
+
"model.layers.47.self_attn.indexer.wq_b",
|
| 487 |
+
"model.layers.47.self_attn.kv_a_proj_with_mqa",
|
| 488 |
+
"model.layers.47.self_attn.kv_b_proj",
|
| 489 |
+
"model.layers.47.self_attn.o_proj",
|
| 490 |
+
"model.layers.47.self_attn.q_a_proj",
|
| 491 |
+
"model.layers.47.self_attn.q_b_proj",
|
| 492 |
+
"model.layers.48.mlp.gate",
|
| 493 |
+
"model.layers.48.self_attn.indexer.weights_proj",
|
| 494 |
+
"model.layers.48.self_attn.indexer.wk",
|
| 495 |
+
"model.layers.48.self_attn.indexer.wq_b",
|
| 496 |
+
"model.layers.48.self_attn.kv_a_proj_with_mqa",
|
| 497 |
+
"model.layers.48.self_attn.kv_b_proj",
|
| 498 |
+
"model.layers.48.self_attn.o_proj",
|
| 499 |
+
"model.layers.48.self_attn.q_a_proj",
|
| 500 |
+
"model.layers.48.self_attn.q_b_proj",
|
| 501 |
+
"model.layers.49.mlp.gate",
|
| 502 |
+
"model.layers.49.self_attn.indexer.weights_proj",
|
| 503 |
+
"model.layers.49.self_attn.indexer.wk",
|
| 504 |
+
"model.layers.49.self_attn.indexer.wq_b",
|
| 505 |
+
"model.layers.49.self_attn.kv_a_proj_with_mqa",
|
| 506 |
+
"model.layers.49.self_attn.kv_b_proj",
|
| 507 |
+
"model.layers.49.self_attn.o_proj",
|
| 508 |
+
"model.layers.49.self_attn.q_a_proj",
|
| 509 |
+
"model.layers.49.self_attn.q_b_proj",
|
| 510 |
+
"model.layers.5.mlp.gate",
|
| 511 |
+
"model.layers.5.self_attn.indexer.weights_proj",
|
| 512 |
+
"model.layers.5.self_attn.indexer.wk",
|
| 513 |
+
"model.layers.5.self_attn.indexer.wq_b",
|
| 514 |
+
"model.layers.5.self_attn.kv_a_proj_with_mqa",
|
| 515 |
+
"model.layers.5.self_attn.kv_b_proj",
|
| 516 |
+
"model.layers.5.self_attn.o_proj",
|
| 517 |
+
"model.layers.5.self_attn.q_a_proj",
|
| 518 |
+
"model.layers.5.self_attn.q_b_proj",
|
| 519 |
+
"model.layers.50.mlp.gate",
|
| 520 |
+
"model.layers.50.self_attn.indexer.weights_proj",
|
| 521 |
+
"model.layers.50.self_attn.indexer.wk",
|
| 522 |
+
"model.layers.50.self_attn.indexer.wq_b",
|
| 523 |
+
"model.layers.50.self_attn.kv_a_proj_with_mqa",
|
| 524 |
+
"model.layers.50.self_attn.kv_b_proj",
|
| 525 |
+
"model.layers.50.self_attn.o_proj",
|
| 526 |
+
"model.layers.50.self_attn.q_a_proj",
|
| 527 |
+
"model.layers.50.self_attn.q_b_proj",
|
| 528 |
+
"model.layers.51.mlp.gate",
|
| 529 |
+
"model.layers.51.self_attn.indexer.weights_proj",
|
| 530 |
+
"model.layers.51.self_attn.indexer.wk",
|
| 531 |
+
"model.layers.51.self_attn.indexer.wq_b",
|
| 532 |
+
"model.layers.51.self_attn.kv_a_proj_with_mqa",
|
| 533 |
+
"model.layers.51.self_attn.kv_b_proj",
|
| 534 |
+
"model.layers.51.self_attn.o_proj",
|
| 535 |
+
"model.layers.51.self_attn.q_a_proj",
|
| 536 |
+
"model.layers.51.self_attn.q_b_proj",
|
| 537 |
+
"model.layers.52.mlp.gate",
|
| 538 |
+
"model.layers.52.self_attn.indexer.weights_proj",
|
| 539 |
+
"model.layers.52.self_attn.indexer.wk",
|
| 540 |
+
"model.layers.52.self_attn.indexer.wq_b",
|
| 541 |
+
"model.layers.52.self_attn.kv_a_proj_with_mqa",
|
| 542 |
+
"model.layers.52.self_attn.kv_b_proj",
|
| 543 |
+
"model.layers.52.self_attn.o_proj",
|
| 544 |
+
"model.layers.52.self_attn.q_a_proj",
|
| 545 |
+
"model.layers.52.self_attn.q_b_proj",
|
| 546 |
+
"model.layers.53.mlp.gate",
|
| 547 |
+
"model.layers.53.self_attn.indexer.weights_proj",
|
| 548 |
+
"model.layers.53.self_attn.indexer.wk",
|
| 549 |
+
"model.layers.53.self_attn.indexer.wq_b",
|
| 550 |
+
"model.layers.53.self_attn.kv_a_proj_with_mqa",
|
| 551 |
+
"model.layers.53.self_attn.kv_b_proj",
|
| 552 |
+
"model.layers.53.self_attn.o_proj",
|
| 553 |
+
"model.layers.53.self_attn.q_a_proj",
|
| 554 |
+
"model.layers.53.self_attn.q_b_proj",
|
| 555 |
+
"model.layers.54.mlp.gate",
|
| 556 |
+
"model.layers.54.self_attn.indexer.weights_proj",
|
| 557 |
+
"model.layers.54.self_attn.indexer.wk",
|
| 558 |
+
"model.layers.54.self_attn.indexer.wq_b",
|
| 559 |
+
"model.layers.54.self_attn.kv_a_proj_with_mqa",
|
| 560 |
+
"model.layers.54.self_attn.kv_b_proj",
|
| 561 |
+
"model.layers.54.self_attn.o_proj",
|
| 562 |
+
"model.layers.54.self_attn.q_a_proj",
|
| 563 |
+
"model.layers.54.self_attn.q_b_proj",
|
| 564 |
+
"model.layers.55.mlp.gate",
|
| 565 |
+
"model.layers.55.self_attn.indexer.weights_proj",
|
| 566 |
+
"model.layers.55.self_attn.indexer.wk",
|
| 567 |
+
"model.layers.55.self_attn.indexer.wq_b",
|
| 568 |
+
"model.layers.55.self_attn.kv_a_proj_with_mqa",
|
| 569 |
+
"model.layers.55.self_attn.kv_b_proj",
|
| 570 |
+
"model.layers.55.self_attn.o_proj",
|
| 571 |
+
"model.layers.55.self_attn.q_a_proj",
|
| 572 |
+
"model.layers.55.self_attn.q_b_proj",
|
| 573 |
+
"model.layers.56.mlp.gate",
|
| 574 |
+
"model.layers.56.self_attn.indexer.weights_proj",
|
| 575 |
+
"model.layers.56.self_attn.indexer.wk",
|
| 576 |
+
"model.layers.56.self_attn.indexer.wq_b",
|
| 577 |
+
"model.layers.56.self_attn.kv_a_proj_with_mqa",
|
| 578 |
+
"model.layers.56.self_attn.kv_b_proj",
|
| 579 |
+
"model.layers.56.self_attn.o_proj",
|
| 580 |
+
"model.layers.56.self_attn.q_a_proj",
|
| 581 |
+
"model.layers.56.self_attn.q_b_proj",
|
| 582 |
+
"model.layers.57.mlp.gate",
|
| 583 |
+
"model.layers.57.self_attn.indexer.weights_proj",
|
| 584 |
+
"model.layers.57.self_attn.indexer.wk",
|
| 585 |
+
"model.layers.57.self_attn.indexer.wq_b",
|
| 586 |
+
"model.layers.57.self_attn.kv_a_proj_with_mqa",
|
| 587 |
+
"model.layers.57.self_attn.kv_b_proj",
|
| 588 |
+
"model.layers.57.self_attn.o_proj",
|
| 589 |
+
"model.layers.57.self_attn.q_a_proj",
|
| 590 |
+
"model.layers.57.self_attn.q_b_proj",
|
| 591 |
+
"model.layers.58.mlp.gate",
|
| 592 |
+
"model.layers.58.self_attn.indexer.weights_proj",
|
| 593 |
+
"model.layers.58.self_attn.indexer.wk",
|
| 594 |
+
"model.layers.58.self_attn.indexer.wq_b",
|
| 595 |
+
"model.layers.58.self_attn.kv_a_proj_with_mqa",
|
| 596 |
+
"model.layers.58.self_attn.kv_b_proj",
|
| 597 |
+
"model.layers.58.self_attn.o_proj",
|
| 598 |
+
"model.layers.58.self_attn.q_a_proj",
|
| 599 |
+
"model.layers.58.self_attn.q_b_proj",
|
| 600 |
+
"model.layers.59.mlp.gate",
|
| 601 |
+
"model.layers.59.self_attn.indexer.weights_proj",
|
| 602 |
+
"model.layers.59.self_attn.indexer.wk",
|
| 603 |
+
"model.layers.59.self_attn.indexer.wq_b",
|
| 604 |
+
"model.layers.59.self_attn.kv_a_proj_with_mqa",
|
| 605 |
+
"model.layers.59.self_attn.kv_b_proj",
|
| 606 |
+
"model.layers.59.self_attn.o_proj",
|
| 607 |
+
"model.layers.59.self_attn.q_a_proj",
|
| 608 |
+
"model.layers.59.self_attn.q_b_proj",
|
| 609 |
+
"model.layers.6.mlp.gate",
|
| 610 |
+
"model.layers.6.self_attn.indexer.weights_proj",
|
| 611 |
+
"model.layers.6.self_attn.indexer.wk",
|
| 612 |
+
"model.layers.6.self_attn.indexer.wq_b",
|
| 613 |
+
"model.layers.6.self_attn.kv_a_proj_with_mqa",
|
| 614 |
+
"model.layers.6.self_attn.kv_b_proj",
|
| 615 |
+
"model.layers.6.self_attn.o_proj",
|
| 616 |
+
"model.layers.6.self_attn.q_a_proj",
|
| 617 |
+
"model.layers.6.self_attn.q_b_proj",
|
| 618 |
+
"model.layers.60.mlp.gate",
|
| 619 |
+
"model.layers.60.self_attn.indexer.weights_proj",
|
| 620 |
+
"model.layers.60.self_attn.indexer.wk",
|
| 621 |
+
"model.layers.60.self_attn.indexer.wq_b",
|
| 622 |
+
"model.layers.60.self_attn.kv_a_proj_with_mqa",
|
| 623 |
+
"model.layers.60.self_attn.kv_b_proj",
|
| 624 |
+
"model.layers.60.self_attn.o_proj",
|
| 625 |
+
"model.layers.60.self_attn.q_a_proj",
|
| 626 |
+
"model.layers.60.self_attn.q_b_proj",
|
| 627 |
+
"model.layers.61.mlp.gate",
|
| 628 |
+
"model.layers.61.self_attn.indexer.weights_proj",
|
| 629 |
+
"model.layers.61.self_attn.indexer.wk",
|
| 630 |
+
"model.layers.61.self_attn.indexer.wq_b",
|
| 631 |
+
"model.layers.61.self_attn.kv_a_proj_with_mqa",
|
| 632 |
+
"model.layers.61.self_attn.kv_b_proj",
|
| 633 |
+
"model.layers.61.self_attn.o_proj",
|
| 634 |
+
"model.layers.61.self_attn.q_a_proj",
|
| 635 |
+
"model.layers.61.self_attn.q_b_proj",
|
| 636 |
+
"model.layers.62.mlp.gate",
|
| 637 |
+
"model.layers.62.self_attn.indexer.weights_proj",
|
| 638 |
+
"model.layers.62.self_attn.indexer.wk",
|
| 639 |
+
"model.layers.62.self_attn.indexer.wq_b",
|
| 640 |
+
"model.layers.62.self_attn.kv_a_proj_with_mqa",
|
| 641 |
+
"model.layers.62.self_attn.kv_b_proj",
|
| 642 |
+
"model.layers.62.self_attn.o_proj",
|
| 643 |
+
"model.layers.62.self_attn.q_a_proj",
|
| 644 |
+
"model.layers.62.self_attn.q_b_proj",
|
| 645 |
+
"model.layers.63.mlp.gate",
|
| 646 |
+
"model.layers.63.self_attn.indexer.weights_proj",
|
| 647 |
+
"model.layers.63.self_attn.indexer.wk",
|
| 648 |
+
"model.layers.63.self_attn.indexer.wq_b",
|
| 649 |
+
"model.layers.63.self_attn.kv_a_proj_with_mqa",
|
| 650 |
+
"model.layers.63.self_attn.kv_b_proj",
|
| 651 |
+
"model.layers.63.self_attn.o_proj",
|
| 652 |
+
"model.layers.63.self_attn.q_a_proj",
|
| 653 |
+
"model.layers.63.self_attn.q_b_proj",
|
| 654 |
+
"model.layers.64.mlp.gate",
|
| 655 |
+
"model.layers.64.self_attn.indexer.weights_proj",
|
| 656 |
+
"model.layers.64.self_attn.indexer.wk",
|
| 657 |
+
"model.layers.64.self_attn.indexer.wq_b",
|
| 658 |
+
"model.layers.64.self_attn.kv_a_proj_with_mqa",
|
| 659 |
+
"model.layers.64.self_attn.kv_b_proj",
|
| 660 |
+
"model.layers.64.self_attn.o_proj",
|
| 661 |
+
"model.layers.64.self_attn.q_a_proj",
|
| 662 |
+
"model.layers.64.self_attn.q_b_proj",
|
| 663 |
+
"model.layers.65.mlp.gate",
|
| 664 |
+
"model.layers.65.self_attn.indexer.weights_proj",
|
| 665 |
+
"model.layers.65.self_attn.indexer.wk",
|
| 666 |
+
"model.layers.65.self_attn.indexer.wq_b",
|
| 667 |
+
"model.layers.65.self_attn.kv_a_proj_with_mqa",
|
| 668 |
+
"model.layers.65.self_attn.kv_b_proj",
|
| 669 |
+
"model.layers.65.self_attn.o_proj",
|
| 670 |
+
"model.layers.65.self_attn.q_a_proj",
|
| 671 |
+
"model.layers.65.self_attn.q_b_proj",
|
| 672 |
+
"model.layers.66.mlp.gate",
|
| 673 |
+
"model.layers.66.self_attn.indexer.weights_proj",
|
| 674 |
+
"model.layers.66.self_attn.indexer.wk",
|
| 675 |
+
"model.layers.66.self_attn.indexer.wq_b",
|
| 676 |
+
"model.layers.66.self_attn.kv_a_proj_with_mqa",
|
| 677 |
+
"model.layers.66.self_attn.kv_b_proj",
|
| 678 |
+
"model.layers.66.self_attn.o_proj",
|
| 679 |
+
"model.layers.66.self_attn.q_a_proj",
|
| 680 |
+
"model.layers.66.self_attn.q_b_proj",
|
| 681 |
+
"model.layers.67.mlp.gate",
|
| 682 |
+
"model.layers.67.self_attn.indexer.weights_proj",
|
| 683 |
+
"model.layers.67.self_attn.indexer.wk",
|
| 684 |
+
"model.layers.67.self_attn.indexer.wq_b",
|
| 685 |
+
"model.layers.67.self_attn.kv_a_proj_with_mqa",
|
| 686 |
+
"model.layers.67.self_attn.kv_b_proj",
|
| 687 |
+
"model.layers.67.self_attn.o_proj",
|
| 688 |
+
"model.layers.67.self_attn.q_a_proj",
|
| 689 |
+
"model.layers.67.self_attn.q_b_proj",
|
| 690 |
+
"model.layers.68.mlp.gate",
|
| 691 |
+
"model.layers.68.self_attn.indexer.weights_proj",
|
| 692 |
+
"model.layers.68.self_attn.indexer.wk",
|
| 693 |
+
"model.layers.68.self_attn.indexer.wq_b",
|
| 694 |
+
"model.layers.68.self_attn.kv_a_proj_with_mqa",
|
| 695 |
+
"model.layers.68.self_attn.kv_b_proj",
|
| 696 |
+
"model.layers.68.self_attn.o_proj",
|
| 697 |
+
"model.layers.68.self_attn.q_a_proj",
|
| 698 |
+
"model.layers.68.self_attn.q_b_proj",
|
| 699 |
+
"model.layers.69.mlp.gate",
|
| 700 |
+
"model.layers.69.self_attn.indexer.weights_proj",
|
| 701 |
+
"model.layers.69.self_attn.indexer.wk",
|
| 702 |
+
"model.layers.69.self_attn.indexer.wq_b",
|
| 703 |
+
"model.layers.69.self_attn.kv_a_proj_with_mqa",
|
| 704 |
+
"model.layers.69.self_attn.kv_b_proj",
|
| 705 |
+
"model.layers.69.self_attn.o_proj",
|
| 706 |
+
"model.layers.69.self_attn.q_a_proj",
|
| 707 |
+
"model.layers.69.self_attn.q_b_proj",
|
| 708 |
+
"model.layers.7.mlp.gate",
|
| 709 |
+
"model.layers.7.self_attn.indexer.weights_proj",
|
| 710 |
+
"model.layers.7.self_attn.indexer.wk",
|
| 711 |
+
"model.layers.7.self_attn.indexer.wq_b",
|
| 712 |
+
"model.layers.7.self_attn.kv_a_proj_with_mqa",
|
| 713 |
+
"model.layers.7.self_attn.kv_b_proj",
|
| 714 |
+
"model.layers.7.self_attn.o_proj",
|
| 715 |
+
"model.layers.7.self_attn.q_a_proj",
|
| 716 |
+
"model.layers.7.self_attn.q_b_proj",
|
| 717 |
+
"model.layers.70.mlp.gate",
|
| 718 |
+
"model.layers.70.self_attn.indexer.weights_proj",
|
| 719 |
+
"model.layers.70.self_attn.indexer.wk",
|
| 720 |
+
"model.layers.70.self_attn.indexer.wq_b",
|
| 721 |
+
"model.layers.70.self_attn.kv_a_proj_with_mqa",
|
| 722 |
+
"model.layers.70.self_attn.kv_b_proj",
|
| 723 |
+
"model.layers.70.self_attn.o_proj",
|
| 724 |
+
"model.layers.70.self_attn.q_a_proj",
|
| 725 |
+
"model.layers.70.self_attn.q_b_proj",
|
| 726 |
+
"model.layers.71.mlp.gate",
|
| 727 |
+
"model.layers.71.self_attn.indexer.weights_proj",
|
| 728 |
+
"model.layers.71.self_attn.indexer.wk",
|
| 729 |
+
"model.layers.71.self_attn.indexer.wq_b",
|
| 730 |
+
"model.layers.71.self_attn.kv_a_proj_with_mqa",
|
| 731 |
+
"model.layers.71.self_attn.kv_b_proj",
|
| 732 |
+
"model.layers.71.self_attn.o_proj",
|
| 733 |
+
"model.layers.71.self_attn.q_a_proj",
|
| 734 |
+
"model.layers.71.self_attn.q_b_proj",
|
| 735 |
+
"model.layers.72.mlp.gate",
|
| 736 |
+
"model.layers.72.self_attn.indexer.weights_proj",
|
| 737 |
+
"model.layers.72.self_attn.indexer.wk",
|
| 738 |
+
"model.layers.72.self_attn.indexer.wq_b",
|
| 739 |
+
"model.layers.72.self_attn.kv_a_proj_with_mqa",
|
| 740 |
+
"model.layers.72.self_attn.kv_b_proj",
|
| 741 |
+
"model.layers.72.self_attn.o_proj",
|
| 742 |
+
"model.layers.72.self_attn.q_a_proj",
|
| 743 |
+
"model.layers.72.self_attn.q_b_proj",
|
| 744 |
+
"model.layers.73.mlp.gate",
|
| 745 |
+
"model.layers.73.self_attn.indexer.weights_proj",
|
| 746 |
+
"model.layers.73.self_attn.indexer.wk",
|
| 747 |
+
"model.layers.73.self_attn.indexer.wq_b",
|
| 748 |
+
"model.layers.73.self_attn.kv_a_proj_with_mqa",
|
| 749 |
+
"model.layers.73.self_attn.kv_b_proj",
|
| 750 |
+
"model.layers.73.self_attn.o_proj",
|
| 751 |
+
"model.layers.73.self_attn.q_a_proj",
|
| 752 |
+
"model.layers.73.self_attn.q_b_proj",
|
| 753 |
+
"model.layers.74.mlp.gate",
|
| 754 |
+
"model.layers.74.self_attn.indexer.weights_proj",
|
| 755 |
+
"model.layers.74.self_attn.indexer.wk",
|
| 756 |
+
"model.layers.74.self_attn.indexer.wq_b",
|
| 757 |
+
"model.layers.74.self_attn.kv_a_proj_with_mqa",
|
| 758 |
+
"model.layers.74.self_attn.kv_b_proj",
|
| 759 |
+
"model.layers.74.self_attn.o_proj",
|
| 760 |
+
"model.layers.74.self_attn.q_a_proj",
|
| 761 |
+
"model.layers.74.self_attn.q_b_proj",
|
| 762 |
+
"model.layers.75.mlp.gate",
|
| 763 |
+
"model.layers.75.self_attn.indexer.weights_proj",
|
| 764 |
+
"model.layers.75.self_attn.indexer.wk",
|
| 765 |
+
"model.layers.75.self_attn.indexer.wq_b",
|
| 766 |
+
"model.layers.75.self_attn.kv_a_proj_with_mqa",
|
| 767 |
+
"model.layers.75.self_attn.kv_b_proj",
|
| 768 |
+
"model.layers.75.self_attn.o_proj",
|
| 769 |
+
"model.layers.75.self_attn.q_a_proj",
|
| 770 |
+
"model.layers.75.self_attn.q_b_proj",
|
| 771 |
+
"model.layers.76.mlp.gate",
|
| 772 |
+
"model.layers.76.self_attn.indexer.weights_proj",
|
| 773 |
+
"model.layers.76.self_attn.indexer.wk",
|
| 774 |
+
"model.layers.76.self_attn.indexer.wq_b",
|
| 775 |
+
"model.layers.76.self_attn.kv_a_proj_with_mqa",
|
| 776 |
+
"model.layers.76.self_attn.kv_b_proj",
|
| 777 |
+
"model.layers.76.self_attn.o_proj",
|
| 778 |
+
"model.layers.76.self_attn.q_a_proj",
|
| 779 |
+
"model.layers.76.self_attn.q_b_proj",
|
| 780 |
+
"model.layers.77.mlp.gate",
|
| 781 |
+
"model.layers.77.self_attn.indexer.weights_proj",
|
| 782 |
+
"model.layers.77.self_attn.indexer.wk",
|
| 783 |
+
"model.layers.77.self_attn.indexer.wq_b",
|
| 784 |
+
"model.layers.77.self_attn.kv_a_proj_with_mqa",
|
| 785 |
+
"model.layers.77.self_attn.kv_b_proj",
|
| 786 |
+
"model.layers.77.self_attn.o_proj",
|
| 787 |
+
"model.layers.77.self_attn.q_a_proj",
|
| 788 |
+
"model.layers.77.self_attn.q_b_proj",
|
| 789 |
+
"model.layers.78.mlp.gate",
|
| 790 |
+
"model.layers.78.self_attn.indexer.weights_proj",
|
| 791 |
+
"model.layers.78.self_attn.indexer.wk",
|
| 792 |
+
"model.layers.78.self_attn.indexer.wq_b",
|
| 793 |
+
"model.layers.78.self_attn.kv_a_proj_with_mqa",
|
| 794 |
+
"model.layers.78.self_attn.kv_b_proj",
|
| 795 |
+
"model.layers.78.self_attn.o_proj",
|
| 796 |
+
"model.layers.78.self_attn.q_a_proj",
|
| 797 |
+
"model.layers.78.self_attn.q_b_proj",
|
| 798 |
+
"model.layers.8.mlp.gate",
|
| 799 |
+
"model.layers.8.self_attn.indexer.weights_proj",
|
| 800 |
+
"model.layers.8.self_attn.indexer.wk",
|
| 801 |
+
"model.layers.8.self_attn.indexer.wq_b",
|
| 802 |
+
"model.layers.8.self_attn.kv_a_proj_with_mqa",
|
| 803 |
+
"model.layers.8.self_attn.kv_b_proj",
|
| 804 |
+
"model.layers.8.self_attn.o_proj",
|
| 805 |
+
"model.layers.8.self_attn.q_a_proj",
|
| 806 |
+
"model.layers.8.self_attn.q_b_proj",
|
| 807 |
+
"model.layers.9.mlp.gate",
|
| 808 |
+
"model.layers.9.self_attn.indexer.weights_proj",
|
| 809 |
+
"model.layers.9.self_attn.indexer.wk",
|
| 810 |
+
"model.layers.9.self_attn.indexer.wq_b",
|
| 811 |
+
"model.layers.9.self_attn.kv_a_proj_with_mqa",
|
| 812 |
+
"model.layers.9.self_attn.kv_b_proj",
|
| 813 |
+
"model.layers.9.self_attn.o_proj",
|
| 814 |
+
"model.layers.9.self_attn.q_a_proj",
|
| 815 |
+
"model.layers.9.self_attn.q_b_proj"
|
| 816 |
+
],
|
| 817 |
+
"algo_config": null,
|
| 818 |
+
"softmax_quant_spec": null,
|
| 819 |
+
"quant_method": "quark",
|
| 820 |
+
"layer_type_quant_config": {},
|
| 821 |
+
"layer_quant_config": {},
|
| 822 |
+
"kv_cache_quant_config": {},
|
| 823 |
+
"kv_cache_post_rope": false,
|
| 824 |
+
"quant_mode": "eager_mode",
|
| 825 |
+
"version": "0.12+4f9d2296257",
|
| 826 |
+
"export": {
|
| 827 |
+
"kv_cache_group": [],
|
| 828 |
+
"min_kv_scale": 0.0,
|
| 829 |
+
"pack_method": "reorder",
|
| 830 |
+
"weight_format": "real_quantized",
|
| 831 |
+
"weight_merge_groups": null
|
| 832 |
+
}
|
| 833 |
+
}
|
| 834 |
+
}
|
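The lines above close out the Quark quantization block of config.json: each layer's mlp.gate, indexer projections, and attention projections are listed by module name, quant_method is "quark", and weights are exported as "real_quantized". A minimal sketch (not part of the commit) of how one might list those modules from the shipped config follows; it assumes the commit's files are available locally and that the block sits under the conventional "quantization_config" key, and it deliberately does not assume the exact key that holds the module-name list.

# sketch.py - list module names recorded in the quantization block of config.json
import json

with open("config.json") as f:
    cfg = json.load(f)

qcfg = cfg.get("quantization_config", {})

def module_lists(node):
    # Walk the quant config and yield every non-empty list of strings,
    # without assuming which key (e.g. an exclusion list) it lives under.
    if isinstance(node, dict):
        for value in node.values():
            yield from module_lists(value)
    elif isinstance(node, list) and node and all(isinstance(x, str) for x in node):
        yield node

for names in module_lists(qcfg):
    layerwise = [n for n in names if n.startswith("model.layers.")]
    if layerwise:
        print(f"{len(names)} module names recorded, e.g. {layerwise[:3]}")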
generation_config.json
ADDED
@@ -0,0 +1,12 @@
+{
+  "_from_model_config": true,
+  "eos_token_id": [
+    154820,
+    154827,
+    154829
+  ],
+  "pad_token_id": 154820,
+  "temperature": 1.0,
+  "top_p": 0.95,
+  "transformers_version": "5.4.0"
+}
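generation_config.json pins three EOS token ids, a pad token id of 154820, and sampling defaults of temperature 1.0 and top_p 0.95. A minimal sketch (not instructions from this repository) of loading and checking those defaults with transformers' GenerationConfig; the path string is a placeholder for a local copy of this repo.

# sketch_generation_config.py - load the defaults added above and sanity-check them
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("<local-path-or-repo-id>")  # placeholder

assert gen_cfg.pad_token_id == 154820
assert sorted(gen_cfg.eos_token_id) == [154820, 154827, 154829]
print(gen_cfg.temperature, gen_cfg.top_p)  # expected: 1.0 0.95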
model-00001-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e488bff5a538ff633d02a4841f29e20d9944f7aeab0100429b610baa10ef3aab
+size 5342821416
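Each of the 282 shard entries is a Git LFS pointer: the commit records only the spec version, the sha256 oid, and the byte size, while the actual weights live in LFS storage. A minimal sketch (not part of the commit) for checking a downloaded copy of the first shard against the pointer shown above:

# verify_shard.py - compare a local shard with its Git LFS pointer
import hashlib
import os

expected_oid = "e488bff5a538ff633d02a4841f29e20d9944f7aeab0100429b610baa10ef3aab"
expected_size = 5342821416
path = "model-00001-of-00282.safetensors"

# Cheap check first: the byte size from the pointer.
assert os.path.getsize(path) == expected_size

# Then the sha256 digest, streamed in 1 MiB chunks.
digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert digest.hexdigest() == expected_oid
print("shard matches its Git LFS pointer")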
model-00002-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:95d30bae0b0da0536d770f499b51856ae0eec3580e3189c34715768577a407cd
+size 1470955064

model-00003-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:135f8460b853d66df9b327c19638b67e56528d150b0f67abc2bc3b92e1099642
+size 1423888392

model-00004-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:17915688f58ff13c1c70a4d16e502e4f3f88c2629fd09ebead92b7d712e2e4c8
+size 1423888184

model-00005-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d38c0e5c23e5339949fc09136f73fbbdf053f4e7a210f9da9681046a4b516c0b
+size 1682260856

model-00006-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fb3a286df96c58f28f601a696cbbdafeebd7275eac80e6ca11bf910137c03f33
+size 1423888400

model-00007-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eebc73b1bcbc2f9df9c8830ab894b0de1e8dbc9626e14f9b02ba17bfaa9623bd
+size 1423888392

model-00008-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e92df6833556c9211f07853b884d7fccd4f52800d618f6c52be0da7d8ccd35e
+size 1423888048

model-00009-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:95ea6a8656b813e5de6163bd1ce0ceade7db3ee267a8b965245aca863f9391d1
+size 1682260992

model-00010-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f8cf038d7758a7a5fc5a32f42077fca78839810cb7774244cf4eb48b18ff58fb
+size 1423888384

model-00011-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2d4f7ea8cc7fee70b4211c86bb22ca48e29e9cb7043d933961c8cb6888acd6e5
+size 1423888344

model-00012-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8c2b6919bfb99cf56d8fa1359681526f34309a10812a297a44552ccdacfed896
+size 1427034960
model-00013-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e5ad7412350b7ef976beaa7e1ec895ac65d43ff82fe9f3e3127717783aee8cb1
+size 1679114128

model-00014-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:44e8fd8e88153526871dfb1fc78a7d1b5d50937121472074d52d9d656144995a
+size 1423888384

model-00015-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a77cee90119a737b227d2f9d19d4147ee35574ea80502d3fc30e560dd632d7bc
+size 1423888208

model-00016-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:53fa00502042a865a95fa22a5d5557cf980003ba163fd4e343f5db4246e98886
+size 1682260832

model-00017-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f7cba470e97f7879ba54d4935b7a8c68017a6ea5a3284a03beaa55739e74acab
+size 1423888408

model-00018-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9d48e5d41d235c4f030935ff5b348edf1c65bd4ffa5403f8a7284f6cc0348e0b
+size 1423888384

model-00019-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0d247271a2a2d55f8343103d6a6f928e1c029b0d73dcacffd7a31f59f7a42175
+size 1423888072

model-00020-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:193a0f85bec5cb59ad1d5f0ed1fe5a66396ea6d3247bab18f32d53899d1b845e
+size 1682260968

model-00021-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c2ef0760cb6d7407719d5ed2b706ce7a3f21521eb29631fbe790c0fb32d4b098
+size 1423888408

model-00022-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2cbd0fb496eac13c61289606811b00d02a2c2ca0f9d3f149437dfccd6bbd7611
+size 1423888344

model-00023-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1b12ac8d959d7a98b9d2d517cea43a273f2e7f8b20669cceae0289dc85ef15c1
+size 1423887976
model-00024-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9a05d1638f665847f0627a5d79391feef64696fe5b0d9180aa6dc30943d63203
+size 1682261096

model-00025-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1efc3ecd72b3fec8ff1ac591a4072fb79227d62a1ec59943feb46b021ee604c0
+size 1423888392

model-00026-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2f99f57a89d45c961826760d9310c133f10d5da57da7d0e80f9b102bc263f931
+size 1423888224

model-00027-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:895d70a61befa285aae68fd4d3010ac31e4f62fa3b3ff63e191634bc9539f3b3
+size 1682260816

model-00028-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a8fdb98cc2b92012ab76c4348e1c7fadfd7ab72a69ea0ed0ef7369a9cc278705
+size 1423888400

model-00029-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5401bcacbb11cd32a1cd295e7bb0dc0b371c61249559fc7830be80bae57835cd
+size 1423888392

model-00030-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f2070ac3455640e71c0c5d987c6699c717a67033ed1ee832dabb2c153cc82470
+size 1423888088

model-00031-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:806257f0bded5a1244786fd07f01ef40f23c4305715d8f0236448f7a4472b5ae
+size 1682260944

model-00032-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1c6f9b0277182eb39c7711c341fc454a616f9bcb8f908eef9b97ecd3f81b2bb2
+size 1423888400

model-00033-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4cd691a567379fb1df739c37849bf42d6d8564f1a33ef9d280729d21d5111aad
+size 1423888376

model-00034-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:54ade2d1fdc939dedc60c8cd6fd4bfdd4bcdc2371444e041dbf721023c14ebc0
+size 1423887976
model-00035-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:53ab48f1201f99fc66672d1571d47a8441f373688380554b9d752b5b9ba86649
+size 1682261072

model-00036-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3bc40d08be18183cb13e4b71c0ff6f8f656eb88f20fdd19e67c2a9aaa3104a80
+size 1423888392

model-00037-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:07c00c7036ac01921bdd42d0f0f62556e2dbe67b06a24d430a41977fff63b43c
+size 1423888256

model-00038-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7543077e82fe114f8a6bf5a1ad9770a7a819c9a2fe9a24fed6e3c7fba2838403
+size 2204595600

model-00039-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d027ed8068274d15f336bfc3360205ac3554cd85c028c6c95eab7b5ed1370991
+size 1489436104

model-00040-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0abfc4e39d62a2902ce95de5016a38882e43b14df832c1f529c7c021d4d9d1df
+size 1423888392

model-00041-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:13fa79952897e2a2856aa121b63e64fee90bed9c5a68f7bbbcddc9a719d8cc01
+size 1423888184

model-00042-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:98efe4f231af455f1fc04941b0ea969d41837a1b83893fb0077a4337b71c4781
+size 1682260848

model-00043-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d4f25ad5840dc169dcf2f1cab05c2d5558351e378ed20df16614ff5c75eb446a
+size 1423888400

model-00044-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:583183eb4859a97da18ccc3013f10582466236604c3ba084619dfcac24d25225
+size 1423888392

model-00045-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2fe03d27d32c4c39192769794b3aa57ecf27496520cdfd69ed1a0d022623f829
+size 1423888048