Korbinian Poeppel committed
Commit • f52774c
1 Parent(s): d5d8142

Add xLSTM-7b.
Browse files
- LICENSE +102 -0
- README.md +29 -3
- config.json +41 -0
- generation_config.json +7 -0
- model-00001-of-00006.safetensors +3 -0
- model-00002-of-00006.safetensors +3 -0
- model-00003-of-00006.safetensors +3 -0
- model-00004-of-00006.safetensors +3 -0
- model-00005-of-00006.safetensors +3 -0
- model-00006-of-00006.safetensors +3 -0
- model.safetensors.index.json +490 -0
LICENSE
CHANGED
@@ -0,0 +1,102 @@
NXAI COMMUNITY LICENSE AGREEMENT

Preamble 1
We are proud to present the NXAI xLSTM 7B model and software, demonstrating the strength of next-generation RNN-based large language models, delivering high-quality performance and fast inference speeds. While xLSTM 7B is freely available for open research and development, we believe that organizations significantly benefiting from our technology should contribute back. Our goal is to support research, small and medium-sized enterprises (SMEs), and open innovation, while ensuring that large enterprises who incorporate xLSTM 7B into commercial products or services fairly compensate the creators for their research and development efforts.
Linz, December 12, 2024.

Preamble 2
The NXAI COMMUNITY LICENSE AGREEMENT is based on the META LLAMA 3 COMMUNITY LICENSE AGREEMENT and contains some modifications, especially Section 2, “Additional Commercial Terms”, is different.

“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the NXAI Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying NXAI Materials distributed by NXAI at https://github.com/NX-AI/.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“NXAI Materials” means, collectively, NXAI’s proprietary large language models, algorithms and any Software, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and all other work of NXAI in the field of neural networks, Documentation (and any portion thereof) made available under this Agreement.
“NXAI” or “we” means NXAI GmbH, Linz, Austria.

By using or distributing any portion or element of the NXAI Materials, you agree to be bound by this Agreement.

1. License Rights and Redistribution.

a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under NXAI’s intellectual property embodied in the NXAI Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the NXAI Materials.

b. Redistribution and Use.

i. If you distribute or make available the NXAI Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such NXAI Materials; and (B) prominently display “Built with technology from NXAI” on a related website, user interface, blogpost, about page, or product documentation.

ii. If you receive NXAI Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.

iii. You must retain in all copies of the NXAI Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “This product includes materials developed at NXAI that are licensed under the NXAI Community License, Copyright © NXAI GmbH, All Rights Reserved.”

2. Additional Commercial Terms. If (a) the Licensee, on a consolidated basis (including parent, subsidiaries, and affiliates), exceeds the annual revenue of one hundred million Euros (€100,000,000) or more, and (b) the Licensee incorporates NXAI Material, in whole or in part, into a Commercial Product or Service, then the Licensee must obtain a commercial license from NXAI, which NXAI may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until NXAI otherwise expressly grants you such rights.

3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE NXAI MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND NXAI DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE NXAI MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE NXAI MATERIALS AND ANY OUTPUT AND RESULTS.

4. Limitation of Liability. IN NO EVENT WILL NXAI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF NXAI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.

5. Intellectual Property.

a. No trademark licenses are granted under this Agreement, and in connection with the NXAI Materials, neither NXAI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the NXAI Materials or as set forth in this Section 5(a). NXAI hereby grants you a license to use “NXAI” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. All goodwill arising out of your use of the Mark will inure to the benefit of NXAI.

b. Subject to NXAI’s ownership of NXAI Materials and derivatives made by or for NXAI, with respect to any derivative works and modifications of the NXAI Materials that are made by you, as between you and NXAI, you are and will be the owner of such derivative works and modifications.

c. If you institute litigation or other proceedings against NXAI or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the NXAI Materials or models released by NXAI outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless NXAI from and against any claim by any third party arising out of or related to your use or distribution of the NXAI Materials.

6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the NXAI Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. NXAI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the NXAI Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.

7. Governing Law and Jurisdiction. This Agreement shall be governed by and construed in accordance with the laws of the Republic of Austria, without regard to its conflict of laws principles. The courts located in Linz, Austria shall have exclusive jurisdiction over any disputes arising out of or in connection with this Agreement.

====================================================================================================

This product includes software licensed under the MIT License:

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

====================================================================================================

This product includes software licensed under the BSD-3-Clause License.

BSD 3-Clause License

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
README.md
CHANGED
@@ -1,5 +1,31 @@
 ---
-license:
-license_name: nxai-community-license
-license_link: LICENSE
+license: NXAI Community License
 ---

# xLSTM goes 7B
This xLSTM was pre-trained on DCLM and selected high-quality data, for a total of approx. 2.3 T tokens, using the `xlstm-jax` framework.

## How to use it
First, install `xlstm`, which now uses the `mlstm_kernels` package for Triton kernels:
```bash
pip install xlstm
```

For now, install the transformers repository fork from NX-AI (until it is merged):
```bash
pip install 'transformers @ git+ssh://git@github.com/NX-AI/transformers.git@integrate_xlstm'
```

Use this model as:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

xlstm = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", device_map="auto")

# this is a fork of EleutherAI/gpt
tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")

xlstm(**tokenizer("Hello xLSTM, how are you doing?", return_tensors="pt"))
```
config.json
ADDED
@@ -0,0 +1,41 @@
{
  "_name_or_path": "/nfs-gpu/xlstm/converted_model_checkpoints/dclm_mLSTMv1_7B_ctx8192_sep_finetune_2024-11-29T17:03:51_0_550000",
  "add_embedding_dropout": false,
  "add_forward_backend_padding": false,
  "add_out_norm": true,
  "add_post_blocks_norm": true,
  "add_post_norm": false,
  "add_qk_norm": false,
  "architectures": [
    "xLSTMForCausalLM"
  ],
  "bos_token_id": 0,
  "cell_norm_eps": 1e-06,
  "embedding_dim": 4096,
  "eos_token_id": 2,
  "ffn_proj_factor": 2.667,
  "ffn_round_up_to_multiple_of": 64,
  "force_bos_token_insert": true,
  "forward_backend_name": "chunkwise--triton_limit_chunk",
  "gate_soft_cap": 15.0,
  "head_dim": 512,
  "igate_bias_init_range": -10.0,
  "mlstm_round_up_to_multiple_of": 64,
  "model_type": "xlstm",
  "norm_eps": 1e-06,
  "norm_reduction_force_float32": true,
  "num_blocks": 32,
  "num_heads": 8,
  "output_logit_soft_cap": 30.0,
  "pad_token_id": 1,
  "qk_dim_factor": 0.5,
  "return_last_states": true,
  "step_backend_name": "triton_fused",
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.47.0.dev0",
  "use_bias": false,
  "use_cache": true,
  "v_dim_factor": 1.0,
  "vocab_size": 50304
}
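Several of the config fields above combine into derived sizes. As an illustration only (the `cap * tanh(x / cap)` soft-cap form and the ceil-based round-up are assumptions about how the xlstm implementation uses `gate_soft_cap` and `ffn_round_up_to_multiple_of`; neither is confirmed by this file), a minimal sketch:

```python
import math

def soft_cap(x: float, cap: float) -> float:
    """Common soft-capping form: squashes x smoothly into [-cap, cap]."""
    return cap * math.tanh(x / cap)

def round_up_to_multiple_of(x: float, multiple: int) -> int:
    """Round x up to the next multiple (kernel-friendly dimensions)."""
    return int(math.ceil(x / multiple) * multiple)

# "output_logit_soft_cap": 30.0 — very large logits saturate near +/-30.
assert abs(soft_cap(1000.0, 30.0)) <= 30.0

# "ffn_proj_factor": 2.667 with "ffn_round_up_to_multiple_of": 64 applied to
# "embedding_dim": 4096 would give an FFN hidden size of 10944.
ffn_dim = round_up_to_multiple_of(4096 * 2.667, 64)
print(ffn_dim)
```

Under these assumptions the FFN hidden size comes out to 10944, and `head_dim` 512 is consistent with `embedding_dim` 4096 divided by `num_heads` 8.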
generation_config.json
ADDED
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "pad_token_id": 1,
  "transformers_version": "4.47.0.dev0"
}
model-00001-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2cc9c6ee0a75ec687cf18bad71908421e69b76ea383dcf72fad8c00177bca1f5
size 4991755784
model-00002-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8210891859f0f60c60df05effc7b43b4d11490c5c6ef889450ba827edb5dffae
size 4974522128
model-00003-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2902deaf4d990869e37f3ab4dc002f5f8e8ab85142aca7af82143569ec74f272
size 4840008560
model-00004-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:29cfabad6d13965d248bce3786c7b3a9eed21b0b8e7ce0e7fb561720cf03b6fc
size 4840008560
model-00005-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0f6a0f4cf4a1c7a11cb073854048813b6f3277f1b2ef6be26f3bb9a35878a40c
size 4840008560
model-00006-of-00006.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ed2b95082a8a3e0a9444d7bc219ba1eb49009863768ec029768de3383d56f44a
size 2975453352
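Each shard file above is a Git LFS pointer (a `version` line, a sha256 `oid`, and a byte `size`), not the weights themselves. A minimal sketch of parsing such a pointer and verifying a downloaded blob against it (the helper names are hypothetical, not part of any NX-AI tooling):

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a git-lfs pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}

def verify_blob(blob: bytes, pointer: dict) -> bool:
    """Check downloaded bytes against the pointer's size and sha256 oid."""
    return (len(blob) == pointer["size"]
            and hashlib.sha256(blob).hexdigest() == pointer["oid"])

# Demo with a dummy blob (not a real shard):
blob = b"example payload"
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(blob).hexdigest()}\n"
    f"size {len(blob)}\n"
)
assert verify_blob(blob, pointer)
```

The same check applies to each of the six shards, e.g. `model-00001-of-00006.safetensors` should hash to the `oid` shown above and be exactly 4991755784 bytes.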
model.safetensors.index.json
ADDED
@@ -0,0 +1,490 @@
{
  "metadata": {
    "total_size": 27461699584
  },
  "weight_map": {
    "backbone.blocks.0.ffn.proj_down.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.ffn.proj_up.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.ffn.proj_up_gate.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.fgate_preact.bias": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.fgate_preact.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.igate_preact.bias": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.igate_preact.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.k.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.multihead_norm.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.ogate_preact.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.out_proj.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.q.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.mlstm_layer.v.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.norm_ffn.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.0.norm_mlstm.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.ffn.proj_down.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.ffn.proj_up.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.ffn.proj_up_gate.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.fgate_preact.bias": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.fgate_preact.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.igate_preact.bias": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.igate_preact.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.k.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.multihead_norm.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.ogate_preact.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.out_proj.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.q.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.mlstm_layer.v.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.norm_ffn.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.1.norm_mlstm.weight": "model-00001-of-00006.safetensors",
    "backbone.blocks.10.ffn.proj_down.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.ffn.proj_up.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.ffn.proj_up_gate.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.fgate_preact.bias": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.fgate_preact.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.igate_preact.bias": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.igate_preact.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.k.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.multihead_norm.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.ogate_preact.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.out_proj.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.q.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.mlstm_layer.v.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.norm_ffn.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.10.norm_mlstm.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.ffn.proj_down.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.11.ffn.proj_up.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.11.ffn.proj_up_gate.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.fgate_preact.bias": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.fgate_preact.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.igate_preact.bias": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.igate_preact.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.k.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.multihead_norm.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.ogate_preact.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.out_proj.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.q.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.mlstm_layer.v.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.norm_ffn.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.11.norm_mlstm.weight": "model-00002-of-00006.safetensors",
    "backbone.blocks.12.ffn.proj_down.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.ffn.proj_up.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.ffn.proj_up_gate.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.fgate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.fgate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.igate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.igate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.k.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.multihead_norm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.ogate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.out_proj.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.q.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.mlstm_layer.v.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.norm_ffn.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.12.norm_mlstm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.ffn.proj_down.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.ffn.proj_up.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.ffn.proj_up_gate.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.fgate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.fgate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.igate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.igate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.k.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.multihead_norm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.ogate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.out_proj.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.q.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.mlstm_layer.v.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.norm_ffn.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.13.norm_mlstm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.ffn.proj_down.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.ffn.proj_up.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.ffn.proj_up_gate.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.fgate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.fgate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.igate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.igate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.k.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.multihead_norm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.ogate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.out_proj.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.q.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.mlstm_layer.v.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.norm_ffn.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.14.norm_mlstm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.ffn.proj_down.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.ffn.proj_up.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.ffn.proj_up_gate.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.fgate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.fgate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.igate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.igate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.k.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.multihead_norm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.ogate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.out_proj.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.q.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.mlstm_layer.v.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.norm_ffn.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.15.norm_mlstm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.ffn.proj_down.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.ffn.proj_up.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.ffn.proj_up_gate.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.fgate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.fgate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.igate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.igate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.k.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.multihead_norm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.ogate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.out_proj.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.q.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.mlstm_layer.v.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.norm_ffn.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.16.norm_mlstm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.ffn.proj_down.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.17.ffn.proj_up.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.17.ffn.proj_up_gate.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.fgate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.fgate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.igate_preact.bias": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.igate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.k.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.multihead_norm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.ogate_preact.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.out_proj.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.q.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.mlstm_layer.v.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.norm_ffn.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.17.norm_mlstm.weight": "model-00003-of-00006.safetensors",
    "backbone.blocks.18.ffn.proj_down.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.ffn.proj_up.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.ffn.proj_up_gate.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.fgate_preact.bias": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.fgate_preact.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.igate_preact.bias": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.igate_preact.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.k.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.multihead_norm.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.ogate_preact.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.out_proj.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.q.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.mlstm_layer.v.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.norm_ffn.weight": "model-00004-of-00006.safetensors",
    "backbone.blocks.18.norm_mlstm.weight": "model-00004-of-00006.safetensors",
|
171 |
+
"backbone.blocks.19.ffn.proj_down.weight": "model-00004-of-00006.safetensors",
|
172 |
+
"backbone.blocks.19.ffn.proj_up.weight": "model-00004-of-00006.safetensors",
|
173 |
+
"backbone.blocks.19.ffn.proj_up_gate.weight": "model-00004-of-00006.safetensors",
|
174 |
+
"backbone.blocks.19.mlstm_layer.fgate_preact.bias": "model-00004-of-00006.safetensors",
|
175 |
+
"backbone.blocks.19.mlstm_layer.fgate_preact.weight": "model-00004-of-00006.safetensors",
|
176 |
+
"backbone.blocks.19.mlstm_layer.igate_preact.bias": "model-00004-of-00006.safetensors",
|
177 |
+
"backbone.blocks.19.mlstm_layer.igate_preact.weight": "model-00004-of-00006.safetensors",
|
178 |
+
"backbone.blocks.19.mlstm_layer.k.weight": "model-00004-of-00006.safetensors",
|
179 |
+
"backbone.blocks.19.mlstm_layer.multihead_norm.weight": "model-00004-of-00006.safetensors",
|
180 |
+
"backbone.blocks.19.mlstm_layer.ogate_preact.weight": "model-00004-of-00006.safetensors",
|
181 |
+
"backbone.blocks.19.mlstm_layer.out_proj.weight": "model-00004-of-00006.safetensors",
|
182 |
+
"backbone.blocks.19.mlstm_layer.q.weight": "model-00004-of-00006.safetensors",
|
183 |
+
"backbone.blocks.19.mlstm_layer.v.weight": "model-00004-of-00006.safetensors",
|
184 |
+
"backbone.blocks.19.norm_ffn.weight": "model-00004-of-00006.safetensors",
|
185 |
+
"backbone.blocks.19.norm_mlstm.weight": "model-00004-of-00006.safetensors",
|
186 |
+
"backbone.blocks.2.ffn.proj_down.weight": "model-00001-of-00006.safetensors",
|
187 |
+
"backbone.blocks.2.ffn.proj_up.weight": "model-00001-of-00006.safetensors",
|
188 |
+
"backbone.blocks.2.ffn.proj_up_gate.weight": "model-00001-of-00006.safetensors",
|
189 |
+
"backbone.blocks.2.mlstm_layer.fgate_preact.bias": "model-00001-of-00006.safetensors",
|
190 |
+
"backbone.blocks.2.mlstm_layer.fgate_preact.weight": "model-00001-of-00006.safetensors",
|
191 |
+
"backbone.blocks.2.mlstm_layer.igate_preact.bias": "model-00001-of-00006.safetensors",
|
192 |
+
"backbone.blocks.2.mlstm_layer.igate_preact.weight": "model-00001-of-00006.safetensors",
|
193 |
+
"backbone.blocks.2.mlstm_layer.k.weight": "model-00001-of-00006.safetensors",
|
194 |
+
"backbone.blocks.2.mlstm_layer.multihead_norm.weight": "model-00001-of-00006.safetensors",
|
195 |
+
"backbone.blocks.2.mlstm_layer.ogate_preact.weight": "model-00001-of-00006.safetensors",
|
196 |
+
"backbone.blocks.2.mlstm_layer.out_proj.weight": "model-00001-of-00006.safetensors",
|
197 |
+
"backbone.blocks.2.mlstm_layer.q.weight": "model-00001-of-00006.safetensors",
|
198 |
+
"backbone.blocks.2.mlstm_layer.v.weight": "model-00001-of-00006.safetensors",
|
199 |
+
"backbone.blocks.2.norm_ffn.weight": "model-00001-of-00006.safetensors",
|
200 |
+
"backbone.blocks.2.norm_mlstm.weight": "model-00001-of-00006.safetensors",
|
201 |
+
"backbone.blocks.20.ffn.proj_down.weight": "model-00004-of-00006.safetensors",
|
202 |
+
"backbone.blocks.20.ffn.proj_up.weight": "model-00004-of-00006.safetensors",
|
203 |
+
"backbone.blocks.20.ffn.proj_up_gate.weight": "model-00004-of-00006.safetensors",
|
204 |
+
"backbone.blocks.20.mlstm_layer.fgate_preact.bias": "model-00004-of-00006.safetensors",
|
205 |
+
"backbone.blocks.20.mlstm_layer.fgate_preact.weight": "model-00004-of-00006.safetensors",
|
206 |
+
"backbone.blocks.20.mlstm_layer.igate_preact.bias": "model-00004-of-00006.safetensors",
|
207 |
+
"backbone.blocks.20.mlstm_layer.igate_preact.weight": "model-00004-of-00006.safetensors",
|
208 |
+
"backbone.blocks.20.mlstm_layer.k.weight": "model-00004-of-00006.safetensors",
|
209 |
+
"backbone.blocks.20.mlstm_layer.multihead_norm.weight": "model-00004-of-00006.safetensors",
|
210 |
+
"backbone.blocks.20.mlstm_layer.ogate_preact.weight": "model-00004-of-00006.safetensors",
|
211 |
+
"backbone.blocks.20.mlstm_layer.out_proj.weight": "model-00004-of-00006.safetensors",
|
212 |
+
"backbone.blocks.20.mlstm_layer.q.weight": "model-00004-of-00006.safetensors",
|
213 |
+
"backbone.blocks.20.mlstm_layer.v.weight": "model-00004-of-00006.safetensors",
|
214 |
+
"backbone.blocks.20.norm_ffn.weight": "model-00004-of-00006.safetensors",
|
215 |
+
"backbone.blocks.20.norm_mlstm.weight": "model-00004-of-00006.safetensors",
|
216 |
+
"backbone.blocks.21.ffn.proj_down.weight": "model-00004-of-00006.safetensors",
|
217 |
+
"backbone.blocks.21.ffn.proj_up.weight": "model-00004-of-00006.safetensors",
|
218 |
+
"backbone.blocks.21.ffn.proj_up_gate.weight": "model-00004-of-00006.safetensors",
|
219 |
+
"backbone.blocks.21.mlstm_layer.fgate_preact.bias": "model-00004-of-00006.safetensors",
|
220 |
+
"backbone.blocks.21.mlstm_layer.fgate_preact.weight": "model-00004-of-00006.safetensors",
|
221 |
+
"backbone.blocks.21.mlstm_layer.igate_preact.bias": "model-00004-of-00006.safetensors",
|
222 |
+
"backbone.blocks.21.mlstm_layer.igate_preact.weight": "model-00004-of-00006.safetensors",
|
223 |
+
"backbone.blocks.21.mlstm_layer.k.weight": "model-00004-of-00006.safetensors",
|
224 |
+
"backbone.blocks.21.mlstm_layer.multihead_norm.weight": "model-00004-of-00006.safetensors",
|
225 |
+
"backbone.blocks.21.mlstm_layer.ogate_preact.weight": "model-00004-of-00006.safetensors",
|
226 |
+
"backbone.blocks.21.mlstm_layer.out_proj.weight": "model-00004-of-00006.safetensors",
|
227 |
+
"backbone.blocks.21.mlstm_layer.q.weight": "model-00004-of-00006.safetensors",
|
228 |
+
"backbone.blocks.21.mlstm_layer.v.weight": "model-00004-of-00006.safetensors",
|
229 |
+
"backbone.blocks.21.norm_ffn.weight": "model-00004-of-00006.safetensors",
|
230 |
+
"backbone.blocks.21.norm_mlstm.weight": "model-00004-of-00006.safetensors",
|
231 |
+
"backbone.blocks.22.ffn.proj_down.weight": "model-00004-of-00006.safetensors",
|
232 |
+
"backbone.blocks.22.ffn.proj_up.weight": "model-00004-of-00006.safetensors",
|
233 |
+
"backbone.blocks.22.ffn.proj_up_gate.weight": "model-00004-of-00006.safetensors",
|
234 |
+
"backbone.blocks.22.mlstm_layer.fgate_preact.bias": "model-00004-of-00006.safetensors",
|
235 |
+
"backbone.blocks.22.mlstm_layer.fgate_preact.weight": "model-00004-of-00006.safetensors",
|
236 |
+
"backbone.blocks.22.mlstm_layer.igate_preact.bias": "model-00004-of-00006.safetensors",
|
237 |
+
"backbone.blocks.22.mlstm_layer.igate_preact.weight": "model-00004-of-00006.safetensors",
|
238 |
+
"backbone.blocks.22.mlstm_layer.k.weight": "model-00004-of-00006.safetensors",
|
239 |
+
"backbone.blocks.22.mlstm_layer.multihead_norm.weight": "model-00004-of-00006.safetensors",
|
240 |
+
"backbone.blocks.22.mlstm_layer.ogate_preact.weight": "model-00004-of-00006.safetensors",
|
241 |
+
"backbone.blocks.22.mlstm_layer.out_proj.weight": "model-00004-of-00006.safetensors",
|
242 |
+
"backbone.blocks.22.mlstm_layer.q.weight": "model-00004-of-00006.safetensors",
|
243 |
+
"backbone.blocks.22.mlstm_layer.v.weight": "model-00004-of-00006.safetensors",
|
244 |
+
"backbone.blocks.22.norm_ffn.weight": "model-00004-of-00006.safetensors",
|
245 |
+
"backbone.blocks.22.norm_mlstm.weight": "model-00004-of-00006.safetensors",
|
246 |
+
"backbone.blocks.23.ffn.proj_down.weight": "model-00005-of-00006.safetensors",
|
247 |
+
"backbone.blocks.23.ffn.proj_up.weight": "model-00005-of-00006.safetensors",
|
248 |
+
"backbone.blocks.23.ffn.proj_up_gate.weight": "model-00005-of-00006.safetensors",
|
249 |
+
"backbone.blocks.23.mlstm_layer.fgate_preact.bias": "model-00004-of-00006.safetensors",
|
250 |
+
"backbone.blocks.23.mlstm_layer.fgate_preact.weight": "model-00004-of-00006.safetensors",
|
251 |
+
"backbone.blocks.23.mlstm_layer.igate_preact.bias": "model-00004-of-00006.safetensors",
|
252 |
+
"backbone.blocks.23.mlstm_layer.igate_preact.weight": "model-00004-of-00006.safetensors",
|
253 |
+
"backbone.blocks.23.mlstm_layer.k.weight": "model-00004-of-00006.safetensors",
|
254 |
+
"backbone.blocks.23.mlstm_layer.multihead_norm.weight": "model-00004-of-00006.safetensors",
|
255 |
+
"backbone.blocks.23.mlstm_layer.ogate_preact.weight": "model-00004-of-00006.safetensors",
|
256 |
+
"backbone.blocks.23.mlstm_layer.out_proj.weight": "model-00004-of-00006.safetensors",
|
257 |
+
"backbone.blocks.23.mlstm_layer.q.weight": "model-00004-of-00006.safetensors",
|
258 |
+
"backbone.blocks.23.mlstm_layer.v.weight": "model-00004-of-00006.safetensors",
|
259 |
+
"backbone.blocks.23.norm_ffn.weight": "model-00004-of-00006.safetensors",
|
260 |
+
"backbone.blocks.23.norm_mlstm.weight": "model-00004-of-00006.safetensors",
|
261 |
+
"backbone.blocks.24.ffn.proj_down.weight": "model-00005-of-00006.safetensors",
|
262 |
+
"backbone.blocks.24.ffn.proj_up.weight": "model-00005-of-00006.safetensors",
|
263 |
+
"backbone.blocks.24.ffn.proj_up_gate.weight": "model-00005-of-00006.safetensors",
|
264 |
+
"backbone.blocks.24.mlstm_layer.fgate_preact.bias": "model-00005-of-00006.safetensors",
|
265 |
+
"backbone.blocks.24.mlstm_layer.fgate_preact.weight": "model-00005-of-00006.safetensors",
|
266 |
+
"backbone.blocks.24.mlstm_layer.igate_preact.bias": "model-00005-of-00006.safetensors",
|
267 |
+
"backbone.blocks.24.mlstm_layer.igate_preact.weight": "model-00005-of-00006.safetensors",
|
268 |
+
"backbone.blocks.24.mlstm_layer.k.weight": "model-00005-of-00006.safetensors",
|
269 |
+
"backbone.blocks.24.mlstm_layer.multihead_norm.weight": "model-00005-of-00006.safetensors",
|
270 |
+
"backbone.blocks.24.mlstm_layer.ogate_preact.weight": "model-00005-of-00006.safetensors",
|
271 |
+
"backbone.blocks.24.mlstm_layer.out_proj.weight": "model-00005-of-00006.safetensors",
|
272 |
+
"backbone.blocks.24.mlstm_layer.q.weight": "model-00005-of-00006.safetensors",
|
273 |
+
"backbone.blocks.24.mlstm_layer.v.weight": "model-00005-of-00006.safetensors",
|
274 |
+
"backbone.blocks.24.norm_ffn.weight": "model-00005-of-00006.safetensors",
|
275 |
+
"backbone.blocks.24.norm_mlstm.weight": "model-00005-of-00006.safetensors",
|
276 |
+
"backbone.blocks.25.ffn.proj_down.weight": "model-00005-of-00006.safetensors",
|
277 |
+
"backbone.blocks.25.ffn.proj_up.weight": "model-00005-of-00006.safetensors",
|
278 |
+
"backbone.blocks.25.ffn.proj_up_gate.weight": "model-00005-of-00006.safetensors",
|
279 |
+
"backbone.blocks.25.mlstm_layer.fgate_preact.bias": "model-00005-of-00006.safetensors",
|
280 |
+
"backbone.blocks.25.mlstm_layer.fgate_preact.weight": "model-00005-of-00006.safetensors",
|
281 |
+
"backbone.blocks.25.mlstm_layer.igate_preact.bias": "model-00005-of-00006.safetensors",
|
282 |
+
"backbone.blocks.25.mlstm_layer.igate_preact.weight": "model-00005-of-00006.safetensors",
|
283 |
+
"backbone.blocks.25.mlstm_layer.k.weight": "model-00005-of-00006.safetensors",
|
284 |
+
"backbone.blocks.25.mlstm_layer.multihead_norm.weight": "model-00005-of-00006.safetensors",
|
285 |
+
"backbone.blocks.25.mlstm_layer.ogate_preact.weight": "model-00005-of-00006.safetensors",
|
286 |
+
"backbone.blocks.25.mlstm_layer.out_proj.weight": "model-00005-of-00006.safetensors",
|
287 |
+
"backbone.blocks.25.mlstm_layer.q.weight": "model-00005-of-00006.safetensors",
|
288 |
+
"backbone.blocks.25.mlstm_layer.v.weight": "model-00005-of-00006.safetensors",
|
289 |
+
"backbone.blocks.25.norm_ffn.weight": "model-00005-of-00006.safetensors",
|
290 |
+
"backbone.blocks.25.norm_mlstm.weight": "model-00005-of-00006.safetensors",
|
291 |
+
"backbone.blocks.26.ffn.proj_down.weight": "model-00005-of-00006.safetensors",
|
292 |
+
"backbone.blocks.26.ffn.proj_up.weight": "model-00005-of-00006.safetensors",
|
293 |
+
"backbone.blocks.26.ffn.proj_up_gate.weight": "model-00005-of-00006.safetensors",
|
294 |
+
"backbone.blocks.26.mlstm_layer.fgate_preact.bias": "model-00005-of-00006.safetensors",
|
295 |
+
"backbone.blocks.26.mlstm_layer.fgate_preact.weight": "model-00005-of-00006.safetensors",
|
296 |
+
"backbone.blocks.26.mlstm_layer.igate_preact.bias": "model-00005-of-00006.safetensors",
|
297 |
+
"backbone.blocks.26.mlstm_layer.igate_preact.weight": "model-00005-of-00006.safetensors",
|
298 |
+
"backbone.blocks.26.mlstm_layer.k.weight": "model-00005-of-00006.safetensors",
|
299 |
+
"backbone.blocks.26.mlstm_layer.multihead_norm.weight": "model-00005-of-00006.safetensors",
|
300 |
+
"backbone.blocks.26.mlstm_layer.ogate_preact.weight": "model-00005-of-00006.safetensors",
|
301 |
+
"backbone.blocks.26.mlstm_layer.out_proj.weight": "model-00005-of-00006.safetensors",
|
302 |
+
"backbone.blocks.26.mlstm_layer.q.weight": "model-00005-of-00006.safetensors",
|
303 |
+
"backbone.blocks.26.mlstm_layer.v.weight": "model-00005-of-00006.safetensors",
|
304 |
+
"backbone.blocks.26.norm_ffn.weight": "model-00005-of-00006.safetensors",
|
305 |
+
"backbone.blocks.26.norm_mlstm.weight": "model-00005-of-00006.safetensors",
|
306 |
+
"backbone.blocks.27.ffn.proj_down.weight": "model-00005-of-00006.safetensors",
|
307 |
+
"backbone.blocks.27.ffn.proj_up.weight": "model-00005-of-00006.safetensors",
|
308 |
+
"backbone.blocks.27.ffn.proj_up_gate.weight": "model-00005-of-00006.safetensors",
|
309 |
+
"backbone.blocks.27.mlstm_layer.fgate_preact.bias": "model-00005-of-00006.safetensors",
|
310 |
+
"backbone.blocks.27.mlstm_layer.fgate_preact.weight": "model-00005-of-00006.safetensors",
|
311 |
+
"backbone.blocks.27.mlstm_layer.igate_preact.bias": "model-00005-of-00006.safetensors",
|
312 |
+
"backbone.blocks.27.mlstm_layer.igate_preact.weight": "model-00005-of-00006.safetensors",
|
313 |
+
"backbone.blocks.27.mlstm_layer.k.weight": "model-00005-of-00006.safetensors",
|
314 |
+
"backbone.blocks.27.mlstm_layer.multihead_norm.weight": "model-00005-of-00006.safetensors",
|
315 |
+
"backbone.blocks.27.mlstm_layer.ogate_preact.weight": "model-00005-of-00006.safetensors",
|
316 |
+
"backbone.blocks.27.mlstm_layer.out_proj.weight": "model-00005-of-00006.safetensors",
|
317 |
+
"backbone.blocks.27.mlstm_layer.q.weight": "model-00005-of-00006.safetensors",
|
318 |
+
"backbone.blocks.27.mlstm_layer.v.weight": "model-00005-of-00006.safetensors",
|
319 |
+
"backbone.blocks.27.norm_ffn.weight": "model-00005-of-00006.safetensors",
|
320 |
+
"backbone.blocks.27.norm_mlstm.weight": "model-00005-of-00006.safetensors",
|
321 |
+
"backbone.blocks.28.ffn.proj_down.weight": "model-00005-of-00006.safetensors",
|
322 |
+
"backbone.blocks.28.ffn.proj_up.weight": "model-00005-of-00006.safetensors",
|
323 |
+
"backbone.blocks.28.ffn.proj_up_gate.weight": "model-00005-of-00006.safetensors",
|
324 |
+
"backbone.blocks.28.mlstm_layer.fgate_preact.bias": "model-00005-of-00006.safetensors",
|
325 |
+
"backbone.blocks.28.mlstm_layer.fgate_preact.weight": "model-00005-of-00006.safetensors",
|
326 |
+
"backbone.blocks.28.mlstm_layer.igate_preact.bias": "model-00005-of-00006.safetensors",
|
327 |
+
"backbone.blocks.28.mlstm_layer.igate_preact.weight": "model-00005-of-00006.safetensors",
|
328 |
+
"backbone.blocks.28.mlstm_layer.k.weight": "model-00005-of-00006.safetensors",
|
329 |
+
"backbone.blocks.28.mlstm_layer.multihead_norm.weight": "model-00005-of-00006.safetensors",
|
330 |
+
"backbone.blocks.28.mlstm_layer.ogate_preact.weight": "model-00005-of-00006.safetensors",
|
331 |
+
"backbone.blocks.28.mlstm_layer.out_proj.weight": "model-00005-of-00006.safetensors",
|
332 |
+
"backbone.blocks.28.mlstm_layer.q.weight": "model-00005-of-00006.safetensors",
|
333 |
+
"backbone.blocks.28.mlstm_layer.v.weight": "model-00005-of-00006.safetensors",
|
334 |
+
"backbone.blocks.28.norm_ffn.weight": "model-00005-of-00006.safetensors",
|
335 |
+
"backbone.blocks.28.norm_mlstm.weight": "model-00005-of-00006.safetensors",
|
336 |
+
"backbone.blocks.29.ffn.proj_down.weight": "model-00006-of-00006.safetensors",
|
337 |
+
"backbone.blocks.29.ffn.proj_up.weight": "model-00006-of-00006.safetensors",
|
338 |
+
"backbone.blocks.29.ffn.proj_up_gate.weight": "model-00006-of-00006.safetensors",
|
339 |
+
"backbone.blocks.29.mlstm_layer.fgate_preact.bias": "model-00005-of-00006.safetensors",
|
340 |
+
"backbone.blocks.29.mlstm_layer.fgate_preact.weight": "model-00005-of-00006.safetensors",
|
341 |
+
"backbone.blocks.29.mlstm_layer.igate_preact.bias": "model-00005-of-00006.safetensors",
|
342 |
+
"backbone.blocks.29.mlstm_layer.igate_preact.weight": "model-00005-of-00006.safetensors",
|
343 |
+
"backbone.blocks.29.mlstm_layer.k.weight": "model-00005-of-00006.safetensors",
|
344 |
+
"backbone.blocks.29.mlstm_layer.multihead_norm.weight": "model-00005-of-00006.safetensors",
|
345 |
+
"backbone.blocks.29.mlstm_layer.ogate_preact.weight": "model-00005-of-00006.safetensors",
|
346 |
+
"backbone.blocks.29.mlstm_layer.out_proj.weight": "model-00005-of-00006.safetensors",
|
347 |
+
"backbone.blocks.29.mlstm_layer.q.weight": "model-00005-of-00006.safetensors",
|
348 |
+
"backbone.blocks.29.mlstm_layer.v.weight": "model-00005-of-00006.safetensors",
|
349 |
+
"backbone.blocks.29.norm_ffn.weight": "model-00005-of-00006.safetensors",
|
350 |
+
"backbone.blocks.29.norm_mlstm.weight": "model-00005-of-00006.safetensors",
|
351 |
+
"backbone.blocks.3.ffn.proj_down.weight": "model-00001-of-00006.safetensors",
|
352 |
+
"backbone.blocks.3.ffn.proj_up.weight": "model-00001-of-00006.safetensors",
|
353 |
+
"backbone.blocks.3.ffn.proj_up_gate.weight": "model-00001-of-00006.safetensors",
|
354 |
+
"backbone.blocks.3.mlstm_layer.fgate_preact.bias": "model-00001-of-00006.safetensors",
|
355 |
+
"backbone.blocks.3.mlstm_layer.fgate_preact.weight": "model-00001-of-00006.safetensors",
|
356 |
+
"backbone.blocks.3.mlstm_layer.igate_preact.bias": "model-00001-of-00006.safetensors",
|
357 |
+
"backbone.blocks.3.mlstm_layer.igate_preact.weight": "model-00001-of-00006.safetensors",
|
358 |
+
"backbone.blocks.3.mlstm_layer.k.weight": "model-00001-of-00006.safetensors",
|
359 |
+
"backbone.blocks.3.mlstm_layer.multihead_norm.weight": "model-00001-of-00006.safetensors",
|
360 |
+
"backbone.blocks.3.mlstm_layer.ogate_preact.weight": "model-00001-of-00006.safetensors",
|
361 |
+
"backbone.blocks.3.mlstm_layer.out_proj.weight": "model-00001-of-00006.safetensors",
|
362 |
+
"backbone.blocks.3.mlstm_layer.q.weight": "model-00001-of-00006.safetensors",
|
363 |
+
"backbone.blocks.3.mlstm_layer.v.weight": "model-00001-of-00006.safetensors",
|
364 |
+
"backbone.blocks.3.norm_ffn.weight": "model-00001-of-00006.safetensors",
|
365 |
+
"backbone.blocks.3.norm_mlstm.weight": "model-00001-of-00006.safetensors",
|
366 |
+
"backbone.blocks.30.ffn.proj_down.weight": "model-00006-of-00006.safetensors",
|
367 |
+
"backbone.blocks.30.ffn.proj_up.weight": "model-00006-of-00006.safetensors",
|
368 |
+
"backbone.blocks.30.ffn.proj_up_gate.weight": "model-00006-of-00006.safetensors",
|
369 |
+
"backbone.blocks.30.mlstm_layer.fgate_preact.bias": "model-00006-of-00006.safetensors",
|
370 |
+
"backbone.blocks.30.mlstm_layer.fgate_preact.weight": "model-00006-of-00006.safetensors",
|
371 |
+
"backbone.blocks.30.mlstm_layer.igate_preact.bias": "model-00006-of-00006.safetensors",
|
372 |
+
"backbone.blocks.30.mlstm_layer.igate_preact.weight": "model-00006-of-00006.safetensors",
|
373 |
+
"backbone.blocks.30.mlstm_layer.k.weight": "model-00006-of-00006.safetensors",
|
374 |
+
"backbone.blocks.30.mlstm_layer.multihead_norm.weight": "model-00006-of-00006.safetensors",
|
375 |
+
"backbone.blocks.30.mlstm_layer.ogate_preact.weight": "model-00006-of-00006.safetensors",
|
376 |
+
"backbone.blocks.30.mlstm_layer.out_proj.weight": "model-00006-of-00006.safetensors",
|
377 |
+
"backbone.blocks.30.mlstm_layer.q.weight": "model-00006-of-00006.safetensors",
|
378 |
+
"backbone.blocks.30.mlstm_layer.v.weight": "model-00006-of-00006.safetensors",
|
379 |
+
"backbone.blocks.30.norm_ffn.weight": "model-00006-of-00006.safetensors",
|
380 |
+
"backbone.blocks.30.norm_mlstm.weight": "model-00006-of-00006.safetensors",
|
381 |
+
"backbone.blocks.31.ffn.proj_down.weight": "model-00006-of-00006.safetensors",
|
382 |
+
"backbone.blocks.31.ffn.proj_up.weight": "model-00006-of-00006.safetensors",
|
383 |
+
"backbone.blocks.31.ffn.proj_up_gate.weight": "model-00006-of-00006.safetensors",
|
384 |
+
"backbone.blocks.31.mlstm_layer.fgate_preact.bias": "model-00006-of-00006.safetensors",
|
385 |
+
"backbone.blocks.31.mlstm_layer.fgate_preact.weight": "model-00006-of-00006.safetensors",
|
386 |
+
"backbone.blocks.31.mlstm_layer.igate_preact.bias": "model-00006-of-00006.safetensors",
|
387 |
+
"backbone.blocks.31.mlstm_layer.igate_preact.weight": "model-00006-of-00006.safetensors",
|
388 |
+
"backbone.blocks.31.mlstm_layer.k.weight": "model-00006-of-00006.safetensors",
|
389 |
+
"backbone.blocks.31.mlstm_layer.multihead_norm.weight": "model-00006-of-00006.safetensors",
|
390 |
+
"backbone.blocks.31.mlstm_layer.ogate_preact.weight": "model-00006-of-00006.safetensors",
|
391 |
+
"backbone.blocks.31.mlstm_layer.out_proj.weight": "model-00006-of-00006.safetensors",
|
392 |
+
"backbone.blocks.31.mlstm_layer.q.weight": "model-00006-of-00006.safetensors",
|
393 |
+
"backbone.blocks.31.mlstm_layer.v.weight": "model-00006-of-00006.safetensors",
|
394 |
+
"backbone.blocks.31.norm_ffn.weight": "model-00006-of-00006.safetensors",
|
395 |
+
"backbone.blocks.31.norm_mlstm.weight": "model-00006-of-00006.safetensors",
|
396 |
+
"backbone.blocks.4.ffn.proj_down.weight": "model-00001-of-00006.safetensors",
|
397 |
+
"backbone.blocks.4.ffn.proj_up.weight": "model-00001-of-00006.safetensors",
|
398 |
+
"backbone.blocks.4.ffn.proj_up_gate.weight": "model-00001-of-00006.safetensors",
|
399 |
+
"backbone.blocks.4.mlstm_layer.fgate_preact.bias": "model-00001-of-00006.safetensors",
|
400 |
+
"backbone.blocks.4.mlstm_layer.fgate_preact.weight": "model-00001-of-00006.safetensors",
|
401 |
+
"backbone.blocks.4.mlstm_layer.igate_preact.bias": "model-00001-of-00006.safetensors",
|
402 |
+
"backbone.blocks.4.mlstm_layer.igate_preact.weight": "model-00001-of-00006.safetensors",
|
403 |
+
"backbone.blocks.4.mlstm_layer.k.weight": "model-00001-of-00006.safetensors",
|
404 |
+
"backbone.blocks.4.mlstm_layer.multihead_norm.weight": "model-00001-of-00006.safetensors",
|
405 |
+
"backbone.blocks.4.mlstm_layer.ogate_preact.weight": "model-00001-of-00006.safetensors",
|
406 |
+
"backbone.blocks.4.mlstm_layer.out_proj.weight": "model-00001-of-00006.safetensors",
|
407 |
+
"backbone.blocks.4.mlstm_layer.q.weight": "model-00001-of-00006.safetensors",
|
408 |
+
"backbone.blocks.4.mlstm_layer.v.weight": "model-00001-of-00006.safetensors",
|
409 |
+
"backbone.blocks.4.norm_ffn.weight": "model-00001-of-00006.safetensors",
|
410 |
+
"backbone.blocks.4.norm_mlstm.weight": "model-00001-of-00006.safetensors",
|
411 |
+
"backbone.blocks.5.ffn.proj_down.weight": "model-00002-of-00006.safetensors",
|
412 |
+
"backbone.blocks.5.ffn.proj_up.weight": "model-00002-of-00006.safetensors",
|
413 |
+
"backbone.blocks.5.ffn.proj_up_gate.weight": "model-00002-of-00006.safetensors",
|
414 |
+
"backbone.blocks.5.mlstm_layer.fgate_preact.bias": "model-00002-of-00006.safetensors",
|
415 |
+
"backbone.blocks.5.mlstm_layer.fgate_preact.weight": "model-00002-of-00006.safetensors",
|
416 |
+
"backbone.blocks.5.mlstm_layer.igate_preact.bias": "model-00002-of-00006.safetensors",
|
417 |
+
"backbone.blocks.5.mlstm_layer.igate_preact.weight": "model-00002-of-00006.safetensors",
|
418 |
+
"backbone.blocks.5.mlstm_layer.k.weight": "model-00001-of-00006.safetensors",
|
419 |
+
"backbone.blocks.5.mlstm_layer.multihead_norm.weight": "model-00002-of-00006.safetensors",
|
420 |
+
"backbone.blocks.5.mlstm_layer.ogate_preact.weight": "model-00002-of-00006.safetensors",
|
421 |
+
"backbone.blocks.5.mlstm_layer.out_proj.weight": "model-00002-of-00006.safetensors",
|
422 |
+
"backbone.blocks.5.mlstm_layer.q.weight": "model-00001-of-00006.safetensors",
|
423 |
+
"backbone.blocks.5.mlstm_layer.v.weight": "model-00001-of-00006.safetensors",
|
424 |
+
"backbone.blocks.5.norm_ffn.weight": "model-00002-of-00006.safetensors",
|
425 |
+
"backbone.blocks.5.norm_mlstm.weight": "model-00001-of-00006.safetensors",
|
426 |
+
"backbone.blocks.6.ffn.proj_down.weight": "model-00002-of-00006.safetensors",
|
427 |
+
"backbone.blocks.6.ffn.proj_up.weight": "model-00002-of-00006.safetensors",
|
428 |
+
"backbone.blocks.6.ffn.proj_up_gate.weight": "model-00002-of-00006.safetensors",
|
429 |
+
"backbone.blocks.6.mlstm_layer.fgate_preact.bias": "model-00002-of-00006.safetensors",
|
430 |
+
"backbone.blocks.6.mlstm_layer.fgate_preact.weight": "model-00002-of-00006.safetensors",
|
431 |
+
"backbone.blocks.6.mlstm_layer.igate_preact.bias": "model-00002-of-00006.safetensors",
|
432 |
+
"backbone.blocks.6.mlstm_layer.igate_preact.weight": "model-00002-of-00006.safetensors",
|
433 |
+
"backbone.blocks.6.mlstm_layer.k.weight": "model-00002-of-00006.safetensors",
|
434 |
+
"backbone.blocks.6.mlstm_layer.multihead_norm.weight": "model-00002-of-00006.safetensors",
|
435 |
+
"backbone.blocks.6.mlstm_layer.ogate_preact.weight": "model-00002-of-00006.safetensors",
|
436 |
+
"backbone.blocks.6.mlstm_layer.out_proj.weight": "model-00002-of-00006.safetensors",
|
437 |
+
"backbone.blocks.6.mlstm_layer.q.weight": "model-00002-of-00006.safetensors",
|
438 |
+
"backbone.blocks.6.mlstm_layer.v.weight": "model-00002-of-00006.safetensors",
|
439 |
+
"backbone.blocks.6.norm_ffn.weight": "model-00002-of-00006.safetensors",
|
440 |
+
"backbone.blocks.6.norm_mlstm.weight": "model-00002-of-00006.safetensors",
|
441 |
+
"backbone.blocks.7.ffn.proj_down.weight": "model-00002-of-00006.safetensors",
|
442 |
+
"backbone.blocks.7.ffn.proj_up.weight": "model-00002-of-00006.safetensors",
|
443 |
+
"backbone.blocks.7.ffn.proj_up_gate.weight": "model-00002-of-00006.safetensors",
|
444 |
+
"backbone.blocks.7.mlstm_layer.fgate_preact.bias": "model-00002-of-00006.safetensors",
|
445 |
+
"backbone.blocks.7.mlstm_layer.fgate_preact.weight": "model-00002-of-00006.safetensors",
|
446 |
+
"backbone.blocks.7.mlstm_layer.igate_preact.bias": "model-00002-of-00006.safetensors",
|
447 |
+
"backbone.blocks.7.mlstm_layer.igate_preact.weight": "model-00002-of-00006.safetensors",
|
448 |
+
"backbone.blocks.7.mlstm_layer.k.weight": "model-00002-of-00006.safetensors",
|
449 |
+
"backbone.blocks.7.mlstm_layer.multihead_norm.weight": "model-00002-of-00006.safetensors",
|
450 |
+
"backbone.blocks.7.mlstm_layer.ogate_preact.weight": "model-00002-of-00006.safetensors",
|
451 |
+
"backbone.blocks.7.mlstm_layer.out_proj.weight": "model-00002-of-00006.safetensors",
|
452 |
+
"backbone.blocks.7.mlstm_layer.q.weight": "model-00002-of-00006.safetensors",
|
453 |
+
"backbone.blocks.7.mlstm_layer.v.weight": "model-00002-of-00006.safetensors",
|
454 |
+
"backbone.blocks.7.norm_ffn.weight": "model-00002-of-00006.safetensors",
|
455 |
+
"backbone.blocks.7.norm_mlstm.weight": "model-00002-of-00006.safetensors",
|
456 |
+
"backbone.blocks.8.ffn.proj_down.weight": "model-00002-of-00006.safetensors",
|
457 |
+
"backbone.blocks.8.ffn.proj_up.weight": "model-00002-of-00006.safetensors",
|
458 |
+
"backbone.blocks.8.ffn.proj_up_gate.weight": "model-00002-of-00006.safetensors",
|
459 |
+
"backbone.blocks.8.mlstm_layer.fgate_preact.bias": "model-00002-of-00006.safetensors",
|
460 |
+
"backbone.blocks.8.mlstm_layer.fgate_preact.weight": "model-00002-of-00006.safetensors",
|
461 |
+
"backbone.blocks.8.mlstm_layer.igate_preact.bias": "model-00002-of-00006.safetensors",
|
462 |
+
"backbone.blocks.8.mlstm_layer.igate_preact.weight": "model-00002-of-00006.safetensors",
|
463 |
+
"backbone.blocks.8.mlstm_layer.k.weight": "model-00002-of-00006.safetensors",
|
464 |
+
"backbone.blocks.8.mlstm_layer.multihead_norm.weight": "model-00002-of-00006.safetensors",
|
465 |
+
"backbone.blocks.8.mlstm_layer.ogate_preact.weight": "model-00002-of-00006.safetensors",
|
466 |
+
"backbone.blocks.8.mlstm_layer.out_proj.weight": "model-00002-of-00006.safetensors",
|
467 |
+
"backbone.blocks.8.mlstm_layer.q.weight": "model-00002-of-00006.safetensors",
|
468 |
+
"backbone.blocks.8.mlstm_layer.v.weight": "model-00002-of-00006.safetensors",
|
469 |
+
"backbone.blocks.8.norm_ffn.weight": "model-00002-of-00006.safetensors",
|
470 |
+
"backbone.blocks.8.norm_mlstm.weight": "model-00002-of-00006.safetensors",
|
471 |
+
"backbone.blocks.9.ffn.proj_down.weight": "model-00002-of-00006.safetensors",
|
472 |
+
"backbone.blocks.9.ffn.proj_up.weight": "model-00002-of-00006.safetensors",
|
473 |
+
"backbone.blocks.9.ffn.proj_up_gate.weight": "model-00002-of-00006.safetensors",
|
474 |
+
"backbone.blocks.9.mlstm_layer.fgate_preact.bias": "model-00002-of-00006.safetensors",
|
475 |
+
"backbone.blocks.9.mlstm_layer.fgate_preact.weight": "model-00002-of-00006.safetensors",
|
476 |
+
"backbone.blocks.9.mlstm_layer.igate_preact.bias": "model-00002-of-00006.safetensors",
|
477 |
+
"backbone.blocks.9.mlstm_layer.igate_preact.weight": "model-00002-of-00006.safetensors",
|
478 |
+
"backbone.blocks.9.mlstm_layer.k.weight": "model-00002-of-00006.safetensors",
|
479 |
+
"backbone.blocks.9.mlstm_layer.multihead_norm.weight": "model-00002-of-00006.safetensors",
|
480 |
+
"backbone.blocks.9.mlstm_layer.ogate_preact.weight": "model-00002-of-00006.safetensors",
|
481 |
+
"backbone.blocks.9.mlstm_layer.out_proj.weight": "model-00002-of-00006.safetensors",
|
482 |
+
"backbone.blocks.9.mlstm_layer.q.weight": "model-00002-of-00006.safetensors",
|
483 |
+
"backbone.blocks.9.mlstm_layer.v.weight": "model-00002-of-00006.safetensors",
|
484 |
+
"backbone.blocks.9.norm_ffn.weight": "model-00002-of-00006.safetensors",
|
485 |
+
"backbone.blocks.9.norm_mlstm.weight": "model-00002-of-00006.safetensors",
|
486 |
+
"backbone.embeddings.weight": "model-00001-of-00006.safetensors",
|
487 |
+
"backbone.out_norm.weight": "model-00006-of-00006.safetensors",
|
488 |
+
"lm_head.weight": "model-00006-of-00006.safetensors"
|
489 |
+
}
|
490 |
+
}
|