diff --git a/README.md b/README.md
index 32dd0939a0bec2ece7b96a5e8ca6e87a22a8a3fa..ca2344550047735fff80bc4c9a629ddba1591053 100644
--- a/README.md
+++ b/README.md
@@ -1,190 +1,191 @@
----
-language:
-- en
-license: apache-2.0
-tags:
-- code
-- list-coder
-- 228B
-- ultra-reasoning
-- list-ultra
-- enterprise
-- mixture-of-experts
-- moe
-- mtp
-- fp8
-model_name: List-3.0-Ultra-Coder
-pipeline_tag: text-generation
-library_name: transformers
----
-
-
-
-
-
-# List-3.0-Ultra-Coder
-
-### The Next Frontier of AI-Powered Software Engineering
-
-[](https://list-coder.com/)
-[](https://list-coder.com/download)
-[](https://www.instagram.com/trylistcoder/)
-
----
-
-**228 Billion Parameters** · **256-Expert MoE** · **204K Context Window** · **Multi-Token Prediction**
-
-*The largest and most capable coding model ever built for the List-Coder ecosystem.*
-
-
-
----
-
-## Why List-3.0-Ultra-Coder?
-
-**List-3.0-Ultra-Coder** is not just an incremental update; it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**.
-
-> **"We didn't build another coding assistant. We built the engineer that engineers wish they had."**
-
----
-
-## Performance Benchmarks
-
-We benchmark against the best models on the planet. No cherry-picking. No asterisks.
-
-| Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict |
-| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
-| **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** |
-| Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan |
-| Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan |
-| GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ |
-| DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ |
-| Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ |
-| Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ |
-| Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ |
-
-> **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model.
-
----
-
-## What's New in 3.0
-
-| Feature | List-2.0 | **List-3.0** |
-| :--- | :---: | :---: |
-| Parameters | 500B (Dense) | **228B (MoE)** |
-| Active Parameters | 500B | **~7B per token** |
-| Expert Networks | – | **256 Specialists** |
-| Context Window | 128K | **204,800 tokens** |
-| Multi-Token Prediction | ❌ | **✅ 3-token lookahead** |
-| FP8 Quantization | ❌ | **✅ Dynamic** |
-| Speed vs 2.0 | 1x | **~31x faster** |
-| Architecture Reasoning | Good | **State-of-the-art** |
-| Security Auditing | Basic | **Enterprise-grade** |
-
----
-
-## Technical Specifications
-
-```yaml
-Architecture: Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP)
-Total Parameters: 228,000,000,000 (228B)
-Active per Token: ~7B (8 of 256 experts)
-Expert Networks: 256 specialized routing experts
-MTP Modules: 3 (predicts 3 tokens ahead simultaneously)
-Hidden Size: 3,072
-Attention Heads: 48 (8 KV heads, GQA)
-Layers: 62 transformer blocks
-Context Window: 204,800 tokens (~400 pages of code)
-Quantization: FP8 (float8_e4m3fn) with dynamic activation
-Precision: BFloat16 (training) / FP8 (inference)
-Vocabulary: 200,064 tokens
-RoPE θ: 5,000,000 (extreme long-context support)
-```
-
----
-
-## Get Started in 60 Seconds
-
-### Option 1: List Coder IDE (Recommended)
-
-The fastest way to experience **List-3.0-Ultra-Coder** at full power.
-
-1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)**
-2. **Sign in** with your account
-3. **Start coding** – the model is pre-configured and ready
-
-> 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance.
-
-
-### Option 2: Local Deployment (Advanced)
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-model_name = "List-cloud/List-3.0-Ultra-Coder-Brain"
-tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    device_map="auto",
-    trust_remote_code=True,
-    torch_dtype="auto"
-)
-
-prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing."
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=4096)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```
-
-> ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended.
-
----
-
-## What List-3.0 Excels At
-
-| Domain | Capability |
-| :--- | :--- |
-| **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS – it knows them all. |
-| **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. |
-| **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. |
-| **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. |
-| **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). |
-| **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. |
-
-
-
-## The List-Coder Ecosystem
-
-| Product | Description |
-| :--- | :--- |
-| [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration |
-| [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding |
-| [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks |
-| [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship – 228B MoE powerhouse |
-| [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development |
-
----
-
-## License
-
-This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
-
----
-
-## Connect
-
-- **Website:** [list-coder.com](https://list-coder.com/)
-- **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud)
-- **Enterprise Sales:** enterprise@list-coder.com
-
----
-
-
-
-### ⭐ Star this repo if List-3.0 helps you code faster
-
-**Built with obsession by [List Enterprise](https://list-coder.com/), making every developer 10x.**
-
-*© 2026 List Enterprise. All rights reserved.*
-
-
+---
+language:
+- en
+license: apache-2.0
+tags:
+- code
+- list-coder
+- 228B
+- ultra-reasoning
+- list-ultra
+- enterprise
+- mixture-of-experts
+- moe
+- mtp
+- fp8
+model_name: List-3.0-Ultra-Coder
+pipeline_tag: text-generation
+library_name: transformers
+---
+
+
+
+
+
+# List-3.0-Ultra-Coder
+
+### The Next Frontier of AI-Powered Software Engineering
+
+[](https://list-coder.com/)
+[](https://list-coder.com/download)
+[](https://www.instagram.com/trylistcoder/)
+
+---
+
+**228 Billion Parameters** · **256-Expert MoE** · **204K Context Window** · **Multi-Token Prediction**
+
+*The largest and most capable coding model ever built for the List-Coder ecosystem.*
+
+
+
+---
+
+## Why List-3.0-Ultra-Coder?
+
+**List-3.0-Ultra-Coder** is not just an incremental update; it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**.
+
+> **"We didn't build another coding assistant. We built the engineer that engineers wish they had."**
+
+---
+
+## Performance Benchmarks
+
+We benchmark against the best models on the planet. No cherry-picking. No asterisks.
+
+| Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
+| **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** |
+| Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan |
+| Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan |
+| GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ |
+| DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ |
+| Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ |
+| Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ |
+| Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ |
+
+> **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model.
+
+---
+
+## What's New in 3.0
+
+| Feature | List-2.0 | **List-3.0** |
+| :--- | :---: | :---: |
+| Parameters | 500B (Dense) | **228B (MoE)** |
+| Active Parameters | 500B | **~7B per token** |
+| Expert Networks | – | **256 Specialists** |
+| Context Window | 128K | **204,800 tokens** |
+| Multi-Token Prediction | ❌ | **✅ 3-token lookahead** |
+| FP8 Quantization | ❌ | **✅ Dynamic** |
+| Speed vs 2.0 | 1x | **~31x faster** |
+| Architecture Reasoning | Good | **State-of-the-art** |
+| Security Auditing | Basic | **Enterprise-grade** |
+
+---
+
+## Technical Specifications
+
+```yaml
+Architecture: Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP)
+Total Parameters: 228,000,000,000 (228B)
+Active per Token: ~7B (8 of 256 experts)
+Expert Networks: 256 specialized routing experts
+MTP Modules: 3 (predicts 3 tokens ahead simultaneously)
+Hidden Size: 3,072
+Attention Heads: 48 (8 KV heads, GQA)
+Layers: 62 transformer blocks
+Context Window: 204,800 tokens (~400 pages of code)
+Quantization: FP8 (float8_e4m3fn) with dynamic activation
+Precision: BFloat16 (training) / FP8 (inference)
+Vocabulary: 200,064 tokens
+RoPE θ: 5,000,000 (extreme long-context support)
+```
+
+---
+
+## Get Started in 60 Seconds
+
+### Option 1: List Coder IDE (Recommended)
+
+The fastest way to experience **List-3.0-Ultra-Coder** at full power.
+
+1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)**
+2. **Sign in** with your account
+3. **Start coding** – the model is pre-configured and ready
+
+> 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance.
+
+
+### Option 2: Local Deployment (Advanced)
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "List-cloud/List-3.0-Ultra-Coder-Brain"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    trust_remote_code=True,
+    torch_dtype="auto"
+)
+
+prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing."
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=4096)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+> ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended.
+
+---
+
+## What List-3.0 Excels At
+
+| Domain | Capability |
+| :--- | :--- |
+| **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS – it knows them all. |
+| **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. |
+| **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. |
+| **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. |
+| **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). |
+| **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. |
+
+
+
+## The List-Coder Ecosystem
+
+| Product | Description |
+| :--- | :--- |
+| [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration |
+| [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding |
+| [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks |
+| [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship – 228B MoE powerhouse |
+| [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development |
+
+---
+
+## License
+
+This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
+
+---
+
+## Connect
+
+- **Website:** [list-coder.com](https://list-coder.com/)
+- **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud)
+- **Enterprise Sales:** enterprise@list-coder.com
+
+---
+
+
+
+### ⭐ Star this repo if List-3.0 helps you code faster
+
+**Built with obsession by [List Enterprise](https://list-coder.com/), making every developer 10x.**
+
+*© 2026 List Enterprise. All rights reserved.*
+
+
+
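The README above claims that only 8 of 256 experts activate per token, with sigmoid scoring (`scoring_func: "sigmoid"` in config.json). As a rough illustration of what top-k routing means, here is a minimal sketch; `route_token` and the stand-in gate logits are hypothetical, not the model's actual router.

```python
import math

def route_token(scores, top_k=8):
    """Hypothetical top-k MoE router for one token: keep the top_k
    highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(scores)), key=lambda e: -scores[e])[:top_k]
    total = sum(scores[e] for e in ranked)
    return [(e, scores[e] / total) for e in ranked]

# 256 experts; gate logits squashed with a sigmoid, as the config suggests
logits = [math.sin(0.37 * e) for e in range(256)]   # stand-in gate logits
scores = [1 / (1 + math.exp(-x)) for x in logits]
active = route_token(scores)
print(len(active))  # 8: only these experts run for this token
```

Only the selected experts' feed-forward weights are touched for that token, which is why a 228B-parameter model can have roughly 7B active parameters per forward step.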
diff --git a/config.json b/config.json
index 5b47f662a581bcc9bb43d160899b27c1ff0ab57a..b5db21e193d476918e73c0ee4ce8f15629b6e7a4 100644
--- a/config.json
+++ b/config.json
@@ -1,115 +1,116 @@
-{
- "model_name": "List-3.0-Ultra-Coder",
- "architectures": [
- "MiniMaxM2ForCausalLM"
- ],
- "attn_type_list": [
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1
- ],
- "auto_map": {
- "AutoConfig": "configuration_minimax_m2.MiniMaxM2Config",
- "AutoModelForCausalLM": "modeling_minimax_m2.MiniMaxM2ForCausalLM"
- },
- "dtype": "bfloat16",
- "head_dim": 128,
- "hidden_act": "silu",
- "hidden_size": 3072,
- "intermediate_size": 1536,
- "max_position_embeddings": 204800,
- "model_type": "minimax_m2",
- "mtp_transformer_layers": 1,
- "num_attention_heads": 48,
- "num_experts_per_tok": 8,
- "num_hidden_layers": 62,
- "num_key_value_heads": 8,
- "num_local_experts": 256,
- "num_mtp_modules": 3,
- "qk_norm_type": "per_layer",
- "quantization_config": {
- "activation_scheme": "dynamic",
- "fmt": "float8_e4m3fn",
- "quant_method": "fp8",
- "weight_block_size": [
- 128,
- 128
- ],
- "modules_to_not_convert": [
- "gate",
- "e_score_correction_bias",
- "lm_head"
- ]
- },
- "rms_norm_eps": 1e-06,
- "rope_theta": 5000000,
- "rotary_dim": 64,
- "scoring_func": "sigmoid",
- "shared_intermediate_size": 0,
- "tie_word_embeddings": false,
- "transformers_version": "4.46.1",
- "use_cache": true,
- "use_mtp": true,
- "use_qk_norm": true,
- "use_routing_bias": true,
- "vocab_size": 200064
-}
+{
+ "model_name": "List-3.0-Ultra-Coder",
+ "architectures": [
+ "MiniMaxM2ForCausalLM"
+ ],
+ "attn_type_list": [
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1
+ ],
+ "auto_map": {
+ "AutoConfig": "configuration_list_ultra.MiniMaxM2Config",
+ "AutoModelForCausalLM": "modeling_list_ultra.MiniMaxM2ForCausalLM"
+ },
+ "dtype": "bfloat16",
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 3072,
+ "intermediate_size": 1536,
+ "max_position_embeddings": 204800,
+ "model_type": "list_ultra_coder",
+ "mtp_transformer_layers": 1,
+ "num_attention_heads": 48,
+ "num_experts_per_tok": 8,
+ "num_hidden_layers": 62,
+ "num_key_value_heads": 8,
+ "num_local_experts": 256,
+ "num_mtp_modules": 3,
+ "qk_norm_type": "per_layer",
+ "quantization_config": {
+ "activation_scheme": "dynamic",
+ "fmt": "float8_e4m3fn",
+ "quant_method": "fp8",
+ "weight_block_size": [
+ 128,
+ 128
+ ],
+ "modules_to_not_convert": [
+ "gate",
+ "e_score_correction_bias",
+ "lm_head"
+ ]
+ },
+ "rms_norm_eps": 1e-06,
+ "rope_theta": 5000000,
+ "rotary_dim": 64,
+ "scoring_func": "sigmoid",
+ "shared_intermediate_size": 0,
+ "tie_word_embeddings": false,
+ "transformers_version": "4.46.1",
+ "use_cache": true,
+ "use_mtp": true,
+ "use_qk_norm": true,
+ "use_routing_bias": true,
+ "vocab_size": 200064,
+ "model_creator": "List Cloud"
+}
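The `quantization_config` above declares FP8 with one scale per 128x128 weight block (`weight_block_size`) and the `float8_e4m3fn` format, whose dynamic range tops out around ±448. A toy sketch of what block-wise scaling means, under the simplifying assumption of a 2x2 block; `quantize_block` is a hypothetical helper, and a real e4m3 cast would additionally round each value's mantissa, which this sketch omits.

```python
def quantize_block(block, fp8_max=448.0):
    """Hypothetical block-wise FP8 scaling: compute one scale per
    weight block so the block's largest magnitude maps to fp8_max,
    then rescale every value into the representable range."""
    amax = max(abs(v) for row in block for v in row) or 1.0
    scale = amax / fp8_max
    quantized = [[v / scale for v in row] for row in block]
    return quantized, scale

block = [[0.02, -1.5], [3.0, -0.004]]                  # toy 2x2 "block"
q, scale = quantize_block(block)
dequantized = [[v * scale for v in row] for row in q]  # recover the weights
```

Per-block scales keep small weights from being crushed by one outlier elsewhere in the tensor, which is why the config excludes sensitive modules (`gate`, `lm_head`) from conversion entirely.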
diff --git a/configuration_list_ultra.py b/configuration_list_ultra.py
new file mode 100644
index 0000000000000000000000000000000000000000..7fcd9861c389c8c8c437784de4f5f2adf4688747
--- /dev/null
+++ b/configuration_list_ultra.py
@@ -0,0 +1,200 @@
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py.
+# Do NOT edit this file manually as any edits will be overwritten by the generation of
+# the file from the modular. If any change should be done, please apply the change to the
+# modular_minimax_m2.py file directly. One of our CI enforces this.
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# coding=utf-8
+# Copyright 2025 the HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from transformers.configuration_utils import PretrainedConfig
+
+
+class MiniMaxM2Config(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`MiniMaxM2Model`]. It is used to instantiate a
+    MiniMaxM2 model according to the specified arguments, defining the model architecture. Instantiating a configuration
+    with the defaults will yield a similar configuration to that of the MiniMaxM2-7B-v0.1 or MiniMaxM2-7B-Instruct-v0.1.
+
+    [minimax_m2ai/MiniMaxM2-8x7B](https://huggingface.co/minimax_m2ai/MiniMaxM2-8x7B)
+    [minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1](https://huggingface.co/minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1)
+
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+
+
+    Args:
+        vocab_size (`int`, *optional*, defaults to 32000):
+            Vocabulary size of the MiniMaxM2 model. Defines the number of different tokens that can be represented by
+            the `inputs_ids` passed when calling [`MiniMaxM2Model`].
+        hidden_size (`int`, *optional*, defaults to 4096):
+            Dimension of the hidden representations.
+        intermediate_size (`int`, *optional*, defaults to 14336):
+            Dimension of the MLP representations.
+        num_hidden_layers (`int`, *optional*, defaults to 32):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (`int`, *optional*, defaults to 32):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        num_key_value_heads (`int`, *optional*, defaults to 8):
+            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
+            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
+            `num_key_value_heads=1` the model will use Multi Query Attention (MQA), otherwise GQA is used. When
+            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
+            by meanpooling all the original heads within that group. For more details, check out [this
+            paper](https://huggingface.co/papers/2305.13245). If it is not specified, will default to `8`.
+        head_dim (`int`, *optional*, defaults to `hidden_size // num_attention_heads`):
+            The attention head dimension.
+        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
+            The non-linear activation function (function or string) in the decoder.
+        max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
+            The maximum sequence length that this model might ever be used with. MiniMaxM2's sliding window attention
+            allows sequences of up to 4096*32 tokens.
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        rms_norm_eps (`float`, *optional*, defaults to 1e-05):
+            The epsilon used by the rms normalization layers.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        pad_token_id (`int`, *optional*):
+            The id of the padding token.
+        bos_token_id (`int`, *optional*, defaults to 1):
+            The id of the "beginning-of-sequence" token.
+        eos_token_id (`int`, *optional*, defaults to 2):
+            The id of the "end-of-sequence" token.
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether the model's input and output word embeddings should be tied.
+        rope_theta (`float`, *optional*, defaults to 1000000.0):
+            The base period of the RoPE embeddings.
+        sliding_window (`int`, *optional*):
+            Sliding window attention window size. If not specified, will default to `4096`.
+        attention_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout ratio for the attention probabilities.
+        num_experts_per_tok (`int`, *optional*, defaults to 2):
+            The number of experts to route per token; can also be interpreted as the `top-k` routing parameter.
+        num_local_experts (`int`, *optional*, defaults to 8):
+            Number of experts per Sparse MLP layer.
+        output_router_logits (`bool`, *optional*, defaults to `False`):
+            Whether or not the router logits should be returned by the model. Enabling this will also
+            allow the model to output the auxiliary loss. See [here]() for more details.
+        router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
+            The aux loss factor for the total loss.
+        router_jitter_noise (`float`, *optional*, defaults to 0.0):
+            Amount of noise to add to the router.
+
+    ```python
+    >>> from transformers import MiniMaxM2Model, MiniMaxM2Config
+
+    >>> # Initializing a MiniMaxM2 7B style configuration
+    >>> configuration = MiniMaxM2Config()
+
+    >>> # Initializing a model from the MiniMaxM2 7B style configuration
+    >>> model = MiniMaxM2Model(configuration)
+
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+
+    model_type = "minimax_m2"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    base_model_tp_plan = {
+        "layers.*.self_attn.q_proj": "colwise",
+        "layers.*.self_attn.k_proj": "colwise",
+        "layers.*.self_attn.v_proj": "colwise",
+        "layers.*.self_attn.o_proj": "rowwise",
+        "layers.*.block_sparse_moe.gate": "colwise_rep",  # we need to replicate here to correctly route experts
+        "layers.*.block_sparse_moe.experts.*.w1": "colwise",
+        "layers.*.block_sparse_moe.experts.*.w2": "rowwise",
+        "layers.*.block_sparse_moe.experts.*.w3": "colwise",
+    }
+    base_model_pp_plan = {
+        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+        "norm": (["hidden_states"], ["hidden_states"]),
+    }
+
+    def __init__(
+        self,
+        vocab_size=32000,
+        hidden_size=4096,
+        intermediate_size=14336,
+        num_hidden_layers=32,
+        num_attention_heads=32,
+        num_key_value_heads=8,
+        head_dim=None,
+        hidden_act="silu",
+        max_position_embeddings=4096 * 32,
+        initializer_range=0.02,
+        rms_norm_eps=1e-5,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=1,
+        eos_token_id=2,
+        tie_word_embeddings=False,
+        rope_theta=1e6,
+        sliding_window=None,
+        attention_dropout=0.0,
+        num_experts_per_tok=2,
+        num_local_experts=8,
+        output_router_logits=False,
+        router_aux_loss_coef=0.001,
+        router_jitter_noise=0.0,
+        **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.sliding_window = sliding_window
+
+        # for backward compatibility
+        if num_key_value_heads is None:
+            num_key_value_heads = num_attention_heads
+
+        self.num_key_value_heads = num_key_value_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.use_cache = use_cache
+        self.rope_theta = rope_theta
+        self.attention_dropout = attention_dropout
+        self.head_dim = head_dim
+
+        self.num_experts_per_tok = num_experts_per_tok
+        self.num_local_experts = num_local_experts
+        self.output_router_logits = output_router_logits
+        self.router_aux_loss_coef = router_aux_loss_coef
+        self.router_jitter_noise = router_jitter_noise
+
+        self.use_qk_norm = kwargs.pop("use_qk_norm", False)
+        self.rotary_dim = kwargs.pop("rotary_dim", self.head_dim)
+        self.partial_rotary_factor = kwargs.pop("partial_rotary_factor", 1)
+        if self.head_dim is not None:
+            self.partial_rotary_factor = self.rotary_dim / self.head_dim
+
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
+
+
+__all__ = ["MiniMaxM2Config"]
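The `num_key_value_heads` docstring above describes converting an MHA checkpoint to GQA by mean-pooling each group of key/value heads. A minimal sketch of that pooling with plain lists; `mha_to_gqa` is a hypothetical helper, and the 48-to-8 head counts mirror this repo's config.json.

```python
def mha_to_gqa(kv_heads, num_key_value_heads=8):
    """Mean-pool groups of key/value head vectors into one head per
    group, as described for MHA -> GQA checkpoint conversion."""
    group = len(kv_heads) // num_key_value_heads   # 48 heads -> groups of 6
    pooled = []
    for g in range(num_key_value_heads):
        members = kv_heads[g * group:(g + 1) * group]
        dim = len(members[0])
        pooled.append([sum(h[d] for h in members) / group for d in range(dim)])
    return pooled

heads = [[float(i)] * 4 for i in range(48)]        # 48 toy heads, head_dim 4
print(len(mha_to_gqa(heads)))                      # 8
```

After pooling, every group of 6 query heads attends against one shared KV head, which shrinks the KV cache by 6x at long context lengths.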
diff --git a/generation_config.json b/generation_config.json
index 30b418a48e04bf5e6d584093aa23393614678619..fb0cb22a96d91853244601c72b288a98324ed355 100644
--- a/generation_config.json
+++ b/generation_config.json
@@ -1,9 +1,10 @@
-{
- "bos_token_id": 200019,
- "do_sample": true,
- "eos_token_id": 200020,
- "temperature": 1.0,
- "top_p": 0.95,
- "top_k": 40,
- "transformers_version": "4.46.1"
-}
+{
+ "bos_token_id": 200019,
+ "do_sample": true,
+ "eos_token_id": 200020,
+ "temperature": 1.0,
+ "top_p": 0.95,
+ "top_k": 40,
+ "transformers_version": "4.46.1",
+ "model_creator": "List Cloud"
+}
\ No newline at end of file
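The generation_config.json above combines `temperature` 1.0, `top_k` 40, and `top_p` 0.95. A sketch of how those three settings interact at a single decoding step; `sample` and the stand-in logits are hypothetical, and real implementations differ in exactly how the filters are ordered and renormalized.

```python
import math
import random

def sample(logits, temperature=1.0, top_k=40, top_p=0.95, seed=0):
    """Hypothetical decoding step: temperature-scale the logits,
    keep the top_k candidates, then keep the smallest prefix of them
    covering top_p of that mass, and draw from the renormalized set."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    probs = [math.exp(x - m) for x in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    mass = sum(probs[i] for i in ranked)
    kept, cum = [], 0.0
    for i in ranked:                    # nucleus: stop once top_p mass is covered
        kept.append(i)
        cum += probs[i]
        if cum >= top_p * mass:
            break
    weights = [probs[i] for i in kept]
    return random.Random(seed).choices(kept, weights=weights, k=1)[0]

logits = [math.cos(0.11 * i) for i in range(1000)]   # stand-in logits
tok = sample(logits)
```

With a fixed seed the draw is deterministic, which is convenient for testing; production decoders would draw fresh randomness per token.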
diff --git a/model-00000-of-00130.safetensors b/model-00000-of-00130.safetensors
index 48cb02ebb6de52ff272e366888581cd494798380..495aaa1a357cff3279e4d7de33e9b0500b450e86 100644
--- a/model-00000-of-00130.safetensors
+++ b/model-00000-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9785f5a87c85710e38f4ca11f819f3d137ff84615af1bc0ba533b94681addf27
-size 3693062744
+oid sha256:d0c16afa264ac999106d7b80b160a97c316a70fabad3d428a9943eb7a35fca4a
+size 3693062760
diff --git a/model-00001-of-00130.safetensors b/model-00001-of-00130.safetensors
index 03d2e4f89519b916065223d5372b8cdd1b401064..70ce372ccce36fdff0eb11258babdfdfbff18b2b 100644
--- a/model-00001-of-00130.safetensors
+++ b/model-00001-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:d2ed94efe077a4498b788706e059d82780deb54436a70a5a9664b716d6cdc83e
-size 1208321176
+oid sha256:fe3b7db35ada8ade9963f2242b42d9ab6c82906f302c039cef50358a779cb848
+size 1208321192
diff --git a/model-00002-of-00130.safetensors b/model-00002-of-00130.safetensors
index 9c604108dd0eeee1fba743f4a1a13bf7fdf47afa..3046ee94f686fc4e704093669a3a4175bbb3647c 100644
--- a/model-00002-of-00130.safetensors
+++ b/model-00002-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:f0c1b97aff37136b5d89a9df22acf7109fa824ccef5f9ff4f763b7869dfc5650
-size 2463868936
+oid sha256:6591f23f0997c5a93ad3b1d07e1640057635b08f633a13a1e676785bac0831c1
+size 2463868952
diff --git a/model-00003-of-00130.safetensors b/model-00003-of-00130.safetensors
index 3f2bc7361251b0ce28d48539a0c161b782bf7bc5..d7b12b9359afb6ae015404620231a086bf7dc09b 100644
--- a/model-00003-of-00130.safetensors
+++ b/model-00003-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:93be479ff1b6912ff1a7e54f4c4a4e4d67124d1811df8e39d50b981b1b43d8e6
-size 1208321176
+oid sha256:cff032fb55721ec4f9838781cc99ff07ca197a6a8122a79abbca2c72a1bac476
+size 1208321192
diff --git a/model-00004-of-00130.safetensors b/model-00004-of-00130.safetensors
index 267f1e40ce2d3060705f737b790211cc5c0ea45c..3388d187b8e70b30e26e54f267c09e5d0f5bdfe3 100644
--- a/model-00004-of-00130.safetensors
+++ b/model-00004-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5d5bead700b8f82dd2a50cee205c37f5642020c414452869693da06df384a9eb
-size 2463868936
+oid sha256:47eb412198f9d20cd82a914763df09c7024f15bb364dc8c683c9dfab12242f14
+size 2463868952
diff --git a/model-00005-of-00130.safetensors b/model-00005-of-00130.safetensors
index f58637bf72761ad9248ad612d3738320ecf26c88..0163aa4a17c60bea23def992724c7373fa6cbb08 100644
--- a/model-00005-of-00130.safetensors
+++ b/model-00005-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:99444d6d83c614776397faa167dc908d48016414e0dd6edef57fd9c040e01d21
-size 1208321176
+oid sha256:29ee6cc2652523a1529efbe193b2916b8312d4c81ffe3bfa69a3d5462890a9cc
+size 1208321192
diff --git a/model-00006-of-00130.safetensors b/model-00006-of-00130.safetensors
index afc76b5a1a08a830e63138856c4c3f0b83459b29..d0d8c3be52f1a7d5a8d0aef6f3ead8b08dc7ca33 100644
--- a/model-00006-of-00130.safetensors
+++ b/model-00006-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:df42d1d91b84ed41f846775a274dbd382185fdf7595009dcd016bd805e25eb1b
-size 2463868936
+oid sha256:a73d0f05cd4be0fc95fbd5b0ed43ed89b8b5310f0d77528d5b2f2636b049c15a
+size 2463868952
diff --git a/model-00007-of-00130.safetensors b/model-00007-of-00130.safetensors
index c3de034d0055d8d7efd95816004e1c5d6afea62c..c45b6cbe311cedfa4f09bda22385271663fa99f4 100644
--- a/model-00007-of-00130.safetensors
+++ b/model-00007-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:18882ffcb4f2dddfe6b8766393c68208b524aa4520ed921234a66b11548440eb
-size 1208321176
+oid sha256:d844a3f7afec3e0fe03111c45e01c434a4ae20c1d73a3004fcd688bda605ebef
+size 1208321192
diff --git a/model-00008-of-00130.safetensors b/model-00008-of-00130.safetensors
index c1f0529c61d6b5358aac2e6021e2403b10997cc9..086215d2a5048a92192a27f5b9689b36c2176284 100644
--- a/model-00008-of-00130.safetensors
+++ b/model-00008-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:cf8ead5d7b01543a3fafc5a39240b1a3d9fe1cf25b360eb99e7a751359db9705
-size 2463868936
+oid sha256:c76e793b4cfdf48f057594fddc66a767e918f3ba261cc8c27d5206fcbc3790b7
+size 2463868952
diff --git a/model-00009-of-00130.safetensors b/model-00009-of-00130.safetensors
index daca91019e09cf08247ede546715096cc662a4f8..4bfd23c9fa56e951b6c60dbafbfeb46ba3da6c29 100644
--- a/model-00009-of-00130.safetensors
+++ b/model-00009-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:d897820ce912aa7ae2feb4377d9b8684eca38c18be550b6bcf7316cb9d7c6e30
-size 1208321176
+oid sha256:641beb2755a121a3160b4d7a504b6d15f3d9521d9ad18178515b6833e02507a8
+size 1208321192
diff --git a/model-00010-of-00130.safetensors b/model-00010-of-00130.safetensors
index ebdb82d6ac5098a1471cb5362a0fc2726c5c4ad5..ef312a577a842974b9c75c4dfd8fb48dcd2c20d5 100644
--- a/model-00010-of-00130.safetensors
+++ b/model-00010-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:734eee6e62863c518a976d41b6c4122ed974cf87e52cd2d7e7df0187a3141b87
-size 2463868936
+oid sha256:acc219978e83281e8c819f646c189d6b1a4d018269194ad564ecf68a2fd2fd6a
+size 2463868952
diff --git a/model-00011-of-00130.safetensors b/model-00011-of-00130.safetensors
index 202b8fc1c9acb58782f90dff67fda9343739e723..ef4b0fd56f21ca4d44c7dd6b9bb5b18e17b4767c 100644
--- a/model-00011-of-00130.safetensors
+++ b/model-00011-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:1237cbe1b9915bfda1efb8ced7d5a4266a0083a3b4c3fa401c4a003e3fea20fd
-size 1208321176
+oid sha256:71053f6d6db3f5d5c4ac3231963bf72fa31f431260c82fec8204518c046a8b7e
+size 1208321192
diff --git a/model-00012-of-00130.safetensors b/model-00012-of-00130.safetensors
index f689858ce1cdfd76de3a0e143bbe46658b125e94..45677a302cb954b35ec6af7f10e14b566adfb9a7 100644
--- a/model-00012-of-00130.safetensors
+++ b/model-00012-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:069b272af35289d3c499e98f867b1ffecb1f96980c583bf77b1d4d23c8b7a713
-size 2463868936
+oid sha256:22836d173404306e62d081a63ea3c04fc8ef408cc846bbe2d0a11f8d4fbb5026
+size 2463868952
diff --git a/model-00013-of-00130.safetensors b/model-00013-of-00130.safetensors
index 079c54bd0bb87e27f58cd313c1a95961130ea259..ec2ad926591cc55bf71fb4b4a9de656e9cc8d08e 100644
--- a/model-00013-of-00130.safetensors
+++ b/model-00013-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:045403b45c8951c3ea3c68b288f04255e0e2fc4de47293f9b941964212b8253e
-size 1208321176
+oid sha256:d1b4189b66df90cdc1e63a3ca6428abcf613f42d6ac7d8c2e3fd8a8cdf645124
+size 1208321192
diff --git a/model-00014-of-00130.safetensors b/model-00014-of-00130.safetensors
index 07f29eb2d810683a6b12d1d86a5ceb8b19582059..4c24ac349db965dbe973a8f2aa8bb6c9cc61dee3 100644
--- a/model-00014-of-00130.safetensors
+++ b/model-00014-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0277da3d1063a00618b32992617a2448c95c850c1f26dc4024d70ae920a35a25
-size 2463868936
+oid sha256:7598790d1aa068a5c9ba53fcc40c079394799a97306827f1ba1f8cba88684ab9
+size 2463868952
diff --git a/model-00015-of-00130.safetensors b/model-00015-of-00130.safetensors
index 68b71ada95537ac9bc00c3adb0e207ef56afc2f9..2aedf41980337097a38b7f0df947f69e4a1c6c5a 100644
--- a/model-00015-of-00130.safetensors
+++ b/model-00015-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:d2a9db97dbab9f2a324219d4ba019656b6b635fae3b868d7f2a4fd6e3bab5e66
-size 1208321176
+oid sha256:18068f6619316e15eaa5899bc905d73829c198c95bd73e60ff9a916d06227c8f
+size 1208321192
diff --git a/model-00016-of-00130.safetensors b/model-00016-of-00130.safetensors
index 0305bea8f22d4759779cbc355dc857246ff7c710..a38b257e4d85e0881a353dd7bd65dba716559f25 100644
--- a/model-00016-of-00130.safetensors
+++ b/model-00016-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:90776eaf143864ecb632c059fefd4167e27c5644ba4eb50d65afa5291cff666e
-size 2463868936
+oid sha256:51251cb05597e91f3123a4895b103b700f5500292e0645d9dd5098d89905cdc6
+size 2463868952
diff --git a/model-00017-of-00130.safetensors b/model-00017-of-00130.safetensors
index 18443a8e3d85852383f2257bf30428636df1ceee..60a08dd6a2ba950aa7b1fb3069b4923c7c4a288d 100644
--- a/model-00017-of-00130.safetensors
+++ b/model-00017-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4ea50b70dae5f8b55b1990a6b6cad9291349b45162548e9d48d63b2a144e3c23
-size 1208321176
+oid sha256:6fbfbaa652a008a347622f73eb65c328519479d39984d20fe7550aa223731776
+size 1208321192
diff --git a/model-00018-of-00130.safetensors b/model-00018-of-00130.safetensors
index 04f879f9080a66065768e8e54cba3881044a8ec0..294dd061e09d7917006187ae4baf5e1cdad47ce9 100644
--- a/model-00018-of-00130.safetensors
+++ b/model-00018-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2a239e9eae27174937d5547d8e5e743e84bd7eaea50390510e4cd8f15511447b
-size 2463868936
+oid sha256:7aac1f32c20fd51a00f09337203defcce29e9f406bfb1b3ad6f149e1eb6ac5c9
+size 2463868952
diff --git a/model-00019-of-00130.safetensors b/model-00019-of-00130.safetensors
index c727ee4e0931ae34f888587f8f853b79d7e7c3cd..cc966920ba232c64f026a9a8d3ac7de6bd3f5b55 100644
--- a/model-00019-of-00130.safetensors
+++ b/model-00019-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5e041358d2ce0d92517b13508046baf08807d46adb33dda5d23728a4cef45f2b
-size 1208321176
+oid sha256:71137226bd4232c4b458fa03e452922938c2bbbef11ac6158872f1955a9051d9
+size 1208321192
diff --git a/model-00020-of-00130.safetensors b/model-00020-of-00130.safetensors
index 1291075d46679ed39420ef848b7c701e56aed52a..1f845d773fa00dea36af1a8d126b07ac016a1a28 100644
--- a/model-00020-of-00130.safetensors
+++ b/model-00020-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4f4f7af9ded3e7d5775012eae2c7dee63518c799ebbe42a47949aa7f560c5f43
-size 2463869968
+oid sha256:ee55ff6bcd2005fec670a2be80c07b08ce08cf4c5f8e60e475f69fdbc4124ac1
+size 2463869984
diff --git a/model-00021-of-00130.safetensors b/model-00021-of-00130.safetensors
index e070b98aa345b7b29c059f4c1cbbb706978495a3..5744d8b4ed06e3cb10e38e1ee23aa8902daee685 100644
--- a/model-00021-of-00130.safetensors
+++ b/model-00021-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:8a76ddac05820e58676b3b56e2990c598dae551f1f65adf55a90a3754f66e2b4
-size 1208321688
+oid sha256:f689ebd29f939326b19c48f3ddb20c06f1f8f283dc3f945de7b3ad9a10c07a37
+size 1208321704
diff --git a/model-00022-of-00130.safetensors b/model-00022-of-00130.safetensors
index 00be2228b97bf32a593f9518c23f7b6470d3092e..0344f627ecc43c1323fb2a36d8121e631478f517 100644
--- a/model-00022-of-00130.safetensors
+++ b/model-00022-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:c080ad8c3b5032434973e205a074e4d1a41edd399a383dc1c6d80ebb073ca09e
-size 2463869968
+oid sha256:9d25c1854e0b56c930560a8c3ad8e1e5476f40c88ba8e216304a01c5aca1bc19
+size 2463869984
diff --git a/model-00023-of-00130.safetensors b/model-00023-of-00130.safetensors
index 98dd3ee7d42ac5a7cf6c2eb34667df84544d0618..d34b0230861b1b49479e629263b9415414d71090 100644
--- a/model-00023-of-00130.safetensors
+++ b/model-00023-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9eee017222d3eb90afa5126fccb194de12c67828bd4353b3a466ce3da17877d2
-size 1208321688
+oid sha256:283726c528f252b7c37374757865124b80eccea270f296dac9cb39bdb29c30ae
+size 1208321704
diff --git a/model-00024-of-00130.safetensors b/model-00024-of-00130.safetensors
index b42f6bdb3f6e037942b67645d998daadf547f744..739390db832b92b93c11717c286918711c8cdc59 100644
--- a/model-00024-of-00130.safetensors
+++ b/model-00024-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e3d3c543000e2fd6180bb17c289f36e46256bf0c76f7ae98a7087eb4264db605
-size 2463869968
+oid sha256:0fc0e56e137378c34551c058d11163c6f70ec79980dc503c2e5f8ab8ca969a5d
+size 2463869984
diff --git a/model-00025-of-00130.safetensors b/model-00025-of-00130.safetensors
index 723dfe7a55f1b61817753bfcb12723c7084d8246..d26283cb5c40c55c8364b7b6bb422ecaedaac631 100644
--- a/model-00025-of-00130.safetensors
+++ b/model-00025-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:68580bdb4da65c22fb95a16e7fe13b1f0bbde861327d7c0bb6cb76a86794d38d
-size 1208321688
+oid sha256:ce447cd23d3ef6fbb2911e75b2eec4a500be913fab847ddd513b38faaab06ae4
+size 1208321704
diff --git a/model-00026-of-00130.safetensors b/model-00026-of-00130.safetensors
index ae245b35dbbd6780e5d860237e0624b59fc50197..be0524e5039642923ee8d02923196d78cf934f89 100644
--- a/model-00026-of-00130.safetensors
+++ b/model-00026-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:c0ca69318b53d7ec6f7fcfa7981ed2ec402e73302fd5ea62ed77311f4eb8be73
-size 2463869968
+oid sha256:7ab66aaa211410818416eac84338b5231a55ccc62e93273af57ea54a7da38c57
+size 2463869984
diff --git a/model-00027-of-00130.safetensors b/model-00027-of-00130.safetensors
index 2e1a028e0d3c92553f226e6fd6a688934f024c4a..4898a05443931653ad9223a62f5cd5aa71854f58 100644
--- a/model-00027-of-00130.safetensors
+++ b/model-00027-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a6f03ff04b01299dceaf26fe0a0a503d6e0abc58eba94e8796e933e40bd10a5e
-size 1208321688
+oid sha256:db40c8e355ef79e34a8f1b1da001714d608016c18ea215dd02848a745d7b190e
+size 1208321704
diff --git a/model-00028-of-00130.safetensors b/model-00028-of-00130.safetensors
index 26ffc51e1763c45ba7c8bf8d82e8b0835ce4c3a6..becb21ae02cd8686ac35d5c41d3443cf72d7d5b0 100644
--- a/model-00028-of-00130.safetensors
+++ b/model-00028-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:6432450282a2cd79475b57bf5b83380addf0b8d36586c750bc4fbf37ce04af6e
-size 2463869968
+oid sha256:cfa1a296fb0b36b616a2955e57af670e33bf8cb89171c63e6387b3bd6b381025
+size 2463869984
diff --git a/model-00029-of-00130.safetensors b/model-00029-of-00130.safetensors
index 4cbc7b1a120d349f3077da464eab4ae3f40453b9..e166ac02f59a65f5156ff0046fef5f4407634967 100644
--- a/model-00029-of-00130.safetensors
+++ b/model-00029-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:961ca8675f7ee7a1a65e5ea5f1e35dfe7427d566e68a1f56f04a463252763683
-size 1208321688
+oid sha256:2b85a8106a86e47f91e2221b043b4eab36c4ef76438d0298ad7c9d841ed8b0fa
+size 1208321704
diff --git a/model-00030-of-00130.safetensors b/model-00030-of-00130.safetensors
index c5d28f49b37207e307956a4eafec7e27d4c500a9..cf19b76d43467e33c73a1aadd243092398f61100 100644
--- a/model-00030-of-00130.safetensors
+++ b/model-00030-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:7687ab86a251404b048268b022b67c148d38605ae04a0ddc46f2328aec60dc53
-size 2463869968
+oid sha256:02cd49378478900445f3295f028990061308abdec79e4d5df4b07a3dcb29a0f1
+size 2463869984
diff --git a/model-00031-of-00130.safetensors b/model-00031-of-00130.safetensors
index 936a7d35a4fbf4be1374069c5b2a76615422a780..6556c8ce270ee1e2e193c65c6ae6c9b79fb1f66c 100644
--- a/model-00031-of-00130.safetensors
+++ b/model-00031-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:345042a4520442dccd7428238a2d80a5b5b7d990d1d5b61395ffcaad7e4e8794
-size 1208321688
+oid sha256:ec5a215e0fc3048ea77ef02b4a5468ba94c159523d34b348f53396803d42c7ff
+size 1208321704
diff --git a/model-00032-of-00130.safetensors b/model-00032-of-00130.safetensors
index 64ece8e8ead362474263d993d2cf0bff7fee51cf..069b617f8cacbf390978a943328316dcd87117bc 100644
--- a/model-00032-of-00130.safetensors
+++ b/model-00032-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4faa680a93c47b4624ba40e17b98c725c9704ebbb75644feeb8f8a42a9045a7d
-size 2463869968
+oid sha256:619ba8b01d74dd14a7b32d74474e0fda94a4fc1298678dc277716788a253f47d
+size 2463869984
diff --git a/model-00033-of-00130.safetensors b/model-00033-of-00130.safetensors
index 38573648b07b035a79cabc45978adaafc1804433..8fe624ee13a7f5ccbdfbe37440adaf57cbcdba6a 100644
--- a/model-00033-of-00130.safetensors
+++ b/model-00033-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:fdfa10d9c8315dd4dd94d46955e03b012d56e8764db1089e1b2970d5139bb38e
-size 1208321688
+oid sha256:00df4ee5d99ca76c1528f0c05beddc36e7de54587a96058a98318c90391bd40d
+size 1208321704
diff --git a/model-00034-of-00130.safetensors b/model-00034-of-00130.safetensors
index 6ad39db645bced255028db530702509fb4bdedee..00fb18e777ce7e3f01064c7ebf3cfcc6ea5a1de6 100644
--- a/model-00034-of-00130.safetensors
+++ b/model-00034-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:ae23de77bccd17a8ec9286fcf71aa2ed2dfe54f3404f6ed755f5067c4d01149a
-size 2463869968
+oid sha256:1db20eca10db4d8a09052bb07c3879784b4eefb2cfbc068f9f92ce83f7835e12
+size 2463869984
diff --git a/model-00035-of-00130.safetensors b/model-00035-of-00130.safetensors
index ca8f4a80a7f05e0cb11a95373735cb84553c7805..416fe5e3e58127ec87cc9ecdadee7eaeb219514a 100644
--- a/model-00035-of-00130.safetensors
+++ b/model-00035-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:6a5ca9a1fd87ba6f98d95f6a88789edf6909270540f0dd8736e05dd9f839943a
-size 1208321688
+oid sha256:f470d1acd3e6cccc93991ff168563c5b0150c9e97534ee1c7eb8b410086594a2
+size 1208321704
diff --git a/model-00036-of-00130.safetensors b/model-00036-of-00130.safetensors
index 9682d43c045e5f8ec55d9476e0f02221c1e3bbbc..52b472c1fe73c5f7458303e8575ab68d0833c909 100644
--- a/model-00036-of-00130.safetensors
+++ b/model-00036-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:88113822767ba632f6a9b1863c6d78c005107ef563d82f7948ed0a3e5b5d76be
-size 2463869968
+oid sha256:c05191aca5c7832a2ad70efb76c6053996373a972f944010702c1d89c0615808
+size 2463869984
diff --git a/model-00037-of-00130.safetensors b/model-00037-of-00130.safetensors
index 82f91a3c71dcc391d2b90ac5cce09cfcee60c797..63d13ad70acd187b0a302a48a20faf74b2af66a2 100644
--- a/model-00037-of-00130.safetensors
+++ b/model-00037-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3a42e3dfe02d8f2b8b2bfc8d35942e93de8746f74f88390f66d2106d6d7ee328
-size 1208321688
+oid sha256:e5f63e133ddd050c482fe97b9a43c3acb4b71ff9299250061a80ce9aedd54ef7
+size 1208321704
diff --git a/model-00038-of-00130.safetensors b/model-00038-of-00130.safetensors
index 9c46d54ecbecea82501d1709d7d73c29e481115f..61ca2b769a61876c015193f1f039a4ba0befb4a2 100644
--- a/model-00038-of-00130.safetensors
+++ b/model-00038-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:6cf2b3485504e8b3790424afc1af0eaa735fa835999e5ac3639a0a0a1d1200c9
-size 2463869968
+oid sha256:7b8225555f566cc75813df75f0b06f28c5ff1a17113e863ae2dc5904bb0e0b7d
+size 2463869984
diff --git a/model-00039-of-00130.safetensors b/model-00039-of-00130.safetensors
index 030775cb3e49fe39e5d29c9d3ad10023ee38177a..e2cdc9d3da3af0e3c22db7ec83cc3ed85405772f 100644
--- a/model-00039-of-00130.safetensors
+++ b/model-00039-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:bbf5e9eff7646b206eb25ba1a744d6d2e3544b3713638692a5869f8ef7143680
-size 1208321688
+oid sha256:924d61a64bc0252c8a116af17e04fb0456b9073f69f770bf7641d53459d626a7
+size 1208321704
diff --git a/model-00040-of-00130.safetensors b/model-00040-of-00130.safetensors
index 0e25b626dc1bf62f46524401faf3c2c9e4b3502b..3392a258254253e0543f247c4863bed3aec10f6b 100644
--- a/model-00040-of-00130.safetensors
+++ b/model-00040-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:499c9039dff0d6fa4c127030bde7cb7557bbd6cf98f7c002093e54bf16a0db22
-size 2463869968
+oid sha256:c702ab514fa24d0793b4cd2eba3e3ce00364031d230ff015b69435bcefd2fe98
+size 2463869984
diff --git a/model-00041-of-00130.safetensors b/model-00041-of-00130.safetensors
index bdd58bcd7a340193c18ffba6539601fe09176462..4784f3b84b106237325b5a8089996e661265cc01 100644
--- a/model-00041-of-00130.safetensors
+++ b/model-00041-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3ed0565052bb46b1b3913041d17da44b88c18ab5421ec770c2716762bf23aa8a
-size 1208321688
+oid sha256:8187a1702e6f97158ce33d917813bed2c09da5d254c23c3f9252212822122801
+size 1208321704
diff --git a/model-00042-of-00130.safetensors b/model-00042-of-00130.safetensors
index 0c7590501f6579bd38f572d48a7bf22fe687c265..1e35a3d875057cb5f52fae526ae64c024e829259 100644
--- a/model-00042-of-00130.safetensors
+++ b/model-00042-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:601959ff7bdb6fa3a0b08f529b592d23462083e30c4840b9925f655bde56649a
-size 2463869968
+oid sha256:086952771ffb3c230f442bf74089630ce154a7031ff55a096a329eda9fa5da76
+size 2463869984
diff --git a/model-00043-of-00130.safetensors b/model-00043-of-00130.safetensors
index ba25206991e2999a800cdc12c505e28693f477d0..8d9d8d73c815b66087d03529d553af356fee8b3c 100644
--- a/model-00043-of-00130.safetensors
+++ b/model-00043-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:7fbd3484ee80a51f026b5feead3b59be11d8c4fc02965c58b123bd0111ff18b8
-size 1208321688
+oid sha256:f2007a0ad756d4f2e26a9563c44c0e3bba9eb37d54f39c6c74b7aeae7518b1a1
+size 1208321704
diff --git a/model-00044-of-00130.safetensors b/model-00044-of-00130.safetensors
index edc33e3a2b1d80beeefd9870a2795f6bdb24f541..6d9d926de0d9f3c35f1e12bbed441dba053ca169 100644
--- a/model-00044-of-00130.safetensors
+++ b/model-00044-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b349ca4c4779f858f89c6a50f0cd365d147df4b88a523752ea8f8f4221e42f81
-size 2463869968
+oid sha256:bccf19ea9a96545a27081444a93f797b3114001f3837522b622a03730e821916
+size 2463869984
diff --git a/model-00045-of-00130.safetensors b/model-00045-of-00130.safetensors
index e83abc6c363784914c7459d9709c964930ccb69d..45aaab663d2c00e7faa01f7d65cebc20dda933dc 100644
--- a/model-00045-of-00130.safetensors
+++ b/model-00045-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:54673ecdf05ea6b01934af72c258b05fd6c6018d0cd2d9acec530116d16285db
-size 1208321688
+oid sha256:1d303939832d74b199d4593622da9f8edc22acc2d9d0d45c52479c2529a73000
+size 1208321704
diff --git a/model-00046-of-00130.safetensors b/model-00046-of-00130.safetensors
index 887f19248240c36e92834c0d6481adbd1e6da5f9..b5791282136437a5f87e37098e1c4f2d8839d3b6 100644
--- a/model-00046-of-00130.safetensors
+++ b/model-00046-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:341ac0c20e20e3559be3aadc790c706b983e748a7832621f56659348d031aa49
-size 2463869968
+oid sha256:8fa2f23b6d23a8cd59d7537e70e99dba0bcf4a460159ea2239c8da03cdb4b355
+size 2463869984
diff --git a/model-00047-of-00130.safetensors b/model-00047-of-00130.safetensors
index 5bf13b20d94a1ff79489d4cdf9756d9acc948664..92bdf647ded33d191fc717a43bbe308fd0983078 100644
--- a/model-00047-of-00130.safetensors
+++ b/model-00047-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:38785114c81c6545b8ddefde004e154bd75a0095de6d1f59cb8e5b36d209d069
-size 1208321688
+oid sha256:4bb44e3a00a144df08f6cb7f486af9aaaebd2d6b1d14d1f0af2bb2c2d6ac257a
+size 1208321704
diff --git a/model-00048-of-00130.safetensors b/model-00048-of-00130.safetensors
index e79a6922bc89ad6e8c019219f6a51a164127f725..01f645ed7cc656caa93e629310ab4a3973009937 100644
--- a/model-00048-of-00130.safetensors
+++ b/model-00048-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:59c01cf8b22f7fd42acd0c8302f3a8c1d657491d0940a33c7aa8ec4c98190dc4
-size 2463869968
+oid sha256:77d90b8ffebccfb85d4a331bf42defc113daf21852998534bbdb0cbb365cdd67
+size 2463869984
diff --git a/model-00049-of-00130.safetensors b/model-00049-of-00130.safetensors
index eca69cc2168af32f419ef45490fc92bf54abe3a7..03e1dc194c82d99ae2cf44b8a0afe2136d7d77d1 100644
--- a/model-00049-of-00130.safetensors
+++ b/model-00049-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:bbc2141546a281debcfa24080b2851d3f79b9123da5ba552adbf6e9d888b8d14
-size 1208321688
+oid sha256:8b5f293f072a8cc158c6afa7890c7d29c06dc8d69370634e852e7c577318c8ed
+size 1208321704
diff --git a/model-00050-of-00130.safetensors b/model-00050-of-00130.safetensors
index 02351321b435bdd7e21ffd959c6ff1f67bde5bf4..c3878403a11ced397754fc2c0165bc1b46b65cb2 100644
--- a/model-00050-of-00130.safetensors
+++ b/model-00050-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e6c1dfceca0259ac2d38bff5fdc0e98bebc964c69b2624724e371e7e42c7be09
-size 2463869968
+oid sha256:56465fcf91b6f750b78ad82f64cec306416fdda16a35a4cf1ab98cd8040a2dea
+size 2463869984
diff --git a/model-00051-of-00130.safetensors b/model-00051-of-00130.safetensors
index cab49fe089c4c4bbf93c35ac0eaccb42ec9c9d8a..309a48e11afac28dce853d32a718119faa265aae 100644
--- a/model-00051-of-00130.safetensors
+++ b/model-00051-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:bc4209a8554b3d344e2afe9aefbcc7cd192b480b496d215b9026d0d966f5fb90
-size 1208321688
+oid sha256:013f0a79ef1e565dd47c7956eab6d534141234fac65832d52864849e313cc2bf
+size 1208321704
diff --git a/model-00052-of-00130.safetensors b/model-00052-of-00130.safetensors
index 93e064a911489f00f071c28cfcab3b4d7ac57549..5182b460d2f8ba0cb1b366fe810ad8721f530559 100644
--- a/model-00052-of-00130.safetensors
+++ b/model-00052-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e1ad313b24dccbdbef60fac452a080233f1b87eaa56d8a875c7c0c5f5272c5b8
-size 2463869968
+oid sha256:3d35192b4238e0b1bb40fdfffa87e98215677caedf3c77b4a3e00a1f5907c16d
+size 2463869984
diff --git a/model-00053-of-00130.safetensors b/model-00053-of-00130.safetensors
index 8acc452ac24c10d7868c4ec4812733ac3aec1530..6c94f8034466b02a2572004c662327525d225785 100644
--- a/model-00053-of-00130.safetensors
+++ b/model-00053-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:84f5bb1d8a740b89b24b59fd6d607d198099e480cf67e52dc2c8b49deb9b3fdf
-size 1208321688
+oid sha256:bd9e2290acd77a17c415124af372799398cec5335c67034eea48ffdcca64bbc3
+size 1208321704
diff --git a/model-00054-of-00130.safetensors b/model-00054-of-00130.safetensors
index 4b8c8c3b45fdae686d4bdfedda940b9be3cac702..c2f03880de6d3caa3db954438b6e83972b29d798 100644
--- a/model-00054-of-00130.safetensors
+++ b/model-00054-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a001ec5d2dd12f6a87c558766b0fc24aee042775a6806d37da459cf3e838e579
-size 2463869968
+oid sha256:f5f7720f95bf51c58cd2954a0eb41755bc165dfd723fe6f8eb688f6b14e910e7
+size 2463869984
diff --git a/model-00055-of-00130.safetensors b/model-00055-of-00130.safetensors
index 5acdf2d90da13e63234300e94b04ccd314ca694c..71a171a2ff32758ee410aaaf0d08749fc776e5e9 100644
--- a/model-00055-of-00130.safetensors
+++ b/model-00055-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:da2a90dda71ac298bcda0d6ef83dc28a129fe66ecefe27a064d3637c4f3f723d
-size 1208321688
+oid sha256:a575ca32b8b05436ec890f6e99111aba2dc8d4dcd2f4ba51e9933c93d7625bef
+size 1208321704
diff --git a/model-00056-of-00130.safetensors b/model-00056-of-00130.safetensors
index 17add31e2059eeb5342f1bf64cd64401a4fd1960..84d87c9fc00d92e912923db7a0b1dc802c617ed7 100644
--- a/model-00056-of-00130.safetensors
+++ b/model-00056-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9fe32d8911b7fb9857170ee26b9f330b1674e2c1f78cb0ef749cce9d6ec06c0a
-size 2463869968
+oid sha256:1bcbe082d00e1a7f9a2a3601f885cb03de48c146c16720f7a24da27000c52bcc
+size 2463869984
diff --git a/model-00057-of-00130.safetensors b/model-00057-of-00130.safetensors
index 7cead9dbaad4aab8e67007f5bfd690f79df6267c..0fcbbf57b2a94845ab5943fe7436407d6ffcfa10 100644
--- a/model-00057-of-00130.safetensors
+++ b/model-00057-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:1e8d73847187dc7d4da9a41ed3f5e7fd8f324d14eb107845188138b464299eb8
-size 1208321688
+oid sha256:1b71271c3c85735c62278080e022c3a7609b70b8649792d0962c03a2375bdddb
+size 1208321704
diff --git a/model-00058-of-00130.safetensors b/model-00058-of-00130.safetensors
index 93204ba0ad0131f364b3e693c91ed29e6aa42483..75a90877d634e0bd8bcd513e7c24c15d34970cc0 100644
--- a/model-00058-of-00130.safetensors
+++ b/model-00058-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:61ae96c272433d211c12be3ec81471dd21868f6b79e326023a5f687cb0edc77f
-size 2463869968
+oid sha256:c8dc50834d0c87cbafe7576ed3c6d6f5b24ba93f76afcd7f3d4663fa30e9bdb6
+size 2463869984
diff --git a/model-00059-of-00130.safetensors b/model-00059-of-00130.safetensors
index 397d8ba29ae12eea475e39cbbdf653a9c4d3491f..1c076d9e483729d9b286efdbeeb47f2dce7590fd 100644
--- a/model-00059-of-00130.safetensors
+++ b/model-00059-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2f19cb6bc24a9937faffa46939c209f5ef790825e964cb6a2b86ab56719bfe2b
-size 1208321688
+oid sha256:740568378867dfc6c6ad03b2b9f3fb94278ad17db0402d9517638e58d2119ef2
+size 1208321704
diff --git a/model-00060-of-00130.safetensors b/model-00060-of-00130.safetensors
index 09efbd5b9dba8a7b42d4096cc673bd12fa200160..0d4b30ece26d710ec6a028ae4c7dc26c0f897e92 100644
--- a/model-00060-of-00130.safetensors
+++ b/model-00060-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3b43a6164a7654e0820410d279cee374ac3b64266dd95fca228417156ff93f2f
-size 2463869968
+oid sha256:b6d194c823c68c8f1c35df8aff3e5cf1d0a794d4ff83bbbe4402f88e674466df
+size 2463869984
diff --git a/model-00061-of-00130.safetensors b/model-00061-of-00130.safetensors
index 3b550897b04744fd99f6f2bceee2e83eb8a1617e..b3d8122ace8ef277dec4ec32ec238a725ee49994 100644
--- a/model-00061-of-00130.safetensors
+++ b/model-00061-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0ed80e6f71a57a8d74ffcb046d39c836441efb2d2bbe542299550b929a2d6ceb
-size 1208321688
+oid sha256:37073ca7f0d5286e7f0f2b444d9da166a41daa50fbfedff413d90b6ab194ee90
+size 1208321704
diff --git a/model-00062-of-00130.safetensors b/model-00062-of-00130.safetensors
index b536def12714131ae651f0405f2dffafa28af95f..4a63db751b00809bcd48e81a1ff3cb8b334b54c3 100644
--- a/model-00062-of-00130.safetensors
+++ b/model-00062-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b89ff1d3c45edc5d06652d2dbde36657f0c327e57e04558b0b0e46793857f4a4
-size 2463869968
+oid sha256:471276d00ebd1bf22bb32a4f02859f75e0d329fcf968858d086a1c71431b5ec0
+size 2463869984
diff --git a/model-00063-of-00130.safetensors b/model-00063-of-00130.safetensors
index 1774eeea772f5590288a21b2c6fd1b1fd178f528..0342e6ab8692a0308ecfb47c257e121d55e0d768 100644
--- a/model-00063-of-00130.safetensors
+++ b/model-00063-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:8585c7cd94187eebfe4b64a25f13125add8dbc9932fee3a2af96cbc3e0cdbf9f
-size 1208321688
+oid sha256:ec4977ca868f31d64ffdae7b463ffd5456d1c391c1677d77b49e3a2684f53d3f
+size 1208321704
diff --git a/model-00064-of-00130.safetensors b/model-00064-of-00130.safetensors
index e644e98b21b177dbf954f88b60f07acd3089bad4..df6373fc64d48152c372129fdafab1697b0adc52 100644
--- a/model-00064-of-00130.safetensors
+++ b/model-00064-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:593a3e7a56cf130c7382de6a03d702be6ef279d887e7236d9b4fbd2bbd3d24ba
-size 2463869968
+oid sha256:307af83d7fc8becde1225b4b940cf0c078264241e9f2160bcd936ee7ee3eb513
+size 2463869984
diff --git a/model-00065-of-00130.safetensors b/model-00065-of-00130.safetensors
index f1873e036e3b49c3e4bacd7e7d3665e022f437bd..30ee0aaee0253da5c71d4c0c60d341d9ba445fdb 100644
--- a/model-00065-of-00130.safetensors
+++ b/model-00065-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:75849a0106d8bc2f1b20aef71eeb58cb3077c7e2951cf3e09788234def0c9927
-size 1208321688
+oid sha256:dab848ac603729e199e581d602cff9b746fb34afa0d3246749231591428aca7d
+size 1208321704
diff --git a/model-00066-of-00130.safetensors b/model-00066-of-00130.safetensors
index 565350f8285833e446755e316203834341e17745..bc702fc4359e61c2e186493511194f375e505112 100644
--- a/model-00066-of-00130.safetensors
+++ b/model-00066-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:00408d15935315da1a7bcbc23eee9aa4ee4563a4c14618b101dd33658960edf0
-size 2463869968
+oid sha256:d88e6be0b5ce61cc100aeb8488fcc52639f20e7168d8efdf510a3dca020de2fb
+size 2463869984
diff --git a/model-00067-of-00130.safetensors b/model-00067-of-00130.safetensors
index 4c63f2b39f0bb475fb92ed3debeeac6ed8c131b2..4da08c81e8b4d8ddd2691073c184d76a8af6a797 100644
--- a/model-00067-of-00130.safetensors
+++ b/model-00067-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4bface08c504ab1bf82e693c360accc76e49e579908e9b59dbd730ba9b8d756a
-size 1208321688
+oid sha256:d270184c361d815d80fc48ab6c6f83ee46768b3c4f1d4b27b0527c437e881bca
+size 1208321704
diff --git a/model-00068-of-00130.safetensors b/model-00068-of-00130.safetensors
index ab6b080bbdff2294fd7bffe8ec18934e11e83c39..afa43c470c5d73240eab9bb9b12d8ca51f6bff8d 100644
--- a/model-00068-of-00130.safetensors
+++ b/model-00068-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4a168b285a43f7ca03835b8c2ac472a5dfea4b01589a450040298a35d24092f8
-size 2463869968
+oid sha256:575b78cfd2bc412c7819764f13ac5bfe417eb34ca6663a5ae85254c716aec326
+size 2463869984
diff --git a/model-00069-of-00130.safetensors b/model-00069-of-00130.safetensors
index 1680e257bc0cd7ace6516d79b4ee2d9f5277db19..002371160aa73ea0a39f4a74504ad52f2d4bfa51 100644
--- a/model-00069-of-00130.safetensors
+++ b/model-00069-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:246239f37d0a7ac21cb105235861fbe48945361dbd5091d5cf1cffa5d5d24e14
-size 1208321688
+oid sha256:db4c3167a4096f7936b97f8f91694fa3350d7a003924957dff95c8184f7eddde
+size 1208321704
diff --git a/model-00070-of-00130.safetensors b/model-00070-of-00130.safetensors
index a72465815a066c26f413628d7fd70d132c6a93b6..65acacb5dd7fe7fe8b5268a46c5165a166e143a9 100644
--- a/model-00070-of-00130.safetensors
+++ b/model-00070-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:080f36a819d8014d93b3ff55ce5ca9e898322c721439f149505f7837ec8324be
-size 2463869968
+oid sha256:6b751cd22520a3901bcebff6cf1ac1c9361b69211b3f65e48e8a7f5ecbacae14
+size 2463869984
diff --git a/model-00071-of-00130.safetensors b/model-00071-of-00130.safetensors
index 7a877f95d536c4766375d1d811541473a6b6acbf..66fadf1581999b1c524d3c39898d8e675c004cca 100644
--- a/model-00071-of-00130.safetensors
+++ b/model-00071-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9e1a3b6d59ca4dcf99af877931f96cee754eb5019648f10b0fe01803c57a53b2
-size 1208321688
+oid sha256:8a13ff186cc4005f9347ba10f367f8b095cc6925e23c7d7cd8c287c3c8494cae
+size 1208321704
diff --git a/model-00072-of-00130.safetensors b/model-00072-of-00130.safetensors
index 2db83ed763e8602f2be2387aa53b1e7f036e82b8..86c19e001a5a9535128dd0cd5f2c51e909a331c8 100644
--- a/model-00072-of-00130.safetensors
+++ b/model-00072-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3702d9c9f31f088bc10d0b86c458fcf37245d066b6db9cc4d8e3b256e7c4be5e
-size 2463869968
+oid sha256:0217b335e2aeb9c5f3ce97a90786cb8fc4a719bee224d57760c5ee322f566b2c
+size 2463869984
diff --git a/model-00073-of-00130.safetensors b/model-00073-of-00130.safetensors
index b99689793d35a35e1bfcf3c7a86c2690e07c70d0..c2c7010e8a8082686de55aa94bf1bfa5fa9d59d7 100644
--- a/model-00073-of-00130.safetensors
+++ b/model-00073-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:c71864e0febd666681bd413d2deaa82103227eaf4a77a42c00ca5b9f363c969d
-size 1208321688
+oid sha256:be3e34df5603e5c54543f8f3f1c0577439ea5d1da56d92aac284a79dfb1d5a10
+size 1208321704
diff --git a/model-00074-of-00130.safetensors b/model-00074-of-00130.safetensors
index f2d8e6d03fa3aefbbd35bc500c71db136bf1fe1e..7469b51f385aefe893dae9ab9423cbeafc306d7e 100644
--- a/model-00074-of-00130.safetensors
+++ b/model-00074-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:07e6d2b9d5cf7e361328896bd44f001c924cea3a3d139d31455a095d31f71e49
-size 2463869968
+oid sha256:26b5ca15031d1ad287d6b2eea514b758f33c5967e011fa3ee91c42878f5d28a5
+size 2463869984
diff --git a/model-00075-of-00130.safetensors b/model-00075-of-00130.safetensors
index 2ccca589034f1c4070e1b2608f3bcbca2f59b68d..7d7a2cb82d761e1f94709b6bbfaf9d5e7fd599de 100644
--- a/model-00075-of-00130.safetensors
+++ b/model-00075-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:674d2be3b866d45ea6d84c68fe2d7256167597fe19f016c5a5d89351c579d382
-size 1208321688
+oid sha256:3456597f9c157dca04a36e392ce7d6d90055a33f584aae355c3adc176f172fd8
+size 1208321704
diff --git a/model-00076-of-00130.safetensors b/model-00076-of-00130.safetensors
index 8d9e038f72eb31cee492eed3493e01b526d886a8..615f96ba3dc6623f33b87e98586c8b61c8043f87 100644
--- a/model-00076-of-00130.safetensors
+++ b/model-00076-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:160a131a07cbbe229190595ee4ac88a04c663a72ecdcdf316eb4d46e3654fcf2
-size 2463869968
+oid sha256:c60bb67369fabd9c63b32e8db14aaf23c017b6bbcaa004a950fbbc825fb91ec2
+size 2463869984
diff --git a/model-00077-of-00130.safetensors b/model-00077-of-00130.safetensors
index 263abd798b6dba35eb2bf613aa4a2fe93f3df560..cff06d9712f7682e4348817e2cedf8b350947507 100644
--- a/model-00077-of-00130.safetensors
+++ b/model-00077-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2a2a1eee70e8b1fc35d179fb05f83cb1d5f11765cf9b854425f2f973c379c26a
-size 1208321688
+oid sha256:26619e4beb5d05b3fe8c15f608fd4caa7ab7f2f6fc1ea53d9e8f0cc76f06db79
+size 1208321704
diff --git a/model-00078-of-00130.safetensors b/model-00078-of-00130.safetensors
index 2af9b063ef98b2dac796addd080a108129ffbbb1..16b21d920589fe6f143e4f311b7fd67a289ddfdf 100644
--- a/model-00078-of-00130.safetensors
+++ b/model-00078-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:799eaaf53b6fa6e4a367e56333f8496df3791e009471791ce21ab655b5f7e132
-size 2463869968
+oid sha256:f74cb7e92d96eb05f0bd712b2ad3417e62e62c1850171391ccd16ba89a194954
+size 2463869984
diff --git a/model-00079-of-00130.safetensors b/model-00079-of-00130.safetensors
index 4a7317528c9beba59cbb53ec1e9aa2048e1fd549..73c469282479454735011391a39d56230d34fd5c 100644
--- a/model-00079-of-00130.safetensors
+++ b/model-00079-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5bf243b4004996bdbf7119bb4f43b5d8159b2f70412715058cd964e88c1607e9
-size 1208321688
+oid sha256:b5467aee8bbdb0ec4ceb53c0bdfd5ae4f3cb4c1f11706c2b967eaae0ad55abae
+size 1208321704
diff --git a/model-00080-of-00130.safetensors b/model-00080-of-00130.safetensors
index 094b751a5e04ebc5d812f75f22d0bfd2f263afa5..eb3f860f1a71e5a917dd08730608f4d32a6e6b48 100644
--- a/model-00080-of-00130.safetensors
+++ b/model-00080-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9f97809043caa0d67ebf635c6f585cebba6264a50e5c160e5b600d4f23aacbf4
-size 2463869968
+oid sha256:9286309d1a4fcfbd073aa9f984f8268f6980c576e2b7d8e89eb56a01d1dbae85
+size 2463869984
diff --git a/model-00081-of-00130.safetensors b/model-00081-of-00130.safetensors
index 756367ce993afee9f2ca25f356796a52aed6d76f..9a93f77300a3f66af98be248c18ba61c53421f46 100644
--- a/model-00081-of-00130.safetensors
+++ b/model-00081-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:8129bb648b2bd7d503df489b6260b0c902f892735bbb4d656f59e3d3a93e45b2
-size 1208321688
+oid sha256:20082af5e0887d614e89610fc53bfbe904be28091a2b81888b64c760e8581a7f
+size 1208321704
diff --git a/model-00082-of-00130.safetensors b/model-00082-of-00130.safetensors
index f08a9efcdc5436e36a7aa3fd293aadc870bf0846..f126e06e2134edf392ba3bdaa2033e9f9bc89617 100644
--- a/model-00082-of-00130.safetensors
+++ b/model-00082-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4a581d6de6af239880bbbb4cd875954edf0c95ad14b43fdd1094871386704dd5
-size 2463869968
+oid sha256:88fc35e7132aae27fde38421a2f845536b7e2561e826a64cfb1fa50724b8f648
+size 2463869984
diff --git a/model-00083-of-00130.safetensors b/model-00083-of-00130.safetensors
index 08a2e41a97eca2d3b8cd9ec28c1df4ca7076ba4b..169b415ec31fa238af249f11cf5fb96414ad5f0c 100644
--- a/model-00083-of-00130.safetensors
+++ b/model-00083-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:246f02a0e29120dcef28ea85a0eacd8d5a5722d0f0b165f61fef821f700f9d9a
-size 1208321688
+oid sha256:7e7b4d1f2e311d55f966342f75addcc725de48c0f5502902d01883bc870c7988
+size 1208321704
diff --git a/model-00084-of-00130.safetensors b/model-00084-of-00130.safetensors
index 151b1efeaa43ec88db479a561319d3be297b5df2..6f39dfa1667f94a66ccb05928271d07dc1214ba3 100644
--- a/model-00084-of-00130.safetensors
+++ b/model-00084-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4083ad0a522bc60a977253d091f496865f75f0be4d6ece2b975113a30007127a
-size 2463869968
+oid sha256:21f50eaeadaffa7c8ba11803f913a33a9326735f005048a83dfcb5bae8664991
+size 2463869984
diff --git a/model-00085-of-00130.safetensors b/model-00085-of-00130.safetensors
index 5a36e63bf7b49ed65ebf03c93cdec2af0e747bea..cc6ee902982843ee2bcd137ff9b4f052636e2d1f 100644
--- a/model-00085-of-00130.safetensors
+++ b/model-00085-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:35ea447ad683c811138d91696d8fda8008293a785518b7b86b1aa6c9ddc209b9
-size 1208321688
+oid sha256:6bc79d060033c27e1eef0ead16980e2ca552dd7ae32c3c4aeb2da11599aee4c4
+size 1208321704
diff --git a/model-00086-of-00130.safetensors b/model-00086-of-00130.safetensors
index 1485d5420b3d2ad51357b78dd6810569f86b125d..b6d0db16e3dd3ed6d32e0c2f0d9382a08fb7909c 100644
--- a/model-00086-of-00130.safetensors
+++ b/model-00086-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:257545b54e89ceed10803953ccc19db9f723916eae82f62293b244af9ff18773
-size 2463869968
+oid sha256:f906955440093248ccfad2994d3d4609d8925c27eae4aea2c9ab8fda6b21a2c0
+size 2463869984
diff --git a/model-00087-of-00130.safetensors b/model-00087-of-00130.safetensors
index 5f64b2e957c60995a5c342ad0ae5f674439290d4..ef3954a16c86ad14ea3c773c212d7da88bcd1889 100644
--- a/model-00087-of-00130.safetensors
+++ b/model-00087-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0f9a088db1323c4b7f2278201665b8d829cce886267b069659b88fbe3b38b0db
-size 1208321688
+oid sha256:4ba86c862516a7978c441305b48db43baebc5bea6e3af7d7779b617b0bc05088
+size 1208321704
diff --git a/model-00088-of-00130.safetensors b/model-00088-of-00130.safetensors
index 07952818ab6fbddbbab4516fbbd27a53e70b7834..78c3c4ecc482ed8116427955d9e0bfc3fde38757 100644
--- a/model-00088-of-00130.safetensors
+++ b/model-00088-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:55ec4e69a22dd99aaaf394a95d830a7deca496acba7870509d6e70b084bce6e8
-size 2463869968
+oid sha256:ca347c28b286dda5a691745bfba88f995441983e8c9791903baae7e467a8405d
+size 2463869984
diff --git a/model-00089-of-00130.safetensors b/model-00089-of-00130.safetensors
index 07d8a3a8e133029a086b3210ccfdf6de8091aaaf..806120dcd0e3b1a29c77d9f39981a7fa0170e78b 100644
--- a/model-00089-of-00130.safetensors
+++ b/model-00089-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e86e9b192f490993592b1c331726b32d3f9bdf80f2d6abe893d20cb70e51760a
-size 1208321688
+oid sha256:a0735a130b3ad68cc48c297a29b86dddbe828d9eb94c7530b3387b8c783444d7
+size 1208321704
diff --git a/model-00090-of-00130.safetensors b/model-00090-of-00130.safetensors
index a997f1abfe2c5600671cebb6bd0a79fee729c902..fd2e3dc9fa6de5326488d53ed08a2b44672a25ae 100644
--- a/model-00090-of-00130.safetensors
+++ b/model-00090-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b57125eec75a1b0cb31d3a8401d6a231359419e549e20072bcc39709423b129f
-size 2463869968
+oid sha256:925e556dbdf2afad4acb90fb199c609dab09cbc318ac058757592750cafbfaf8
+size 2463869984
diff --git a/model-00091-of-00130.safetensors b/model-00091-of-00130.safetensors
index 72978b3ed4923e95a2275ff4e86345c0a2893519..5ebeb21b86f4ebd9b44f01342624910dd40adba8 100644
--- a/model-00091-of-00130.safetensors
+++ b/model-00091-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0c613cdacd627e2fc3de08194efe1607aa06bdd386e1ccac1c7c133f4b5a2e8f
-size 1208321688
+oid sha256:bd424c50b1d72779a1726bb60232bb6cae97c26d878706c829e1484d65c85c7c
+size 1208321704
diff --git a/model-00092-of-00130.safetensors b/model-00092-of-00130.safetensors
index 70865c5f5d0e494c9374212f0e4b3c928e0e4813..31fd28f3b8d265cb26c93c73587c5653ed6e8a95 100644
--- a/model-00092-of-00130.safetensors
+++ b/model-00092-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a406fcc45a8a785e366d68ef9b222940d480c788a176ff26c74d7287051554e2
-size 2463869968
+oid sha256:56c2b84dc66d9b6e2b1ec46c579fd4bb6697606587926f75770fd50ab11b9f94
+size 2463869984
diff --git a/model-00093-of-00130.safetensors b/model-00093-of-00130.safetensors
index 67d49217407988dde62de78ac81510ab902d9bc3..44b420240a2528b8c17e7b7ede5b17126b5c2983 100644
--- a/model-00093-of-00130.safetensors
+++ b/model-00093-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:27f4f5084a432f77340599da368f6fbd7be38f07380a8ea87b39807a67198365
-size 1208321688
+oid sha256:e1c0568aa013b2712520a408290ccf1a54bef1bd4f4af8ee02d6029fb974efc2
+size 1208321704
diff --git a/model-00094-of-00130.safetensors b/model-00094-of-00130.safetensors
index ba2e7f032b89d7119e9480929ce09f8cb4fa39bf..9126679073a6c2b4380fa86465f3037222b9d123 100644
--- a/model-00094-of-00130.safetensors
+++ b/model-00094-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:d6db2523f161c686a3ae2dbd7b09aac6a6f0b0d5304805876385ab7c4bc0b5c7
-size 2463869968
+oid sha256:5e57a6198c8ae05d3f6d2d701085f6c3c7053195fca9f0be3d4395e45f75e4b2
+size 2463869984
diff --git a/model-00095-of-00130.safetensors b/model-00095-of-00130.safetensors
index b7ae61c3ae8c22617653b214b6fab00b18bb778a..3194ff05c47aba4daa261d38f0733ec2e0473607 100644
--- a/model-00095-of-00130.safetensors
+++ b/model-00095-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a8480d9cc9216c650a30cd7168244b84aa6762c7835a92600ce198da2d15fbb1
-size 1208321688
+oid sha256:c35c3e96eaab7420f2c1f78f7784c6b077b6c4f158f2279f6f62e28b26c396eb
+size 1208321704
diff --git a/model-00096-of-00130.safetensors b/model-00096-of-00130.safetensors
index 0cf06537026d731d449cefa5155ad307d1b57647..3401e39e35415c79467b3b6d49fd95a7ab716907 100644
--- a/model-00096-of-00130.safetensors
+++ b/model-00096-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0123cfef652f44b2c6dbfcc47ede03762d4a572236367eee32a677d43d9a4dca
-size 2463869968
+oid sha256:d5986ad365b5c92b39c53cae7d8091250f86d800eb9f0b85f5c92e46b0023299
+size 2463869984
diff --git a/model-00097-of-00130.safetensors b/model-00097-of-00130.safetensors
index 22b3dc84b20f4deee20c8e326d5be9437b9b6484..4f6ec33a96ffd39764ce97c9765c077a3ec29ec8 100644
--- a/model-00097-of-00130.safetensors
+++ b/model-00097-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:181466337b86afbc94dfae30196ca15a27ff01b35c5cf3939682032c5c0469c3
-size 1208321688
+oid sha256:b21f35ffe104867870a986950155892fcac9affb7b0bc42680807c375f84dcb8
+size 1208321704
diff --git a/model-00098-of-00130.safetensors b/model-00098-of-00130.safetensors
index cf3a85dfa6b0b245395f80785f4d626becd1cfd6..f0198b1725bd71a8625a591612b4e40e2acfb88b 100644
--- a/model-00098-of-00130.safetensors
+++ b/model-00098-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:cb371f55564ec7a0ceb55bdf314c56b61385acfd7d59422e6b3a7efc75dd125a
-size 2463869968
+oid sha256:8bb54c5eaf81beba692c4f618331d9c710c64fc6e0d3aa76f7495b37d555890e
+size 2463869984
diff --git a/model-00099-of-00130.safetensors b/model-00099-of-00130.safetensors
index 7613dc406dcd2cc80f63f103794ec120eff2f898..950d991a362e8f6a5ae56fc8f45a533f7527216a 100644
--- a/model-00099-of-00130.safetensors
+++ b/model-00099-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9f0f0bd9e07f7097693bfb58da9c73e35bf1e39eff80f0fba8f46ecde511cf63
-size 1208321688
+oid sha256:88c8ed89e176df2c57a2b541dd546f4860195ec55d89d5d242559a5e05b3923a
+size 1208321704
diff --git a/model-00100-of-00130.safetensors b/model-00100-of-00130.safetensors
index b4da88bffed506d100c3a6234632846751765676..80f24a84be2fdd0fc0231093d002b8d1d690d581 100644
--- a/model-00100-of-00130.safetensors
+++ b/model-00100-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:45fd433c26aab73e4a6b4d4566f5511c4376549df1ed9c4257493b1c72710fa9
-size 2463869968
+oid sha256:c462e7176c100ee61e19b9d983fd2fcd623765627082c889793e9d4f549f1ebd
+size 2463869984
diff --git a/model-00101-of-00130.safetensors b/model-00101-of-00130.safetensors
index 3923c3bed5990ee26deacd1824e399fbdcc42c4a..5462ae21fbc9c180aa464d03e065802c7b950433 100644
--- a/model-00101-of-00130.safetensors
+++ b/model-00101-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:82ed509a2950aacc0a217e61fd8ca43bf06cbd5c6fa734c33bb7e6baec4a85cb
-size 1208321688
+oid sha256:072e923c77c6d78e6c7e8e88f14a2942ce5b9923a1b4defcd2ad0eafeeed18fe
+size 1208321704
diff --git a/model-00102-of-00130.safetensors b/model-00102-of-00130.safetensors
index 07cad3d80a73117eee3fa7b81c7719ee58fa4e53..d4481dfcfb93fadbfa2463e85a33e337ca59b94d 100644
--- a/model-00102-of-00130.safetensors
+++ b/model-00102-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:af2c3743f4034f012b1855bca20bdfe2b081dd864a2bdc7064e9c1ea9a09f94c
-size 2463869968
+oid sha256:5aab62a54bb84e1471070b07068a5b6e0a98827e2d42486aa5d11904a49adff5
+size 2463869984
diff --git a/model-00103-of-00130.safetensors b/model-00103-of-00130.safetensors
index df5245f3a3f003a6d492a419c9a1e4a6ac62bac1..eb98a041a0dfac7ded2cee2f79471b3862ad48f8 100644
--- a/model-00103-of-00130.safetensors
+++ b/model-00103-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4ac162b3348bc1ed712146b4d2a3bf443250c2268bceaca15c8cdce38a7fca7c
-size 1208321688
+oid sha256:de918f153ee3f6930a6377b9be4570e17cf1b5e15e9649fe153271de2a77f2fa
+size 1208321704
diff --git a/model-00104-of-00130.safetensors b/model-00104-of-00130.safetensors
index ab77252be678a414d0ba73857b9e0ca5e3f8ad89..47685b3d063cfedff7a3a819993efad4f6590dca 100644
--- a/model-00104-of-00130.safetensors
+++ b/model-00104-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2f21dabd4f4214b13c4803104783e5a3ad5af9838bcc849d1606c0e1f096a946
-size 2463869968
+oid sha256:4d07ec351c1cc965dd4b1f0809f35cd3e75c6a12a2aba2302f1e186e038c6e42
+size 2463869984
diff --git a/model-00105-of-00130.safetensors b/model-00105-of-00130.safetensors
index 45210b1ff615fcdf4c5f7e85253d4c2605f645a3..023ab000e09c4422ce493746f6eb6dbdded7a53c 100644
--- a/model-00105-of-00130.safetensors
+++ b/model-00105-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:97dd9fc182eb0583291bd29226ef3cf41319fab78295a910470fae7ea49339ae
-size 1208321688
+oid sha256:d2eed881777af73df8d435f7ee40853ca0e96a5c49fe522ec8f1697043943421
+size 1208321704
diff --git a/model-00106-of-00130.safetensors b/model-00106-of-00130.safetensors
index 86d8c835bbc5b26c7df2a4d497f4a1ad11c69fe5..2d093f8e31bef671a2ee8fae6f735be87517de7e 100644
--- a/model-00106-of-00130.safetensors
+++ b/model-00106-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:55f519426d248d7c57a147a1b82d819900788e43a62b6972c2148586f10f05f0
-size 2463869968
+oid sha256:205cb5c7e241b11d58c2994212570ded889f94ac2d0589799650afc1ebd66197
+size 2463869984
diff --git a/model-00107-of-00130.safetensors b/model-00107-of-00130.safetensors
index a8acf560e26d6401d28a80b706344a56bc715d6b..71598507ba7253c0418afcf5f27a1696807f75f4 100644
--- a/model-00107-of-00130.safetensors
+++ b/model-00107-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b50498a9bf402bdb82bebf103685634c37334609f9efcecf54babe7f9b5baf65
-size 1208321688
+oid sha256:f81ccc9301fb0b0019f3bc8fc7c11b2ca947e6b184ed6f10394002159089b59b
+size 1208321704
diff --git a/model-00108-of-00130.safetensors b/model-00108-of-00130.safetensors
index cf261e862f2fffe47dbf9891f769a265a593ab9b..309fff956c02994e3f4cc6ed66465137b2d750a9 100644
--- a/model-00108-of-00130.safetensors
+++ b/model-00108-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:502bdd08025b8d357717bdd305200df326f5f8c0e7ec6f7ce2c82d115cdf7e75
-size 2463869968
+oid sha256:d0ad55be87bca5e9b773db48f76bfd66ede0a53057d1d787d7323142a9690f35
+size 2463869984
diff --git a/model-00109-of-00130.safetensors b/model-00109-of-00130.safetensors
index dcd65a71881b9186d4e3c5b0f80482ebc36c5793..dd750502ddecb6b81084f2ab8b993e206203abf7 100644
--- a/model-00109-of-00130.safetensors
+++ b/model-00109-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5a528b71b52c2211e6f91deb829d11cb22b655dd57dc251d84ba4fe521e47ba2
-size 1208321688
+oid sha256:dbce9cf061e37baa89ae57c1ade0ea4d605b4d19cf7a5d048a176248196102ee
+size 1208321704
diff --git a/model-00110-of-00130.safetensors b/model-00110-of-00130.safetensors
index 4ab388f0f8dcd7c3e3c3752565806610513ac6dd..15c7fcdb46fc3da93292bbb1c427e402435b3872 100644
--- a/model-00110-of-00130.safetensors
+++ b/model-00110-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:20d1fa5b16599eee4fa39118f73508b579190a374f70f6c1bf83018c60a9d7be
-size 2463869968
+oid sha256:412b58ef7c3ac38758ad75e14a9c4976ab032556a2a0b4924494d0eed2116653
+size 2463869984
diff --git a/model-00111-of-00130.safetensors b/model-00111-of-00130.safetensors
index 8b8d0aed910716e757e208d0538aa6c499c8f579..b179d68003cd4c002530a1a26b185051565e4a0a 100644
--- a/model-00111-of-00130.safetensors
+++ b/model-00111-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:f63b6c84659c71d9d253bf5c22237c562d3a3fb44c70fd54cca9d7993c35ea04
-size 1208321688
+oid sha256:9b7a752507c7ec34b57abf1db86fb039291a73f5d3ec137b7cbd84793089fb85
+size 1208321704
diff --git a/model-00112-of-00130.safetensors b/model-00112-of-00130.safetensors
index 346df719ba428ce5a981cd8e5aaae2f28eb616c7..3a28a2ec90fde21795b161093e8958eefc45bd3c 100644
--- a/model-00112-of-00130.safetensors
+++ b/model-00112-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:c4e0b5428019c75f894907107d85da010697f4ecc333b244c6cfb4aea0e3c440
-size 2463869968
+oid sha256:bf588bc965737e3ad9f27812955675d24df46f5ebd899840f481886884a3bfac
+size 2463869984
diff --git a/model-00113-of-00130.safetensors b/model-00113-of-00130.safetensors
index feb96566abdb29f11cba1e5df257366f670536c8..49190a539189de643662dfcdb8d2bf7ad22e0827 100644
--- a/model-00113-of-00130.safetensors
+++ b/model-00113-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e48bfe3f2a384aebf1038c14c651c69c64a8fac061e5b9547fb7d67da9ee5029
-size 1208321688
+oid sha256:06880550004cf06ee29a23adc8fe896368df9442dd7e44779a9b773423ffa396
+size 1208321704
diff --git a/model-00114-of-00130.safetensors b/model-00114-of-00130.safetensors
index be0a3b12b05b32d5a51512e9079af002eb95064c..ce863c137373fbaa76544489ec2a661d695417bd 100644
--- a/model-00114-of-00130.safetensors
+++ b/model-00114-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:08fb5a9fd03254204848af6413c7bf68876bee74f6bb37247d05dd2fc7480a84
-size 2463869968
+oid sha256:042131124955b2fd10e42f88d248cac356a1ae54f4f338ddf67b332fee82f1a4
+size 2463869984
diff --git a/model-00115-of-00130.safetensors b/model-00115-of-00130.safetensors
index 4854afadea7dfa58e5b57295d5c31a8e1bd0bbcd..33de24e38c9638bd6d43ca76e7d411a733879372 100644
--- a/model-00115-of-00130.safetensors
+++ b/model-00115-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:adf4ab941b453ba215787230e4a4f001623a5f06180deb3c5bed050160f463d7
-size 1208321688
+oid sha256:fd78f64fbc5d8fc9943ad03ac888d590351ac67367a4605f541567075c2a90a9
+size 1208321704
diff --git a/model-00116-of-00130.safetensors b/model-00116-of-00130.safetensors
index d5e956684a2110d4a775c7363b7acd63fc1608f0..a842a7a9f272737db0892fd51e66a5870f2a8fa7 100644
--- a/model-00116-of-00130.safetensors
+++ b/model-00116-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:45ba476d7a50d28db1380179ec3f2d3c274d35a362e2a6b680a6ab653aba88d1
-size 2463869968
+oid sha256:4857fbd1bb9738fbd98b8fe9700c055c8bf9c099874931a9e59a4f796f95a9c1
+size 2463869984
diff --git a/model-00117-of-00130.safetensors b/model-00117-of-00130.safetensors
index cda5818d88196aca09ceb373fc78bd171fac297e..ecd61eb2f1990b5e8013168cd4191dcccbae3c78 100644
--- a/model-00117-of-00130.safetensors
+++ b/model-00117-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:7f2a7438c4f6c66ac95eaaaba65c1935bfcb917884e021c30e588c74ac189fc5
-size 1208321688
+oid sha256:becc0b4f32f7d0d4de8d124ec62bb95b5f57936e63f6bfe8874d59bfc1d7edc1
+size 1208321704
diff --git a/model-00118-of-00130.safetensors b/model-00118-of-00130.safetensors
index 3ec47c04235ef03580f4c9a02ec263085dcd3577..9f708b9c323544f7e661be4a2d2c48f26e79afd7 100644
--- a/model-00118-of-00130.safetensors
+++ b/model-00118-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a2eac3b06b70ff4f38c8166038b87e4e010e80fdb0c7fc32ff04b669b79bb390
-size 2463869968
+oid sha256:6b27be267068a02a5a33ec666d64ffb3548d05c1091f9e85b2aa227c643cf3a0
+size 2463869984
diff --git a/model-00119-of-00130.safetensors b/model-00119-of-00130.safetensors
index 9eef34827a70af39dc3bc0b28f2eff32c1e22854..77774f46d3c3e4fed387010fcea7d8c8957b874e 100644
--- a/model-00119-of-00130.safetensors
+++ b/model-00119-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5cf0b16b764dd9303984a467fe3ad8a04b2b3908e230fc902425ec8746df804e
-size 1208321688
+oid sha256:666681aaed291bc9298bb4a688b2c801dc3bb2fc796d51f07f5d5a72797b8658
+size 1208321704
diff --git a/model-00120-of-00130.safetensors b/model-00120-of-00130.safetensors
index 4779b42c8d5104e8c41e536601c5e326063b3bad..d6756dfbe1a91e0d349ae09085564c19b5f636c8 100644
--- a/model-00120-of-00130.safetensors
+++ b/model-00120-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:16a1f3697a6913aecb34e5d880c42a38d067a5172d52eb44f4fb1de914fa879b
-size 2463869968
+oid sha256:9b0d00dbb435dea822d9afa5feca103b96f4ed36bca8eb4f1820b8702421e816
+size 2463869984
diff --git a/model-00121-of-00130.safetensors b/model-00121-of-00130.safetensors
index 61b53e232155d41c21581e9c10998f54c2f4ac00..fdb87c910b52d9d095bdaf91fcab1b3280d43a00 100644
--- a/model-00121-of-00130.safetensors
+++ b/model-00121-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b2eabbce05904ab80f1919c0e74052810493c344eaa120dfc2b1bf46e195b230
-size 1208321688
+oid sha256:dbddd32ac1c6d80b443380a880ef4c435f10708eaae864cf745cbf76981cbf5b
+size 1208321704
diff --git a/model-00122-of-00130.safetensors b/model-00122-of-00130.safetensors
index 3a24937431600cd750f4e73ebc37e6279faf63eb..08bdc134314965dcec948e8f4aa57028ff7a080d 100644
--- a/model-00122-of-00130.safetensors
+++ b/model-00122-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2d4060a5e532922a3d5dae24262c08c21acd1a029e06650f806f9f3a111bcbfb
-size 2463869968
+oid sha256:ac4d80cc6e5c9a20c7ab4a0010f14f1a313f195b031aefb84d83ac6c607cb102
+size 2463869984
diff --git a/model-00123-of-00130.safetensors b/model-00123-of-00130.safetensors
index 2c3a3861da43e43da4924538a6ee77d1db0b38ed..ff32f4ccb6fd81b6c4194c3eb5fcb40e9b68c3d8 100644
--- a/model-00123-of-00130.safetensors
+++ b/model-00123-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2779fd92da6eb6c42edaf3b1e9cdcc5b7a501b5c9a25cfb3c210baf0f42d837a
-size 1208321688
+oid sha256:3776600e4fea8e0d7b3c4c2667b0cdae07d4f5a7e7b30ce913b0c30c3a8ea0d8
+size 1208321704
diff --git a/model-00124-of-00130.safetensors b/model-00124-of-00130.safetensors
index cf1adeeb58c0df852b1212f26d17cf76e616e11f..2c3f206da4a4c1a9138243d4108caabaaab187b5 100644
--- a/model-00124-of-00130.safetensors
+++ b/model-00124-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3439acf43dfe9db0ea78c681acccd0ee9b80d7c63b5865755921a1f1244a1a9c
-size 1229199552
+oid sha256:3543ef495910c94d69b4707153646bb2d55588fef092d3450ac03e3179db11d9
+size 1229199568
diff --git a/modeling_list_ultra.py b/modeling_list_ultra.py
new file mode 100644
index 0000000000000000000000000000000000000000..8846d38acc932d1dcb0302bb719296313f5225a8
--- /dev/null
+++ b/modeling_list_ultra.py
@@ -0,0 +1,706 @@
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py.
+# Do NOT edit this file manually, as any edits will be overwritten when the file is
+# regenerated from the modular source. If a change is needed, please apply it to the
+# modular_minimax_m2.py file directly. One of our CI checks enforces this.
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# coding=utf-8
+# Copyright 2025 the HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from collections.abc import Callable
+from typing import Optional, Union, Unpack
+
+import torch
+from torch import nn
+
+from transformers.activations import ACT2FN
+from transformers.cache_utils import Cache, DynamicCache
+from transformers.generation import GenerationMixin
+from transformers.integrations import use_kernel_forward_from_hub
+from transformers.masking_utils import create_causal_mask, create_sliding_window_causal_mask
+from transformers.modeling_flash_attention_utils import FlashAttentionKwargs
+from transformers.modeling_layers import (
+ GenericForQuestionAnswering,
+ GenericForSequenceClassification,
+ GenericForTokenClassification,
+ GradientCheckpointingLayer,
+)
+from transformers.modeling_outputs import MoeCausalLMOutputWithPast, MoeModelOutputWithPast
+from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
+from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
+from transformers.utils import TransformersKwargs, auto_docstring, can_return_tuple
+from transformers.utils.deprecation import deprecate_kwarg
+from transformers.utils.generic import OutputRecorder, check_model_inputs
+from .configuration_minimax_m2 import MiniMaxM2Config
+
+
+class MiniMaxM2MLP(nn.Module):
+ def __init__(self, config: MiniMaxM2Config):
+ super().__init__()
+ self.ffn_dim = config.intermediate_size
+ self.hidden_dim = config.hidden_size
+
+ self.w1 = nn.Linear(self.hidden_dim, self.ffn_dim, bias=False)
+ self.w2 = nn.Linear(self.ffn_dim, self.hidden_dim, bias=False)
+ self.w3 = nn.Linear(self.hidden_dim, self.ffn_dim, bias=False)
+
+ self.act_fn = ACT2FN[config.hidden_act]
+
+ def forward(self, hidden_states):
+ current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
+ current_hidden_states = self.w2(current_hidden_states)
+ return current_hidden_states
+
+
+class MiniMaxM2Experts(nn.ModuleList):
+ """
+ ModuleList of experts.
+ """
+
+ def __init__(self, config: MiniMaxM2Config):
+ super().__init__()
+ self.top_k = config.num_experts_per_tok
+ self.num_experts = config.num_local_experts
+ for _ in range(self.num_experts):
+ self.append(MiniMaxM2MLP(config))
+
+ def forward(
+ self, hidden_states: torch.Tensor, top_k_index: torch.Tensor, top_k_weights: torch.Tensor
+ ) -> torch.Tensor:
+        """
+        Args:
+            hidden_states: (batch_size * sequence_length, hidden_dim)
+            top_k_index: (batch_size * sequence_length, top_k)
+            top_k_weights: (batch_size * sequence_length, top_k)
+        Returns:
+            (batch_size * sequence_length, hidden_dim)
+        """
+ final_hidden_states = torch.zeros_like(hidden_states)
+ expert_mask = torch.nn.functional.one_hot(top_k_index, num_classes=self.num_experts).permute(2, 1, 0)
+
+ expert_hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()
+ for expert_idx in expert_hit:
+ idx, top_x = torch.where(expert_mask[expert_idx].squeeze(0))
+ current_state = hidden_states[None, top_x].reshape(-1, hidden_states.shape[-1])
+ current_hidden_states = self[expert_idx](current_state) * top_k_weights[top_x, idx, None]
+ final_hidden_states.index_add_(0, top_x, current_hidden_states.to(hidden_states.dtype))
+ return final_hidden_states
+
+
+class MiniMaxM2SparseMoeBlock(nn.Module):
+ def __init__(self, config):
+ super().__init__()
+ self.top_k = config.num_experts_per_tok
+ self.jitter_noise = config.router_jitter_noise
+ self.gate = nn.Linear(config.hidden_size, config.num_local_experts, bias=False)
+ self.experts = MiniMaxM2Experts(config)
+ self.register_buffer("e_score_correction_bias", torch.zeros(config.num_local_experts))
+
+ def route_tokens_to_experts(self, router_logits):
+ routing_weights = torch.nn.functional.sigmoid(router_logits.float())
+ scores_for_choice = routing_weights + self.e_score_correction_bias
+ _, top_k_index = torch.topk(scores_for_choice, self.top_k, dim=-1, sorted=False)
+ top_k_weights = routing_weights.gather(1, top_k_index)
+ top_k_weights /= top_k_weights.sum(dim=-1, keepdim=True)
+ return top_k_index, top_k_weights.to(router_logits.dtype)
+
+ def forward(self, hidden_states: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
+ batch_size, sequence_length, hidden_dim = hidden_states.shape
+ if self.training and self.jitter_noise > 0:
+ hidden_states *= torch.empty_like(hidden_states).uniform_(1.0 - self.jitter_noise, 1.0 + self.jitter_noise)
+ hidden_states = hidden_states.view(-1, hidden_states.shape[-1])
+ router_logits = self.gate(hidden_states)
+ top_k_index, top_k_weights = self.route_tokens_to_experts(router_logits)
+ hidden_states = self.experts(hidden_states, top_k_index, top_k_weights.to(hidden_states.dtype))
+ hidden_states = hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+ return hidden_states, router_logits
+
+
+@use_kernel_forward_from_hub("RMSNorm")
+class MiniMaxM2RMSNorm(nn.Module):
+ def __init__(self, hidden_size, eps=1e-6):
+ """
+ MiniMaxM2RMSNorm is equivalent to T5LayerNorm
+ """
+ super().__init__()
+ self.weight = nn.Parameter(torch.ones(hidden_size))
+ self.variance_epsilon = eps
+
+ def forward(self, hidden_states):
+ input_dtype = hidden_states.dtype
+ hidden_states = hidden_states.to(torch.float32)
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
+ return self.weight * hidden_states.to(input_dtype)
+
+ def extra_repr(self):
+ return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"
+
+
+def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
+ """
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
+ """
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
+ if n_rep == 1:
+ return hidden_states
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
+
+
+def eager_attention_forward(
+ module: nn.Module,
+ query: torch.Tensor,
+ key: torch.Tensor,
+ value: torch.Tensor,
+ attention_mask: Optional[torch.Tensor],
+ scaling: float,
+ dropout: float = 0.0,
+ **kwargs: Unpack[TransformersKwargs],
+):
+ key_states = repeat_kv(key, module.num_key_value_groups)
+ value_states = repeat_kv(value, module.num_key_value_groups)
+
+ attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
+ if attention_mask is not None:
+ causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+ attn_weights = attn_weights + causal_mask
+
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
+ attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
+ attn_output = torch.matmul(attn_weights, value_states)
+ attn_output = attn_output.transpose(1, 2).contiguous()
+
+ return attn_output, attn_weights
+
+
+def rotate_half(x):
+ """Rotates half the hidden dims of the input."""
+ x1 = x[..., : x.shape[-1] // 2]
+ x2 = x[..., x.shape[-1] // 2 :]
+ return torch.cat((-x2, x1), dim=-1)
+
+
+def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
+ """Applies Rotary Position Embedding to the query and key tensors.
+
+ Args:
+ q (`torch.Tensor`): The query tensor.
+ k (`torch.Tensor`): The key tensor.
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
+ position_ids (`torch.Tensor`, *optional*):
+ Deprecated and unused.
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
+ Returns:
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
+ """
+ cos = cos.unsqueeze(unsqueeze_dim)
+ sin = sin.unsqueeze(unsqueeze_dim)
+
+ # Keep half or full tensor for later concatenation
+ rotary_dim = cos.shape[-1]
+ q_rot, q_pass = q[..., :rotary_dim], q[..., rotary_dim:]
+ k_rot, k_pass = k[..., :rotary_dim], k[..., rotary_dim:]
+
+ # Apply rotary embeddings on the first half or full tensor
+ q_embed = (q_rot * cos) + (rotate_half(q_rot) * sin)
+ k_embed = (k_rot * cos) + (rotate_half(k_rot) * sin)
+
+ # Concatenate back to full shape
+ q_embed = torch.cat([q_embed, q_pass], dim=-1)
+ k_embed = torch.cat([k_embed, k_pass], dim=-1)
+ return q_embed, k_embed
+
+
+class MiniMaxM2Attention(nn.Module):
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
+
+ def __init__(self, config: MiniMaxM2Config, layer_idx: int):
+ super().__init__()
+ self.config = config
+ self.layer_idx = layer_idx
+ self.head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
+ self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
+ self.scaling = self.head_dim**-0.5
+ self.attention_dropout = config.attention_dropout
+ self.is_causal = True
+ self.q_proj = nn.Linear(config.hidden_size, config.num_attention_heads * self.head_dim, bias=False)
+ self.k_proj = nn.Linear(config.hidden_size, config.num_key_value_heads * self.head_dim, bias=False)
+ self.v_proj = nn.Linear(config.hidden_size, config.num_key_value_heads * self.head_dim, bias=False)
+ self.o_proj = nn.Linear(config.num_attention_heads * self.head_dim, config.hidden_size, bias=False)
+
+ self.use_qk_norm = config.use_qk_norm
+ if self.use_qk_norm:
+ self.q_norm = MiniMaxM2RMSNorm(self.head_dim * config.num_attention_heads, eps=config.rms_norm_eps)
+ self.k_norm = MiniMaxM2RMSNorm(self.head_dim * config.num_key_value_heads, eps=config.rms_norm_eps)
+
+ @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
+ def forward(
+ self,
+ hidden_states: torch.Tensor,
+ position_embeddings: tuple[torch.Tensor, torch.Tensor],
+ attention_mask: Optional[torch.Tensor],
+ past_key_values: Optional[Cache] = None,
+ cache_position: Optional[torch.LongTensor] = None,
+ **kwargs: Unpack[FlashAttentionKwargs],
+ ) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
+ input_shape = hidden_states.shape[:-1]
+ hidden_shape = (*input_shape, -1, self.head_dim)
+
+ query_states = self.q_proj(hidden_states)
+ key_states = self.k_proj(hidden_states)
+ value_states = self.v_proj(hidden_states)
+
+ if self.use_qk_norm: # main diff from Llama
+ query_states = self.q_norm(query_states)
+ key_states = self.k_norm(key_states)
+
+ key_states = key_states.view(hidden_shape)
+ query_states = query_states.view(hidden_shape)
+ value_states = value_states.view(hidden_shape)
+
+ query_states = query_states.transpose(1, 2)
+ key_states = key_states.transpose(1, 2)
+ value_states = value_states.transpose(1, 2)
+
+ cos, sin = position_embeddings
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
+
+ if past_key_values is not None:
+ # sin and cos are specific to RoPE models; position_ids needed for the static cache
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+ key_states, value_states = past_key_values.update(key_states, value_states, self.layer_idx, cache_kwargs)
+
+ attention_interface: Callable = eager_attention_forward
+ if self.config._attn_implementation != "eager":
+ attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
+
+ attn_output, attn_weights = attention_interface(
+ self,
+ query_states,
+ key_states,
+ value_states,
+ attention_mask,
+ dropout=0.0 if not self.training else self.attention_dropout,
+ scaling=self.scaling,
+ **kwargs,
+ )
+
+ attn_output = attn_output.reshape(*input_shape, -1).contiguous()
+ attn_output = self.o_proj(attn_output)
+ return attn_output, attn_weights
+
+
+class MiniMaxM2DecoderLayer(GradientCheckpointingLayer):
+ def __init__(self, config: MiniMaxM2Config, layer_idx: int):
+ super().__init__()
+ self.hidden_size = config.hidden_size
+
+ self.self_attn = MiniMaxM2Attention(config, layer_idx)
+
+ self.block_sparse_moe = MiniMaxM2SparseMoeBlock(config)
+ self.input_layernorm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
+ self.post_attention_layernorm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
+
+ @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
+ def forward(
+ self,
+ hidden_states: torch.Tensor,
+ position_embeddings: tuple[torch.Tensor, torch.Tensor],
+ attention_mask: Optional[torch.Tensor] = None,
+ position_ids: Optional[torch.LongTensor] = None,
+ past_key_values: Optional[Cache] = None,
+ cache_position: Optional[torch.LongTensor] = None,
+ **kwargs: Unpack[TransformersKwargs],
+ ) -> torch.FloatTensor:
+ residual = hidden_states
+
+ hidden_states = self.input_layernorm(hidden_states)
+
+ # Self Attention
+ hidden_states, _ = self.self_attn(
+ hidden_states=hidden_states,
+ position_embeddings=position_embeddings,
+ attention_mask=attention_mask,
+ position_ids=position_ids,
+ past_key_values=past_key_values,
+ cache_position=cache_position,
+ **kwargs,
+ )
+ hidden_states = residual + hidden_states
+
+ # Fully Connected
+ residual = hidden_states
+ hidden_states = self.post_attention_layernorm(hidden_states)
+ hidden_states, _ = self.block_sparse_moe(hidden_states)
+ hidden_states = residual + hidden_states
+
+ return hidden_states
+
+
+class MiniMaxM2RotaryEmbedding(nn.Module):
+ inv_freq: torch.Tensor # fix linting for `register_buffer`
+
+ def __init__(self, config: MiniMaxM2Config, device=None):
+ super().__init__()
+ # BC: "rope_type" was originally "type"
+ if hasattr(config, "rope_scaling") and isinstance(config.rope_scaling, dict):
+ self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type"))
+ else:
+ self.rope_type = "default"
+ self.max_seq_len_cached = config.max_position_embeddings
+ self.original_max_seq_len = config.max_position_embeddings
+
+ self.config = config
+ self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
+
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
+ self.original_inv_freq = self.inv_freq
+
+ @torch.no_grad()
+ @dynamic_rope_update # power user: used with advanced RoPE types (e.g. dynamic rope)
+ def forward(self, x, position_ids):
+ inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1).to(x.device)
+ position_ids_expanded = position_ids[:, None, :].float()
+
+ device_type = x.device.type if isinstance(x.device.type, str) and x.device.type != "mps" else "cpu"
+ with torch.autocast(device_type=device_type, enabled=False): # Force float32
+ freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
+ emb = torch.cat((freqs, freqs), dim=-1)
+ cos = emb.cos() * self.attention_scaling
+ sin = emb.sin() * self.attention_scaling
+
+ return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
+
+
+@auto_docstring
+class MiniMaxM2PreTrainedModel(PreTrainedModel):
+ config: MiniMaxM2Config
+ base_model_prefix = "model"
+ supports_gradient_checkpointing = True
+ _no_split_modules = ["MiniMaxM2DecoderLayer"]
+ _skip_keys_device_placement = ["past_key_values"]
+ _supports_flash_attn = True
+ _supports_sdpa = True
+ _supports_flex_attn = True
+ _can_compile_fullgraph = False # MoE models don't work with torch.compile (`torch.where(condition)` not supported)
+ _supports_attention_backend = True
+ _can_record_outputs = {
+ "router_logits": OutputRecorder(MiniMaxM2SparseMoeBlock, index=1),
+ "hidden_states": MiniMaxM2DecoderLayer,
+ "attentions": MiniMaxM2Attention,
+ }
+
+
+@auto_docstring
+class MiniMaxM2Model(MiniMaxM2PreTrainedModel):
+ def __init__(self, config: MiniMaxM2Config):
+ super().__init__(config)
+ self.padding_idx = config.pad_token_id
+ self.vocab_size = config.vocab_size
+
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
+ self.layers = nn.ModuleList(
+ [MiniMaxM2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
+ )
+ self.norm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
+ self.rotary_emb = MiniMaxM2RotaryEmbedding(config=config)
+ self.gradient_checkpointing = False
+
+ # Initialize weights and apply final processing
+ self.post_init()
+
+ @check_model_inputs
+ @auto_docstring
+ def forward(
+ self,
+ input_ids: Optional[torch.LongTensor] = None,
+ attention_mask: Optional[torch.Tensor] = None,
+ position_ids: Optional[torch.LongTensor] = None,
+ past_key_values: Optional[Cache] = None,
+ inputs_embeds: Optional[torch.FloatTensor] = None,
+ use_cache: Optional[bool] = None,
+ cache_position: Optional[torch.LongTensor] = None,
+ **kwargs: Unpack[TransformersKwargs],
+ ) -> MoeModelOutputWithPast:
+ if (input_ids is None) ^ (inputs_embeds is not None):
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
+
+ if use_cache and past_key_values is None:
+ past_key_values = DynamicCache(config=self.config)
+
+ if inputs_embeds is None:
+ inputs_embeds = self.embed_tokens(input_ids)
+
+ if cache_position is None:
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
+ cache_position = torch.arange(
+ past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device
+ )
+ if position_ids is None:
+ position_ids = cache_position.unsqueeze(0)
+
+ mask_function = create_causal_mask if self.config.sliding_window is None else create_sliding_window_causal_mask
+ causal_mask = mask_function(
+ config=self.config,
+ input_embeds=inputs_embeds,
+ attention_mask=attention_mask,
+ cache_position=cache_position,
+ past_key_values=past_key_values,
+ position_ids=position_ids,
+ )
+
+ hidden_states = inputs_embeds
+
+ # create position embeddings to be shared across the decoder layers
+ position_embeddings = self.rotary_emb(hidden_states, position_ids)
+
+ for decoder_layer in self.layers[: self.config.num_hidden_layers]:
+ hidden_states = decoder_layer(
+ hidden_states,
+ position_embeddings=position_embeddings,
+ attention_mask=causal_mask,
+ position_ids=position_ids,
+ past_key_values=past_key_values,
+ use_cache=use_cache,
+ cache_position=cache_position,
+ **kwargs,
+ )
+
+ hidden_states = self.norm(hidden_states)
+
+ return MoeModelOutputWithPast( # only diff with Mistral is the output type, we need MoE
+ last_hidden_state=hidden_states,
+ past_key_values=past_key_values,
+ )
+
+
+def load_balancing_loss_func(
+ gate_logits: Union[torch.Tensor, tuple[torch.Tensor], None],
+ num_experts: Optional[int] = None,
+ top_k=2,
+ attention_mask: Optional[torch.Tensor] = None,
+) -> Union[torch.Tensor, int]:
+ r"""
+ Computes auxiliary load balancing loss as in Switch Transformer - implemented in Pytorch.
+
+ See Switch Transformer (https://huggingface.co/papers/2101.03961) for more details. This function implements the loss
+ function presented in equations (4) - (6) of the paper. It aims at penalizing cases where the routing between
+ experts is too unbalanced.
+
+ Args:
+ gate_logits:
+ Logits from the `gate`, should be a tuple of model.config.num_hidden_layers tensors of
+ shape [batch_size X sequence_length, num_experts].
+ num_experts:
+ Number of experts
+ top_k:
+ The number of experts to route per-token, can be also interpreted as the `top-k` routing
+ parameter.
+ attention_mask (`torch.Tensor`, *optional*):
+ The attention_mask used in forward function
+ shape [batch_size X sequence_length] if not None.
+
+ Returns:
+ The auxiliary loss.
+ """
+ if gate_logits is None or not isinstance(gate_logits, tuple):
+ return 0
+
+ compute_device = gate_logits[0].device
+ concatenated_gate_logits = torch.cat([layer_gate.to(compute_device) for layer_gate in gate_logits], dim=0)
+
+ routing_weights = torch.nn.functional.softmax(concatenated_gate_logits, dim=-1)
+
+ _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
+
+ expert_mask = torch.nn.functional.one_hot(selected_experts, num_experts)
+
+ if attention_mask is None:
+ # Compute the percentage of tokens routed to each expert
+ tokens_per_expert = torch.mean(expert_mask.float(), dim=0)
+
+ # Compute the average probability of routing to these experts
+ router_prob_per_expert = torch.mean(routing_weights, dim=0)
+ else:
+ batch_size, sequence_length = attention_mask.shape
+ num_hidden_layers = concatenated_gate_logits.shape[0] // (batch_size * sequence_length)
+
+ # Compute the mask that masks all padding tokens as 0 with the same shape of expert_mask
+ expert_attention_mask = (
+ attention_mask[None, :, :, None, None]
+ .expand((num_hidden_layers, batch_size, sequence_length, top_k, num_experts))
+ .reshape(-1, top_k, num_experts)
+ .to(compute_device)
+ )
+
+ # Compute the percentage of tokens routed to each expert
+ tokens_per_expert = torch.sum(expert_mask.float() * expert_attention_mask, dim=0) / torch.sum(
+ expert_attention_mask, dim=0
+ )
+
+ # Compute the mask that masks all padding tokens as 0 with the same shape of tokens_per_expert
+ router_per_expert_attention_mask = (
+ attention_mask[None, :, :, None]
+ .expand((num_hidden_layers, batch_size, sequence_length, num_experts))
+ .reshape(-1, num_experts)
+ .to(compute_device)
+ )
+
+ # Compute the average probability of routing to these experts
+ router_prob_per_expert = torch.sum(routing_weights * router_per_expert_attention_mask, dim=0) / torch.sum(
+ router_per_expert_attention_mask, dim=0
+ )
+
+ overall_loss = torch.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0))
+ return overall_loss * num_experts
+
+
+@auto_docstring
+class MiniMaxM2ForCausalLM(MiniMaxM2PreTrainedModel, GenerationMixin):
+ _tied_weights_keys = ["lm_head.weight"]
+ _tp_plan = {"lm_head": "colwise_rep"}
+ _pp_plan = {"lm_head": (["hidden_states"], ["logits"])}
+
+ def __init__(self, config):
+ super().__init__(config)
+ self.model = MiniMaxM2Model(config)
+ self.vocab_size = config.vocab_size
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
+ self.router_aux_loss_coef = config.router_aux_loss_coef
+ self.num_experts = config.num_local_experts
+ self.num_experts_per_tok = config.num_experts_per_tok
+
+ # Initialize weights and apply final processing
+ self.post_init()
+
+ @can_return_tuple
+ @auto_docstring
+ def forward(
+ self,
+ input_ids: Optional[torch.LongTensor] = None,
+ attention_mask: Optional[torch.Tensor] = None,
+ position_ids: Optional[torch.LongTensor] = None,
+ past_key_values: Optional[Cache] = None,
+ inputs_embeds: Optional[torch.FloatTensor] = None,
+ labels: Optional[torch.LongTensor] = None,
+ use_cache: Optional[bool] = None,
+ output_router_logits: Optional[bool] = None,
+ cache_position: Optional[torch.LongTensor] = None,
+ logits_to_keep: Union[int, torch.Tensor] = 0,
+ **kwargs: Unpack[TransformersKwargs],
+ ) -> MoeCausalLMOutputWithPast:
+ r"""
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+ Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
+ config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
+ (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
+
+ Example:
+
+ ```python
+ >>> from transformers import AutoTokenizer, MiniMaxM2ForCausalLM
+
+ >>> model = MiniMaxM2ForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2")
+
+ >>> prompt = "Hey, are you conscious? Can you talk to me?"
+ >>> inputs = tokenizer(prompt, return_tensors="pt")
+
+ >>> # Generate
+ >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
+ >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+ "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+ ```"""
+
+ output_router_logits = (
+ output_router_logits if output_router_logits is not None else self.config.output_router_logits
+ )
+
+ # decoder outputs consist of (dec_features, layer_state, dec_hidden, dec_attn)
+ outputs: MoeModelOutputWithPast = self.model(
+ input_ids=input_ids,
+ attention_mask=attention_mask,
+ position_ids=position_ids,
+ past_key_values=past_key_values,
+ inputs_embeds=inputs_embeds,
+ use_cache=use_cache,
+ output_router_logits=output_router_logits,
+ cache_position=cache_position,
+ **kwargs,
+ )
+
+ hidden_states = outputs.last_hidden_state
+ # Only compute necessary logits, and do not upcast them to float if we are not computing the loss
+ slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
+ logits = self.lm_head(hidden_states[:, slice_indices, :])
+
+ loss = None
+ if labels is not None:
+ loss = self.loss_function(logits, labels, self.vocab_size, **kwargs)
+
+ aux_loss = None
+ if output_router_logits:
+ aux_loss = load_balancing_loss_func(
+ outputs.router_logits,
+ self.num_experts,
+ self.num_experts_per_tok,
+ attention_mask,
+ )
+ if labels is not None:
+ loss += self.router_aux_loss_coef * aux_loss.to(loss.device) # make sure to reside in the same device
+
+ return MoeCausalLMOutputWithPast(
+ loss=loss,
+ aux_loss=aux_loss,
+ logits=logits,
+ past_key_values=outputs.past_key_values,
+ hidden_states=outputs.hidden_states,
+ attentions=outputs.attentions,
+ router_logits=outputs.router_logits,
+ )
+
+
+class MiniMaxM2ForSequenceClassification(GenericForSequenceClassification, MiniMaxM2PreTrainedModel):
+ pass
+
+
+class MiniMaxM2ForTokenClassification(GenericForTokenClassification, MiniMaxM2PreTrainedModel):
+ pass
+
+
+class MiniMaxM2ForQuestionAnswering(GenericForQuestionAnswering, MiniMaxM2PreTrainedModel):
+ pass
+
+
+__all__ = [
+ "MiniMaxM2ForCausalLM",
+ "MiniMaxM2ForQuestionAnswering",
+ "MiniMaxM2Model",
+ "MiniMaxM2PreTrainedModel",
+ "MiniMaxM2ForSequenceClassification",
+ "MiniMaxM2ForTokenClassification",
+]
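For reference, the sigmoid-plus-bias routing implemented by `MiniMaxM2SparseMoeBlock.route_tokens_to_experts` above can be sketched for a single token in plain Python. This is illustrative only; the real code operates on batched tensors, and the function and example values here are made up for the sketch.

```python
import math

def route_tokens_to_experts(router_logits, bias, top_k):
    # Sigmoid score per expert; the correction bias influences only
    # which experts are *selected*, not the final mixing weights.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in router_logits]
    choice = [s + b for s, b in zip(scores, bias)]
    top = sorted(range(len(scores)), key=lambda i: choice[i], reverse=True)[:top_k]
    # Renormalize the unbiased scores of the chosen experts so they sum to 1.
    total = sum(scores[i] for i in top)
    return top, {i: scores[i] / total for i in top}

top, weights = route_tokens_to_experts([2.0, 0.0, -2.0, 1.0], [0.0] * 4, top_k=2)
# top == [0, 3]: the two largest sigmoid scores win, and their weights sum to 1
```

Note the asymmetry mirrored from the module: `scores_for_choice` (score plus `e_score_correction_bias`) decides *which* experts fire, but the gathered `top_k_weights` are the raw sigmoid scores, renormalized over the selected experts.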
diff --git a/subir_huggingface.py b/subir_huggingface.py
new file mode 100644
index 0000000000000000000000000000000000000000..d2acdf9f55e8ad8dee4908da7579e19241c312da
--- /dev/null
+++ b/subir_huggingface.py
@@ -0,0 +1,19 @@
+from huggingface_hub import HfApi
+
+api = HfApi()
+
+# The name of your repository on HF
+repo_id = "List-cloud/List-3.0-Ultra-Coder-Brain"
+
+print("Starting upload to Hugging Face... This can take a while depending on your connection.")
+
+# Upload the entire folder, replacing the old files on HF
+api.upload_folder(
+ folder_path=r"K:\List-3.0-Ultra-Coder\List-3.0-Ultra-Coder-Brain",
+ repo_id=repo_id,
+ repo_type="model",
+ # Skip the automation scripts you don't want to upload
+ ignore_patterns=["*.pyc", "update_model_hashes.py", "boost_downloads.py", "upload_model.py"]
+)
+
+print("Upload completed successfully!")
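The `ignore_patterns` argument above excludes files whose names match glob-style patterns before upload. A minimal sketch of that filtering behavior (a hypothetical helper for illustration, not `huggingface_hub`'s actual implementation):

```python
import fnmatch

def filter_upload_files(paths, ignore_patterns):
    # Keep only paths that match none of the glob-style ignore patterns.
    return [
        p for p in paths
        if not any(fnmatch.fnmatch(p, pattern) for pattern in ignore_patterns)
    ]

files = ["model.safetensors", "config.json", "cache.pyc", "upload_model.py"]
kept = filter_upload_files(files, ["*.pyc", "upload_model.py"])
# kept == ["model.safetensors", "config.json"]
```

Patterns like `"*.pyc"` match by extension anywhere, while literal names like `"upload_model.py"` exclude a single file, which is why the script lists both forms.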
diff --git a/tokenizer_config.json b/tokenizer_config.json
index ff8e2ebcbdb03324603c0a734e459ec9968096ae..4801e04325e0078db80978918a3f3a0ad8fc09f6 100644
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -1,495 +1,496 @@
-{
- "added_tokens_decoder": {
- "200000": {
- "content": "]!p~[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200001": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200002": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200003": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200004": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200005": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200006": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200007": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200008": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200009": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200010": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200011": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200012": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200013": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200014": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200015": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200016": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200017": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200018": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200019": {
- "content": "]~b]",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200020": {
- "content": "[e~[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200021": {
- "content": "]!d~[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200022": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200023": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200024": {
- "content": "]<]speech[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200025": {
- "content": "]<]image[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200026": {
- "content": "]<]video[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200027": {
- "content": "]<]start of speech[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200028": {
- "content": "]<]end of speech[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200029": {
- "content": "]<]start of image[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200030": {
- "content": "]<]end of image[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200031": {
- "content": "]<]start of video[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200032": {
- "content": "]<]end of video[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200033": {
- "content": "]<]vision pad[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200034": {
- "content": "]~!b[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200035": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200036": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200037": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200038": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200039": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200040": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200041": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200042": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200043": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200044": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200045": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200046": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200047": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200048": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200049": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200050": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": false
- },
- "200051": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": false
- },
- "200052": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": false
- },
- "200053": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "]<]speech[>[",
- "]<]image[>[",
- "]<]video[>[",
- "]<]start of speech[>[",
- "]<]end of speech[>[",
- "]<]start of image[>[",
- "]<]end of image[>[",
- "]<]start of video[>[",
- "]<]end of video[>[",
- "]<]vision pad[>[",
- "]~!b[",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "[e~[",
- "]!d~[",
- "]!p~[",
- "]~b]",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- ""
- ],
- "add_prefix_space": false,
- "bos_token": "]~!b[",
- "clean_up_tokenization_spaces": false,
- "eos_token": "[e~[",
- "model_max_length": 40960000,
- "tokenizer_class": "GPT2Tokenizer",
- "unk_token": "]!d~["
-}
+{
+ "added_tokens_decoder": {
+ "200000": {
+ "content": "]!p~[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200001": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200002": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200003": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200004": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200005": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200006": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200007": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200008": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200009": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200010": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200011": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200012": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200013": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200014": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200015": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200016": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200017": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200018": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200019": {
+ "content": "]~b]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200020": {
+ "content": "[e~[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200021": {
+ "content": "]!d~[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200022": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200023": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200024": {
+ "content": "]<]speech[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200025": {
+ "content": "]<]image[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200026": {
+ "content": "]<]video[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200027": {
+ "content": "]<]start of speech[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200028": {
+ "content": "]<]end of speech[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200029": {
+ "content": "]<]start of image[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200030": {
+ "content": "]<]end of image[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200031": {
+ "content": "]<]start of video[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200032": {
+ "content": "]<]end of video[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200033": {
+ "content": "]<]vision pad[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200034": {
+ "content": "]~!b[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200035": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200036": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200037": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200038": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200039": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200040": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200041": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200042": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200043": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200044": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200045": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200046": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200047": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200048": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200049": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200050": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "200051": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "200052": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "200053": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "]<]speech[>[",
+ "]<]image[>[",
+ "]<]video[>[",
+ "]<]start of speech[>[",
+ "]<]end of speech[>[",
+ "]<]start of image[>[",
+ "]<]end of image[>[",
+ "]<]start of video[>[",
+ "]<]end of video[>[",
+ "]<]vision pad[>[",
+ "]~!b[",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "[e~[",
+ "]!d~[",
+ "]!p~[",
+ "]~b]",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ ""
+ ],
+ "add_prefix_space": false,
+ "bos_token": "]~!b[",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "[e~[",
+ "model_max_length": 40960000,
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "]!d~[",
+ "model_creator": "List Cloud"
+}
\ No newline at end of file
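For reference, each `added_tokens_decoder` entry in the config above maps a token ID to its content string plus the `lstrip`/`rstrip`/`normalized`/`single_word`/`special` flags. A minimal sketch of how such a map can be read, using an excerpt that copies a few IDs and flags from the patch (many token contents are empty in the source and stay empty here):

```python
import json

# Excerpt mirroring the structure of the tokenizer_config.json in this
# patch; IDs, token strings, and flags are copied from the diff above.
config_excerpt = json.loads("""
{
  "added_tokens_decoder": {
    "200020": {"content": "[e~[", "lstrip": false, "normalized": false,
               "rstrip": false, "single_word": false, "special": true},
    "200021": {"content": "]!d~[", "lstrip": false, "normalized": false,
               "rstrip": false, "single_word": false, "special": true},
    "200050": {"content": "", "lstrip": false, "normalized": false,
               "rstrip": false, "single_word": false, "special": false}
  },
  "bos_token": "]~!b[",
  "eos_token": "[e~[",
  "unk_token": "]!d~["
}
""")

# Collect the token IDs flagged as special, as a tokenizer loader would.
special_ids = sorted(
    int(token_id)
    for token_id, entry in config_excerpt["added_tokens_decoder"].items()
    if entry["special"]
)
print(special_ids)  # [200020, 200021]
```

Note that JSON object keys are strings, so the numeric IDs must be cast back to `int` before sorting or indexing.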