diff --git a/README.md b/README.md
index 32dd0939a0bec2ece7b96a5e8ca6e87a22a8a3fa..ca2344550047735fff80bc4c9a629ddba1591053 100644
--- a/README.md
+++ b/README.md
@@ -1,190 +1,191 @@
----
-language:
-- en
-license: apache-2.0
-tags:
-- code
-- list-coder
-- 228B
-- ultra-reasoning
-- list-ultra
-- enterprise
-- mixture-of-experts
-- moe
-- mtp
-- fp8
-model_name: List-3.0-Ultra-Coder
-pipeline_tag: text-generation
-library_name: transformers
----
-
-
-
-
-
-# List-3.0-Ultra-Coder
-
-### The Next Frontier of AI-Powered Software Engineering
-
-[](https://list-coder.com/)
-[](https://list-coder.com/download)
-[](https://www.instagram.com/trylistcoder/)
-
----
-
-**228 Billion Parameters** · **256-Expert MoE** · **204K Context Window** · **Multi-Token Prediction**
-
-*The largest and most capable coding model ever built for the List-Coder ecosystem.*
-
-
-
----
-
-## Why List-3.0-Ultra-Coder?
-
-**List-3.0-Ultra-Coder** is not just an incremental update; it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**.
-
-> **"We didn't build another coding assistant. We built the engineer that engineers wish they had."**
-
----
-
-## Performance Benchmarks
-
-We benchmark against the best models on the planet. No cherry-picking. No asterisks.
-
-| Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict |
-| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
-| **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** |
-| Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan |
-| Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan |
-| GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ |
-| DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ |
-| Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ |
-| Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ |
-| Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ |
-
-> **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model.
-
----
-
-## What's New in 3.0
-
-| Feature | List-2.0 | **List-3.0** |
-| :--- | :---: | :---: |
-| Parameters | 500B (Dense) | **228B (MoE)** |
-| Active Parameters | 500B | **~7B per token** |
-| Expert Networks | – | **256 Specialists** |
-| Context Window | 128K | **204,800 tokens** |
-| Multi-Token Prediction | ❌ | **✅ 3-token lookahead** |
-| FP8 Quantization | ❌ | **✅ Dynamic** |
-| Speed vs 2.0 | 1x | **~31x faster** |
-| Architecture Reasoning | Good | **State-of-the-art** |
-| Security Auditing | Basic | **Enterprise-grade** |
-
----
-
-## Technical Specifications
-
-```yaml
-Architecture: Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP)
-Total Parameters: 228,000,000,000 (228B)
-Active per Token: ~7B (8 of 256 experts)
-Expert Networks: 256 specialized routing experts
-MTP Modules: 3 (predicts 3 tokens ahead simultaneously)
-Hidden Size: 3,072
-Attention Heads: 48 (8 KV heads, GQA)
-Layers: 62 transformer blocks
-Context Window: 204,800 tokens (~400 pages of code)
-Quantization: FP8 (float8_e4m3fn) with dynamic activation
-Precision: BFloat16 (training) / FP8 (inference)
-Vocabulary: 200,064 tokens
-RoPE θ: 5,000,000 (extreme long-context support)
-```
-
----
-
-## Get Started in 60 Seconds
-
-### Option 1: List Coder IDE (Recommended)
-
-The fastest way to experience **List-3.0-Ultra-Coder** at full power.
-
-1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)**
-2. **Sign in** with your account
-3. **Start coding** – the model is pre-configured and ready
-
-> 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance.
-
-
-### Option 2: Local Deployment (Advanced)
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-model_name = "List-cloud/List-3.0-Ultra-Coder-Brain"
-tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    device_map="auto",
-    trust_remote_code=True,
-    torch_dtype="auto"
-)
-
-prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing."
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=4096)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```
-
-> ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended.
-
----
-
-## What List-3.0 Excels At
-
-| Domain | Capability |
-| :--- | :--- |
-| **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS – it knows them all. |
-| **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. |
-| **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. |
-| **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. |
-| **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). |
-| **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. |
-
-
-
-## The List-Coder Ecosystem
-
-| Product | Description |
-| :--- | :--- |
-| [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration |
-| [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding |
-| [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks |
-| [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship – 228B MoE powerhouse |
-| [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development |
-
----
-
-## License
-
-This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
-
----
-
-## Connect
-
-- **Website:** [list-coder.com](https://list-coder.com/)
-- **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud)
-- **Enterprise Sales:** enterprise@list-coder.com
-
----
-
-
-
-### ⭐ Star this repo if List-3.0 helps you code faster
-
-**Built with obsession by [List Enterprise](https://list-coder.com/), making every developer 10x.**
-
-*© 2026 List Enterprise. All rights reserved.*
-
-
+---
+language:
+- en
+license: apache-2.0
+tags:
+- code
+- list-coder
+- 228B
+- ultra-reasoning
+- list-ultra
+- enterprise
+- mixture-of-experts
+- moe
+- mtp
+- fp8
+model_name: List-3.0-Ultra-Coder
+pipeline_tag: text-generation
+library_name: transformers
+---
+
+
+
+
+
+# List-3.0-Ultra-Coder
+
+### The Next Frontier of AI-Powered Software Engineering
+
+[](https://list-coder.com/)
+[](https://list-coder.com/download)
+[](https://www.instagram.com/trylistcoder/)
+
+---
+
+**228 Billion Parameters** · **256-Expert MoE** · **204K Context Window** · **Multi-Token Prediction**
+
+*The largest and most capable coding model ever built for the List-Coder ecosystem.*
+
+
+
+---
+
+## Why List-3.0-Ultra-Coder?
+
+**List-3.0-Ultra-Coder** is not just an incremental update; it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**.
+
+> **"We didn't build another coding assistant. We built the engineer that engineers wish they had."**
+
+---
+
+## Performance Benchmarks
+
+We benchmark against the best models on the planet. No cherry-picking. No asterisks.
+
+| Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
+| **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** |
+| Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan |
+| Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan |
+| GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ |
+| DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ |
+| Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ |
+| Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ |
+| Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ |
+
+> **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model.
+
+---
+
+## What's New in 3.0
+
+| Feature | List-2.0 | **List-3.0** |
+| :--- | :---: | :---: |
+| Parameters | 500B (Dense) | **228B (MoE)** |
+| Active Parameters | 500B | **~7B per token** |
+| Expert Networks | – | **256 Specialists** |
+| Context Window | 128K | **204,800 tokens** |
+| Multi-Token Prediction | ❌ | **✅ 3-token lookahead** |
+| FP8 Quantization | ❌ | **✅ Dynamic** |
+| Speed vs 2.0 | 1x | **~31x faster** |
+| Architecture Reasoning | Good | **State-of-the-art** |
+| Security Auditing | Basic | **Enterprise-grade** |
+
+---
+
+## Technical Specifications
+
+```yaml
+Architecture: Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP)
+Total Parameters: 228,000,000,000 (228B)
+Active per Token: ~7B (8 of 256 experts)
+Expert Networks: 256 specialized routing experts
+MTP Modules: 3 (predicts 3 tokens ahead simultaneously)
+Hidden Size: 3,072
+Attention Heads: 48 (8 KV heads, GQA)
+Layers: 62 transformer blocks
+Context Window: 204,800 tokens (~400 pages of code)
+Quantization: FP8 (float8_e4m3fn) with dynamic activation
+Precision: BFloat16 (training) / FP8 (inference)
+Vocabulary: 200,064 tokens
+RoPE θ: 5,000,000 (extreme long-context support)
+```
+
+---
+
+## Get Started in 60 Seconds
+
+### Option 1: List Coder IDE (Recommended)
+
+The fastest way to experience **List-3.0-Ultra-Coder** at full power.
+
+1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)**
+2. **Sign in** with your account
+3. **Start coding** – the model is pre-configured and ready
+
+> 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance.
+
+
+### Option 2: Local Deployment (Advanced)
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "List-cloud/List-3.0-Ultra-Coder-Brain"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    trust_remote_code=True,
+    torch_dtype="auto"
+)
+
+prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing."
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=4096)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+> ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended.
+
+---
+
+## What List-3.0 Excels At
+
+| Domain | Capability |
+| :--- | :--- |
+| **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS – it knows them all. |
+| **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. |
+| **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. |
+| **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. |
+| **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). |
+| **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. |
+
+
+
+## The List-Coder Ecosystem
+
+| Product | Description |
+| :--- | :--- |
+| [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration |
+| [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding |
+| [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks |
+| [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship – 228B MoE powerhouse |
+| [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development |
+
+---
+
+## License
+
+This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
+
+---
+
+## Connect
+
+- **Website:** [list-coder.com](https://list-coder.com/)
+- **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud)
+- **Enterprise Sales:** enterprise@list-coder.com
+
+---
+
+
+
+### ⭐ Star this repo if List-3.0 helps you code faster
+
+**Built with obsession by [List Enterprise](https://list-coder.com/), making every developer 10x.**
+
+*© 2026 List Enterprise. All rights reserved.*
+
+
+
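The README above claims that only 8 of 256 experts activate per token, with sigmoid scoring (`scoring_func: "sigmoid"` in config.json). As a rough illustration of what top-k routing means, here is a minimal sketch; `route_token` and the stand-in gate logits are hypothetical, not the model's actual router.

```python
import math

def route_token(scores, top_k=8):
    """Hypothetical top-k MoE router for one token: keep the top_k
    highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(scores)), key=lambda e: -scores[e])[:top_k]
    total = sum(scores[e] for e in ranked)
    return [(e, scores[e] / total) for e in ranked]

# 256 experts; gate logits squashed with a sigmoid, as the config suggests
logits = [math.sin(0.37 * e) for e in range(256)]   # stand-in gate logits
scores = [1 / (1 + math.exp(-x)) for x in logits]
active = route_token(scores)
print(len(active))  # 8: only these experts run for this token
```

Only the selected experts' feed-forward weights are touched for that token, which is why a 228B-parameter model can have roughly 7B active parameters per forward step.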
diff --git a/config.json b/config.json
index 5b47f662a581bcc9bb43d160899b27c1ff0ab57a..b5db21e193d476918e73c0ee4ce8f15629b6e7a4 100644
--- a/config.json
+++ b/config.json
@@ -1,115 +1,116 @@
-{
- "model_name": "List-3.0-Ultra-Coder",
- "architectures": [
- "MiniMaxM2ForCausalLM"
- ],
- "attn_type_list": [
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1,
- 1
- ],
- "auto_map": {
- "AutoConfig": "configuration_minimax_m2.MiniMaxM2Config",
- "AutoModelForCausalLM": "modeling_minimax_m2.MiniMaxM2ForCausalLM"
- },
- "dtype": "bfloat16",
- "head_dim": 128,
- "hidden_act": "silu",
- "hidden_size": 3072,
- "intermediate_size": 1536,
- "max_position_embeddings": 204800,
- "model_type": "minimax_m2",
- "mtp_transformer_layers": 1,
- "num_attention_heads": 48,
- "num_experts_per_tok": 8,
- "num_hidden_layers": 62,
- "num_key_value_heads": 8,
- "num_local_experts": 256,
- "num_mtp_modules": 3,
- "qk_norm_type": "per_layer",
- "quantization_config": {
- "activation_scheme": "dynamic",
- "fmt": "float8_e4m3fn",
- "quant_method": "fp8",
- "weight_block_size": [
- 128,
- 128
- ],
- "modules_to_not_convert": [
- "gate",
- "e_score_correction_bias",
- "lm_head"
- ]
- },
- "rms_norm_eps": 1e-06,
- "rope_theta": 5000000,
- "rotary_dim": 64,
- "scoring_func": "sigmoid",
- "shared_intermediate_size": 0,
- "tie_word_embeddings": false,
- "transformers_version": "4.46.1",
- "use_cache": true,
- "use_mtp": true,
- "use_qk_norm": true,
- "use_routing_bias": true,
- "vocab_size": 200064
-}
+{
+ "model_name": "List-3.0-Ultra-Coder",
+ "architectures": [
+ "MiniMaxM2ForCausalLM"
+ ],
+ "attn_type_list": [
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1
+ ],
+ "auto_map": {
+ "AutoConfig": "configuration_list_ultra.MiniMaxM2Config",
+ "AutoModelForCausalLM": "modeling_list_ultra.MiniMaxM2ForCausalLM"
+ },
+ "dtype": "bfloat16",
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 3072,
+ "intermediate_size": 1536,
+ "max_position_embeddings": 204800,
+ "model_type": "list_ultra_coder",
+ "mtp_transformer_layers": 1,
+ "num_attention_heads": 48,
+ "num_experts_per_tok": 8,
+ "num_hidden_layers": 62,
+ "num_key_value_heads": 8,
+ "num_local_experts": 256,
+ "num_mtp_modules": 3,
+ "qk_norm_type": "per_layer",
+ "quantization_config": {
+ "activation_scheme": "dynamic",
+ "fmt": "float8_e4m3fn",
+ "quant_method": "fp8",
+ "weight_block_size": [
+ 128,
+ 128
+ ],
+ "modules_to_not_convert": [
+ "gate",
+ "e_score_correction_bias",
+ "lm_head"
+ ]
+ },
+ "rms_norm_eps": 1e-06,
+ "rope_theta": 5000000,
+ "rotary_dim": 64,
+ "scoring_func": "sigmoid",
+ "shared_intermediate_size": 0,
+ "tie_word_embeddings": false,
+ "transformers_version": "4.46.1",
+ "use_cache": true,
+ "use_mtp": true,
+ "use_qk_norm": true,
+ "use_routing_bias": true,
+ "vocab_size": 200064,
+ "model_creator": "List Cloud"
+}
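The `quantization_config` above declares FP8 with one scale per 128x128 weight block (`weight_block_size`) and the `float8_e4m3fn` format, whose dynamic range tops out around ±448. A toy sketch of what block-wise scaling means, under the simplifying assumption of a 2x2 block; `quantize_block` is a hypothetical helper, and a real e4m3 cast would additionally round each value's mantissa, which this sketch omits.

```python
def quantize_block(block, fp8_max=448.0):
    """Hypothetical block-wise FP8 scaling: compute one scale per
    weight block so the block's largest magnitude maps to fp8_max,
    then rescale every value into the representable range."""
    amax = max(abs(v) for row in block for v in row) or 1.0
    scale = amax / fp8_max
    quantized = [[v / scale for v in row] for row in block]
    return quantized, scale

block = [[0.02, -1.5], [3.0, -0.004]]                  # toy 2x2 "block"
q, scale = quantize_block(block)
dequantized = [[v * scale for v in row] for row in q]  # recover the weights
```

Per-block scales keep small weights from being crushed by one outlier elsewhere in the tensor, which is why the config excludes sensitive modules (`gate`, `lm_head`) from conversion entirely.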
diff --git a/configuration_list_ultra.py b/configuration_list_ultra.py
new file mode 100644
index 0000000000000000000000000000000000000000..7fcd9861c389c8c8c437784de4f5f2adf4688747
--- /dev/null
+++ b/configuration_list_ultra.py
@@ -0,0 +1,200 @@
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py.
+# Do NOT edit this file manually as any edits will be overwritten by the generation of
+# the file from the modular. If any change should be done, please apply the change to the
+# modular_minimax_m2.py file directly. One of our CI enforces this.
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# coding=utf-8
+# Copyright 2025 the HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from transformers.configuration_utils import PretrainedConfig
+
+
+class MiniMaxM2Config(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`MiniMaxM2Model`]. It is used to instantiate a
+    MiniMaxM2 model according to the specified arguments, defining the model architecture. Instantiating a configuration
+    with the defaults will yield a similar configuration to that of the MiniMaxM2-7B-v0.1 or MiniMaxM2-7B-Instruct-v0.1.
+
+    [minimax_m2ai/MiniMaxM2-8x7B](https://huggingface.co/minimax_m2ai/MiniMaxM2-8x7B)
+    [minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1](https://huggingface.co/minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1)
+
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+
+
+    Args:
+        vocab_size (`int`, *optional*, defaults to 32000):
+            Vocabulary size of the MiniMaxM2 model. Defines the number of different tokens that can be represented by
+            the `inputs_ids` passed when calling [`MiniMaxM2Model`].
+        hidden_size (`int`, *optional*, defaults to 4096):
+            Dimension of the hidden representations.
+        intermediate_size (`int`, *optional*, defaults to 14336):
+            Dimension of the MLP representations.
+        num_hidden_layers (`int`, *optional*, defaults to 32):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (`int`, *optional*, defaults to 32):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        num_key_value_heads (`int`, *optional*, defaults to 8):
+            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
+            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
+            `num_key_value_heads=1` the model will use Multi Query Attention (MQA), otherwise GQA is used. When
+            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
+            by meanpooling all the original heads within that group. For more details, check out [this
+            paper](https://huggingface.co/papers/2305.13245). If it is not specified, will default to `8`.
+        head_dim (`int`, *optional*, defaults to `hidden_size // num_attention_heads`):
+            The attention head dimension.
+        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
+            The non-linear activation function (function or string) in the decoder.
+        max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
+            The maximum sequence length that this model might ever be used with. MiniMaxM2's sliding window attention
+            allows sequences of up to 4096*32 tokens.
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        rms_norm_eps (`float`, *optional*, defaults to 1e-05):
+            The epsilon used by the rms normalization layers.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        pad_token_id (`int`, *optional*):
+            The id of the padding token.
+        bos_token_id (`int`, *optional*, defaults to 1):
+            The id of the "beginning-of-sequence" token.
+        eos_token_id (`int`, *optional*, defaults to 2):
+            The id of the "end-of-sequence" token.
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether the model's input and output word embeddings should be tied.
+        rope_theta (`float`, *optional*, defaults to 1000000.0):
+            The base period of the RoPE embeddings.
+        sliding_window (`int`, *optional*):
+            Sliding window attention window size. If not specified, will default to `4096`.
+        attention_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout ratio for the attention probabilities.
+        num_experts_per_tok (`int`, *optional*, defaults to 2):
+            The number of experts to route per token; can also be interpreted as the `top-k` routing parameter.
+        num_local_experts (`int`, *optional*, defaults to 8):
+            Number of experts per Sparse MLP layer.
+        output_router_logits (`bool`, *optional*, defaults to `False`):
+            Whether or not the router logits should be returned by the model. Enabling this will also
+            allow the model to output the auxiliary loss. See [here]() for more details.
+        router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
+            The aux loss factor for the total loss.
+        router_jitter_noise (`float`, *optional*, defaults to 0.0):
+            Amount of noise to add to the router.
+
+    ```python
+    >>> from transformers import MiniMaxM2Model, MiniMaxM2Config
+
+    >>> # Initializing a MiniMaxM2 7B style configuration
+    >>> configuration = MiniMaxM2Config()
+
+    >>> # Initializing a model from the MiniMaxM2 7B style configuration
+    >>> model = MiniMaxM2Model(configuration)
+
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+
+    model_type = "minimax_m2"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    base_model_tp_plan = {
+        "layers.*.self_attn.q_proj": "colwise",
+        "layers.*.self_attn.k_proj": "colwise",
+        "layers.*.self_attn.v_proj": "colwise",
+        "layers.*.self_attn.o_proj": "rowwise",
+        "layers.*.block_sparse_moe.gate": "colwise_rep",  # we need to replicate here to correctly route experts
+        "layers.*.block_sparse_moe.experts.*.w1": "colwise",
+        "layers.*.block_sparse_moe.experts.*.w2": "rowwise",
+        "layers.*.block_sparse_moe.experts.*.w3": "colwise",
+    }
+    base_model_pp_plan = {
+        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+        "norm": (["hidden_states"], ["hidden_states"]),
+    }
+
+    def __init__(
+        self,
+        vocab_size=32000,
+        hidden_size=4096,
+        intermediate_size=14336,
+        num_hidden_layers=32,
+        num_attention_heads=32,
+        num_key_value_heads=8,
+        head_dim=None,
+        hidden_act="silu",
+        max_position_embeddings=4096 * 32,
+        initializer_range=0.02,
+        rms_norm_eps=1e-5,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=1,
+        eos_token_id=2,
+        tie_word_embeddings=False,
+        rope_theta=1e6,
+        sliding_window=None,
+        attention_dropout=0.0,
+        num_experts_per_tok=2,
+        num_local_experts=8,
+        output_router_logits=False,
+        router_aux_loss_coef=0.001,
+        router_jitter_noise=0.0,
+        **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.sliding_window = sliding_window
+
+        # for backward compatibility
+        if num_key_value_heads is None:
+            num_key_value_heads = num_attention_heads
+
+        self.num_key_value_heads = num_key_value_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.use_cache = use_cache
+        self.rope_theta = rope_theta
+        self.attention_dropout = attention_dropout
+        self.head_dim = head_dim
+
+        self.num_experts_per_tok = num_experts_per_tok
+        self.num_local_experts = num_local_experts
+        self.output_router_logits = output_router_logits
+        self.router_aux_loss_coef = router_aux_loss_coef
+        self.router_jitter_noise = router_jitter_noise
+
+        self.use_qk_norm = kwargs.pop("use_qk_norm", False)
+        self.rotary_dim = kwargs.pop("rotary_dim", self.head_dim)
+        self.partial_rotary_factor = kwargs.pop("partial_rotary_factor", 1)
+        if self.head_dim is not None:
+            self.partial_rotary_factor = self.rotary_dim / self.head_dim
+
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
+
+
+__all__ = ["MiniMaxM2Config"]
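The `num_key_value_heads` docstring above describes converting an MHA checkpoint to GQA by mean-pooling each group of key/value heads. A minimal sketch of that pooling with plain lists; `mha_to_gqa` is a hypothetical helper, and the 48-to-8 head counts mirror this repo's config.json.

```python
def mha_to_gqa(kv_heads, num_key_value_heads=8):
    """Mean-pool groups of key/value head vectors into one head per
    group, as described for MHA -> GQA checkpoint conversion."""
    group = len(kv_heads) // num_key_value_heads   # 48 heads -> groups of 6
    pooled = []
    for g in range(num_key_value_heads):
        members = kv_heads[g * group:(g + 1) * group]
        dim = len(members[0])
        pooled.append([sum(h[d] for h in members) / group for d in range(dim)])
    return pooled

heads = [[float(i)] * 4 for i in range(48)]        # 48 toy heads, head_dim 4
print(len(mha_to_gqa(heads)))                      # 8
```

After pooling, every group of 6 query heads attends against one shared KV head, which shrinks the KV cache by 6x at long context lengths.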
diff --git a/generation_config.json b/generation_config.json
index 30b418a48e04bf5e6d584093aa23393614678619..fb0cb22a96d91853244601c72b288a98324ed355 100644
--- a/generation_config.json
+++ b/generation_config.json
@@ -1,9 +1,10 @@
-{
- "bos_token_id": 200019,
- "do_sample": true,
- "eos_token_id": 200020,
- "temperature": 1.0,
- "top_p": 0.95,
- "top_k": 40,
- "transformers_version": "4.46.1"
-}
+{
+ "bos_token_id": 200019,
+ "do_sample": true,
+ "eos_token_id": 200020,
+ "temperature": 1.0,
+ "top_p": 0.95,
+ "top_k": 40,
+ "transformers_version": "4.46.1",
+ "model_creator": "List Cloud"
+}
\ No newline at end of file
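The generation_config.json above combines `temperature` 1.0, `top_k` 40, and `top_p` 0.95. A sketch of how those three settings interact at a single decoding step; `sample` and the stand-in logits are hypothetical, and real implementations differ in exactly how the filters are ordered and renormalized.

```python
import math
import random

def sample(logits, temperature=1.0, top_k=40, top_p=0.95, seed=0):
    """Hypothetical decoding step: temperature-scale the logits,
    keep the top_k candidates, then keep the smallest prefix of them
    covering top_p of that mass, and draw from the renormalized set."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    probs = [math.exp(x - m) for x in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    mass = sum(probs[i] for i in ranked)
    kept, cum = [], 0.0
    for i in ranked:                    # nucleus: stop once top_p mass is covered
        kept.append(i)
        cum += probs[i]
        if cum >= top_p * mass:
            break
    weights = [probs[i] for i in kept]
    return random.Random(seed).choices(kept, weights=weights, k=1)[0]

logits = [math.cos(0.11 * i) for i in range(1000)]   # stand-in logits
tok = sample(logits)
```

With a fixed seed the draw is deterministic, which is convenient for testing; production decoders would draw fresh randomness per token.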
diff --git a/model-00000-of-00130.safetensors b/model-00000-of-00130.safetensors
index 48cb02ebb6de52ff272e366888581cd494798380..495aaa1a357cff3279e4d7de33e9b0500b450e86 100644
--- a/model-00000-of-00130.safetensors
+++ b/model-00000-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9785f5a87c85710e38f4ca11f819f3d137ff84615af1bc0ba533b94681addf27
-size 3693062744
+oid sha256:d0c16afa264ac999106d7b80b160a97c316a70fabad3d428a9943eb7a35fca4a
+size 3693062760
diff --git a/model-00001-of-00130.safetensors b/model-00001-of-00130.safetensors
index 03d2e4f89519b916065223d5372b8cdd1b401064..70ce372ccce36fdff0eb11258babdfdfbff18b2b 100644
--- a/model-00001-of-00130.safetensors
+++ b/model-00001-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:d2ed94efe077a4498b788706e059d82780deb54436a70a5a9664b716d6cdc83e
-size 1208321176
+oid sha256:fe3b7db35ada8ade9963f2242b42d9ab6c82906f302c039cef50358a779cb848
+size 1208321192
diff --git a/model-00002-of-00130.safetensors b/model-00002-of-00130.safetensors
index 9c604108dd0eeee1fba743f4a1a13bf7fdf47afa..3046ee94f686fc4e704093669a3a4175bbb3647c 100644
--- a/model-00002-of-00130.safetensors
+++ b/model-00002-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:f0c1b97aff37136b5d89a9df22acf7109fa824ccef5f9ff4f763b7869dfc5650
-size 2463868936
+oid sha256:6591f23f0997c5a93ad3b1d07e1640057635b08f633a13a1e676785bac0831c1
+size 2463868952
diff --git a/model-00003-of-00130.safetensors b/model-00003-of-00130.safetensors
index 3f2bc7361251b0ce28d48539a0c161b782bf7bc5..d7b12b9359afb6ae015404620231a086bf7dc09b 100644
--- a/model-00003-of-00130.safetensors
+++ b/model-00003-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:93be479ff1b6912ff1a7e54f4c4a4e4d67124d1811df8e39d50b981b1b43d8e6
-size 1208321176
+oid sha256:cff032fb55721ec4f9838781cc99ff07ca197a6a8122a79abbca2c72a1bac476
+size 1208321192
diff --git a/model-00004-of-00130.safetensors b/model-00004-of-00130.safetensors
index 267f1e40ce2d3060705f737b790211cc5c0ea45c..3388d187b8e70b30e26e54f267c09e5d0f5bdfe3 100644
--- a/model-00004-of-00130.safetensors
+++ b/model-00004-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5d5bead700b8f82dd2a50cee205c37f5642020c414452869693da06df384a9eb
-size 2463868936
+oid sha256:47eb412198f9d20cd82a914763df09c7024f15bb364dc8c683c9dfab12242f14
+size 2463868952
diff --git a/model-00005-of-00130.safetensors b/model-00005-of-00130.safetensors
index f58637bf72761ad9248ad612d3738320ecf26c88..0163aa4a17c60bea23def992724c7373fa6cbb08 100644
--- a/model-00005-of-00130.safetensors
+++ b/model-00005-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:99444d6d83c614776397faa167dc908d48016414e0dd6edef57fd9c040e01d21
-size 1208321176
+oid sha256:29ee6cc2652523a1529efbe193b2916b8312d4c81ffe3bfa69a3d5462890a9cc
+size 1208321192
diff --git a/model-00006-of-00130.safetensors b/model-00006-of-00130.safetensors
index afc76b5a1a08a830e63138856c4c3f0b83459b29..d0d8c3be52f1a7d5a8d0aef6f3ead8b08dc7ca33 100644
--- a/model-00006-of-00130.safetensors
+++ b/model-00006-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:df42d1d91b84ed41f846775a274dbd382185fdf7595009dcd016bd805e25eb1b
-size 2463868936
+oid sha256:a73d0f05cd4be0fc95fbd5b0ed43ed89b8b5310f0d77528d5b2f2636b049c15a
+size 2463868952
diff --git a/model-00007-of-00130.safetensors b/model-00007-of-00130.safetensors
index c3de034d0055d8d7efd95816004e1c5d6afea62c..c45b6cbe311cedfa4f09bda22385271663fa99f4 100644
--- a/model-00007-of-00130.safetensors
+++ b/model-00007-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:18882ffcb4f2dddfe6b8766393c68208b524aa4520ed921234a66b11548440eb
-size 1208321176
+oid sha256:d844a3f7afec3e0fe03111c45e01c434a4ae20c1d73a3004fcd688bda605ebef
+size 1208321192
diff --git a/model-00008-of-00130.safetensors b/model-00008-of-00130.safetensors
index c1f0529c61d6b5358aac2e6021e2403b10997cc9..086215d2a5048a92192a27f5b9689b36c2176284 100644
--- a/model-00008-of-00130.safetensors
+++ b/model-00008-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:cf8ead5d7b01543a3fafc5a39240b1a3d9fe1cf25b360eb99e7a751359db9705
-size 2463868936
+oid sha256:c76e793b4cfdf48f057594fddc66a767e918f3ba261cc8c27d5206fcbc3790b7
+size 2463868952
diff --git a/model-00009-of-00130.safetensors b/model-00009-of-00130.safetensors
index daca91019e09cf08247ede546715096cc662a4f8..4bfd23c9fa56e951b6c60dbafbfeb46ba3da6c29 100644
--- a/model-00009-of-00130.safetensors
+++ b/model-00009-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:d897820ce912aa7ae2feb4377d9b8684eca38c18be550b6bcf7316cb9d7c6e30
-size 1208321176
+oid sha256:641beb2755a121a3160b4d7a504b6d15f3d9521d9ad18178515b6833e02507a8
+size 1208321192
diff --git a/model-00010-of-00130.safetensors b/model-00010-of-00130.safetensors
index ebdb82d6ac5098a1471cb5362a0fc2726c5c4ad5..ef312a577a842974b9c75c4dfd8fb48dcd2c20d5 100644
--- a/model-00010-of-00130.safetensors
+++ b/model-00010-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:734eee6e62863c518a976d41b6c4122ed974cf87e52cd2d7e7df0187a3141b87
-size 2463868936
+oid sha256:acc219978e83281e8c819f646c189d6b1a4d018269194ad564ecf68a2fd2fd6a
+size 2463868952
diff --git a/model-00011-of-00130.safetensors b/model-00011-of-00130.safetensors
index 202b8fc1c9acb58782f90dff67fda9343739e723..ef4b0fd56f21ca4d44c7dd6b9bb5b18e17b4767c 100644
--- a/model-00011-of-00130.safetensors
+++ b/model-00011-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:1237cbe1b9915bfda1efb8ced7d5a4266a0083a3b4c3fa401c4a003e3fea20fd
-size 1208321176
+oid sha256:71053f6d6db3f5d5c4ac3231963bf72fa31f431260c82fec8204518c046a8b7e
+size 1208321192
diff --git a/model-00012-of-00130.safetensors b/model-00012-of-00130.safetensors
index f689858ce1cdfd76de3a0e143bbe46658b125e94..45677a302cb954b35ec6af7f10e14b566adfb9a7 100644
--- a/model-00012-of-00130.safetensors
+++ b/model-00012-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:069b272af35289d3c499e98f867b1ffecb1f96980c583bf77b1d4d23c8b7a713
-size 2463868936
+oid sha256:22836d173404306e62d081a63ea3c04fc8ef408cc846bbe2d0a11f8d4fbb5026
+size 2463868952
diff --git a/model-00013-of-00130.safetensors b/model-00013-of-00130.safetensors
index 079c54bd0bb87e27f58cd313c1a95961130ea259..ec2ad926591cc55bf71fb4b4a9de656e9cc8d08e 100644
--- a/model-00013-of-00130.safetensors
+++ b/model-00013-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:045403b45c8951c3ea3c68b288f04255e0e2fc4de47293f9b941964212b8253e
-size 1208321176
+oid sha256:d1b4189b66df90cdc1e63a3ca6428abcf613f42d6ac7d8c2e3fd8a8cdf645124
+size 1208321192
diff --git a/model-00014-of-00130.safetensors b/model-00014-of-00130.safetensors
index 07f29eb2d810683a6b12d1d86a5ceb8b19582059..4c24ac349db965dbe973a8f2aa8bb6c9cc61dee3 100644
--- a/model-00014-of-00130.safetensors
+++ b/model-00014-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0277da3d1063a00618b32992617a2448c95c850c1f26dc4024d70ae920a35a25
-size 2463868936
+oid sha256:7598790d1aa068a5c9ba53fcc40c079394799a97306827f1ba1f8cba88684ab9
+size 2463868952
diff --git a/model-00015-of-00130.safetensors b/model-00015-of-00130.safetensors
index 68b71ada95537ac9bc00c3adb0e207ef56afc2f9..2aedf41980337097a38b7f0df947f69e4a1c6c5a 100644
--- a/model-00015-of-00130.safetensors
+++ b/model-00015-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:d2a9db97dbab9f2a324219d4ba019656b6b635fae3b868d7f2a4fd6e3bab5e66
-size 1208321176
+oid sha256:18068f6619316e15eaa5899bc905d73829c198c95bd73e60ff9a916d06227c8f
+size 1208321192
diff --git a/model-00016-of-00130.safetensors b/model-00016-of-00130.safetensors
index 0305bea8f22d4759779cbc355dc857246ff7c710..a38b257e4d85e0881a353dd7bd65dba716559f25 100644
--- a/model-00016-of-00130.safetensors
+++ b/model-00016-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:90776eaf143864ecb632c059fefd4167e27c5644ba4eb50d65afa5291cff666e
-size 2463868936
+oid sha256:51251cb05597e91f3123a4895b103b700f5500292e0645d9dd5098d89905cdc6
+size 2463868952
diff --git a/model-00017-of-00130.safetensors b/model-00017-of-00130.safetensors
index 18443a8e3d85852383f2257bf30428636df1ceee..60a08dd6a2ba950aa7b1fb3069b4923c7c4a288d 100644
--- a/model-00017-of-00130.safetensors
+++ b/model-00017-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4ea50b70dae5f8b55b1990a6b6cad9291349b45162548e9d48d63b2a144e3c23
-size 1208321176
+oid sha256:6fbfbaa652a008a347622f73eb65c328519479d39984d20fe7550aa223731776
+size 1208321192
diff --git a/model-00018-of-00130.safetensors b/model-00018-of-00130.safetensors
index 04f879f9080a66065768e8e54cba3881044a8ec0..294dd061e09d7917006187ae4baf5e1cdad47ce9 100644
--- a/model-00018-of-00130.safetensors
+++ b/model-00018-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2a239e9eae27174937d5547d8e5e743e84bd7eaea50390510e4cd8f15511447b
-size 2463868936
+oid sha256:7aac1f32c20fd51a00f09337203defcce29e9f406bfb1b3ad6f149e1eb6ac5c9
+size 2463868952
diff --git a/model-00019-of-00130.safetensors b/model-00019-of-00130.safetensors
index c727ee4e0931ae34f888587f8f853b79d7e7c3cd..cc966920ba232c64f026a9a8d3ac7de6bd3f5b55 100644
--- a/model-00019-of-00130.safetensors
+++ b/model-00019-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5e041358d2ce0d92517b13508046baf08807d46adb33dda5d23728a4cef45f2b
-size 1208321176
+oid sha256:71137226bd4232c4b458fa03e452922938c2bbbef11ac6158872f1955a9051d9
+size 1208321192
diff --git a/model-00020-of-00130.safetensors b/model-00020-of-00130.safetensors
index 1291075d46679ed39420ef848b7c701e56aed52a..1f845d773fa00dea36af1a8d126b07ac016a1a28 100644
--- a/model-00020-of-00130.safetensors
+++ b/model-00020-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4f4f7af9ded3e7d5775012eae2c7dee63518c799ebbe42a47949aa7f560c5f43
-size 2463869968
+oid sha256:ee55ff6bcd2005fec670a2be80c07b08ce08cf4c5f8e60e475f69fdbc4124ac1
+size 2463869984
diff --git a/model-00021-of-00130.safetensors b/model-00021-of-00130.safetensors
index e070b98aa345b7b29c059f4c1cbbb706978495a3..5744d8b4ed06e3cb10e38e1ee23aa8902daee685 100644
--- a/model-00021-of-00130.safetensors
+++ b/model-00021-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:8a76ddac05820e58676b3b56e2990c598dae551f1f65adf55a90a3754f66e2b4
-size 1208321688
+oid sha256:f689ebd29f939326b19c48f3ddb20c06f1f8f283dc3f945de7b3ad9a10c07a37
+size 1208321704
diff --git a/model-00022-of-00130.safetensors b/model-00022-of-00130.safetensors
index 00be2228b97bf32a593f9518c23f7b6470d3092e..0344f627ecc43c1323fb2a36d8121e631478f517 100644
--- a/model-00022-of-00130.safetensors
+++ b/model-00022-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:c080ad8c3b5032434973e205a074e4d1a41edd399a383dc1c6d80ebb073ca09e
-size 2463869968
+oid sha256:9d25c1854e0b56c930560a8c3ad8e1e5476f40c88ba8e216304a01c5aca1bc19
+size 2463869984
diff --git a/model-00023-of-00130.safetensors b/model-00023-of-00130.safetensors
index 98dd3ee7d42ac5a7cf6c2eb34667df84544d0618..d34b0230861b1b49479e629263b9415414d71090 100644
--- a/model-00023-of-00130.safetensors
+++ b/model-00023-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9eee017222d3eb90afa5126fccb194de12c67828bd4353b3a466ce3da17877d2
-size 1208321688
+oid sha256:283726c528f252b7c37374757865124b80eccea270f296dac9cb39bdb29c30ae
+size 1208321704
diff --git a/model-00024-of-00130.safetensors b/model-00024-of-00130.safetensors
index b42f6bdb3f6e037942b67645d998daadf547f744..739390db832b92b93c11717c286918711c8cdc59 100644
--- a/model-00024-of-00130.safetensors
+++ b/model-00024-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e3d3c543000e2fd6180bb17c289f36e46256bf0c76f7ae98a7087eb4264db605
-size 2463869968
+oid sha256:0fc0e56e137378c34551c058d11163c6f70ec79980dc503c2e5f8ab8ca969a5d
+size 2463869984
diff --git a/model-00025-of-00130.safetensors b/model-00025-of-00130.safetensors
index 723dfe7a55f1b61817753bfcb12723c7084d8246..d26283cb5c40c55c8364b7b6bb422ecaedaac631 100644
--- a/model-00025-of-00130.safetensors
+++ b/model-00025-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:68580bdb4da65c22fb95a16e7fe13b1f0bbde861327d7c0bb6cb76a86794d38d
-size 1208321688
+oid sha256:ce447cd23d3ef6fbb2911e75b2eec4a500be913fab847ddd513b38faaab06ae4
+size 1208321704
diff --git a/model-00026-of-00130.safetensors b/model-00026-of-00130.safetensors
index ae245b35dbbd6780e5d860237e0624b59fc50197..be0524e5039642923ee8d02923196d78cf934f89 100644
--- a/model-00026-of-00130.safetensors
+++ b/model-00026-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:c0ca69318b53d7ec6f7fcfa7981ed2ec402e73302fd5ea62ed77311f4eb8be73
-size 2463869968
+oid sha256:7ab66aaa211410818416eac84338b5231a55ccc62e93273af57ea54a7da38c57
+size 2463869984
diff --git a/model-00027-of-00130.safetensors b/model-00027-of-00130.safetensors
index 2e1a028e0d3c92553f226e6fd6a688934f024c4a..4898a05443931653ad9223a62f5cd5aa71854f58 100644
--- a/model-00027-of-00130.safetensors
+++ b/model-00027-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a6f03ff04b01299dceaf26fe0a0a503d6e0abc58eba94e8796e933e40bd10a5e
-size 1208321688
+oid sha256:db40c8e355ef79e34a8f1b1da001714d608016c18ea215dd02848a745d7b190e
+size 1208321704
diff --git a/model-00028-of-00130.safetensors b/model-00028-of-00130.safetensors
index 26ffc51e1763c45ba7c8bf8d82e8b0835ce4c3a6..becb21ae02cd8686ac35d5c41d3443cf72d7d5b0 100644
--- a/model-00028-of-00130.safetensors
+++ b/model-00028-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:6432450282a2cd79475b57bf5b83380addf0b8d36586c750bc4fbf37ce04af6e
-size 2463869968
+oid sha256:cfa1a296fb0b36b616a2955e57af670e33bf8cb89171c63e6387b3bd6b381025
+size 2463869984
diff --git a/model-00029-of-00130.safetensors b/model-00029-of-00130.safetensors
index 4cbc7b1a120d349f3077da464eab4ae3f40453b9..e166ac02f59a65f5156ff0046fef5f4407634967 100644
--- a/model-00029-of-00130.safetensors
+++ b/model-00029-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:961ca8675f7ee7a1a65e5ea5f1e35dfe7427d566e68a1f56f04a463252763683
-size 1208321688
+oid sha256:2b85a8106a86e47f91e2221b043b4eab36c4ef76438d0298ad7c9d841ed8b0fa
+size 1208321704
diff --git a/model-00030-of-00130.safetensors b/model-00030-of-00130.safetensors
index c5d28f49b37207e307956a4eafec7e27d4c500a9..cf19b76d43467e33c73a1aadd243092398f61100 100644
--- a/model-00030-of-00130.safetensors
+++ b/model-00030-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:7687ab86a251404b048268b022b67c148d38605ae04a0ddc46f2328aec60dc53
-size 2463869968
+oid sha256:02cd49378478900445f3295f028990061308abdec79e4d5df4b07a3dcb29a0f1
+size 2463869984
diff --git a/model-00031-of-00130.safetensors b/model-00031-of-00130.safetensors
index 936a7d35a4fbf4be1374069c5b2a76615422a780..6556c8ce270ee1e2e193c65c6ae6c9b79fb1f66c 100644
--- a/model-00031-of-00130.safetensors
+++ b/model-00031-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:345042a4520442dccd7428238a2d80a5b5b7d990d1d5b61395ffcaad7e4e8794
-size 1208321688
+oid sha256:ec5a215e0fc3048ea77ef02b4a5468ba94c159523d34b348f53396803d42c7ff
+size 1208321704
diff --git a/model-00032-of-00130.safetensors b/model-00032-of-00130.safetensors
index 64ece8e8ead362474263d993d2cf0bff7fee51cf..069b617f8cacbf390978a943328316dcd87117bc 100644
--- a/model-00032-of-00130.safetensors
+++ b/model-00032-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4faa680a93c47b4624ba40e17b98c725c9704ebbb75644feeb8f8a42a9045a7d
-size 2463869968
+oid sha256:619ba8b01d74dd14a7b32d74474e0fda94a4fc1298678dc277716788a253f47d
+size 2463869984
diff --git a/model-00033-of-00130.safetensors b/model-00033-of-00130.safetensors
index 38573648b07b035a79cabc45978adaafc1804433..8fe624ee13a7f5ccbdfbe37440adaf57cbcdba6a 100644
--- a/model-00033-of-00130.safetensors
+++ b/model-00033-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:fdfa10d9c8315dd4dd94d46955e03b012d56e8764db1089e1b2970d5139bb38e
-size 1208321688
+oid sha256:00df4ee5d99ca76c1528f0c05beddc36e7de54587a96058a98318c90391bd40d
+size 1208321704
diff --git a/model-00034-of-00130.safetensors b/model-00034-of-00130.safetensors
index 6ad39db645bced255028db530702509fb4bdedee..00fb18e777ce7e3f01064c7ebf3cfcc6ea5a1de6 100644
--- a/model-00034-of-00130.safetensors
+++ b/model-00034-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:ae23de77bccd17a8ec9286fcf71aa2ed2dfe54f3404f6ed755f5067c4d01149a
-size 2463869968
+oid sha256:1db20eca10db4d8a09052bb07c3879784b4eefb2cfbc068f9f92ce83f7835e12
+size 2463869984
diff --git a/model-00035-of-00130.safetensors b/model-00035-of-00130.safetensors
index ca8f4a80a7f05e0cb11a95373735cb84553c7805..416fe5e3e58127ec87cc9ecdadee7eaeb219514a 100644
--- a/model-00035-of-00130.safetensors
+++ b/model-00035-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:6a5ca9a1fd87ba6f98d95f6a88789edf6909270540f0dd8736e05dd9f839943a
-size 1208321688
+oid sha256:f470d1acd3e6cccc93991ff168563c5b0150c9e97534ee1c7eb8b410086594a2
+size 1208321704
diff --git a/model-00036-of-00130.safetensors b/model-00036-of-00130.safetensors
index 9682d43c045e5f8ec55d9476e0f02221c1e3bbbc..52b472c1fe73c5f7458303e8575ab68d0833c909 100644
--- a/model-00036-of-00130.safetensors
+++ b/model-00036-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:88113822767ba632f6a9b1863c6d78c005107ef563d82f7948ed0a3e5b5d76be
-size 2463869968
+oid sha256:c05191aca5c7832a2ad70efb76c6053996373a972f944010702c1d89c0615808
+size 2463869984
diff --git a/model-00037-of-00130.safetensors b/model-00037-of-00130.safetensors
index 82f91a3c71dcc391d2b90ac5cce09cfcee60c797..63d13ad70acd187b0a302a48a20faf74b2af66a2 100644
--- a/model-00037-of-00130.safetensors
+++ b/model-00037-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3a42e3dfe02d8f2b8b2bfc8d35942e93de8746f74f88390f66d2106d6d7ee328
-size 1208321688
+oid sha256:e5f63e133ddd050c482fe97b9a43c3acb4b71ff9299250061a80ce9aedd54ef7
+size 1208321704
diff --git a/model-00038-of-00130.safetensors b/model-00038-of-00130.safetensors
index 9c46d54ecbecea82501d1709d7d73c29e481115f..61ca2b769a61876c015193f1f039a4ba0befb4a2 100644
--- a/model-00038-of-00130.safetensors
+++ b/model-00038-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:6cf2b3485504e8b3790424afc1af0eaa735fa835999e5ac3639a0a0a1d1200c9
-size 2463869968
+oid sha256:7b8225555f566cc75813df75f0b06f28c5ff1a17113e863ae2dc5904bb0e0b7d
+size 2463869984
diff --git a/model-00039-of-00130.safetensors b/model-00039-of-00130.safetensors
index 030775cb3e49fe39e5d29c9d3ad10023ee38177a..e2cdc9d3da3af0e3c22db7ec83cc3ed85405772f 100644
--- a/model-00039-of-00130.safetensors
+++ b/model-00039-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:bbf5e9eff7646b206eb25ba1a744d6d2e3544b3713638692a5869f8ef7143680
-size 1208321688
+oid sha256:924d61a64bc0252c8a116af17e04fb0456b9073f69f770bf7641d53459d626a7
+size 1208321704
diff --git a/model-00040-of-00130.safetensors b/model-00040-of-00130.safetensors
index 0e25b626dc1bf62f46524401faf3c2c9e4b3502b..3392a258254253e0543f247c4863bed3aec10f6b 100644
--- a/model-00040-of-00130.safetensors
+++ b/model-00040-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:499c9039dff0d6fa4c127030bde7cb7557bbd6cf98f7c002093e54bf16a0db22
-size 2463869968
+oid sha256:c702ab514fa24d0793b4cd2eba3e3ce00364031d230ff015b69435bcefd2fe98
+size 2463869984
diff --git a/model-00041-of-00130.safetensors b/model-00041-of-00130.safetensors
index bdd58bcd7a340193c18ffba6539601fe09176462..4784f3b84b106237325b5a8089996e661265cc01 100644
--- a/model-00041-of-00130.safetensors
+++ b/model-00041-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3ed0565052bb46b1b3913041d17da44b88c18ab5421ec770c2716762bf23aa8a
-size 1208321688
+oid sha256:8187a1702e6f97158ce33d917813bed2c09da5d254c23c3f9252212822122801
+size 1208321704
diff --git a/model-00042-of-00130.safetensors b/model-00042-of-00130.safetensors
index 0c7590501f6579bd38f572d48a7bf22fe687c265..1e35a3d875057cb5f52fae526ae64c024e829259 100644
--- a/model-00042-of-00130.safetensors
+++ b/model-00042-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:601959ff7bdb6fa3a0b08f529b592d23462083e30c4840b9925f655bde56649a
-size 2463869968
+oid sha256:086952771ffb3c230f442bf74089630ce154a7031ff55a096a329eda9fa5da76
+size 2463869984
diff --git a/model-00043-of-00130.safetensors b/model-00043-of-00130.safetensors
index ba25206991e2999a800cdc12c505e28693f477d0..8d9d8d73c815b66087d03529d553af356fee8b3c 100644
--- a/model-00043-of-00130.safetensors
+++ b/model-00043-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:7fbd3484ee80a51f026b5feead3b59be11d8c4fc02965c58b123bd0111ff18b8
-size 1208321688
+oid sha256:f2007a0ad756d4f2e26a9563c44c0e3bba9eb37d54f39c6c74b7aeae7518b1a1
+size 1208321704
diff --git a/model-00044-of-00130.safetensors b/model-00044-of-00130.safetensors
index edc33e3a2b1d80beeefd9870a2795f6bdb24f541..6d9d926de0d9f3c35f1e12bbed441dba053ca169 100644
--- a/model-00044-of-00130.safetensors
+++ b/model-00044-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b349ca4c4779f858f89c6a50f0cd365d147df4b88a523752ea8f8f4221e42f81
-size 2463869968
+oid sha256:bccf19ea9a96545a27081444a93f797b3114001f3837522b622a03730e821916
+size 2463869984
diff --git a/model-00045-of-00130.safetensors b/model-00045-of-00130.safetensors
index e83abc6c363784914c7459d9709c964930ccb69d..45aaab663d2c00e7faa01f7d65cebc20dda933dc 100644
--- a/model-00045-of-00130.safetensors
+++ b/model-00045-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:54673ecdf05ea6b01934af72c258b05fd6c6018d0cd2d9acec530116d16285db
-size 1208321688
+oid sha256:1d303939832d74b199d4593622da9f8edc22acc2d9d0d45c52479c2529a73000
+size 1208321704
diff --git a/model-00046-of-00130.safetensors b/model-00046-of-00130.safetensors
index 887f19248240c36e92834c0d6481adbd1e6da5f9..b5791282136437a5f87e37098e1c4f2d8839d3b6 100644
--- a/model-00046-of-00130.safetensors
+++ b/model-00046-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:341ac0c20e20e3559be3aadc790c706b983e748a7832621f56659348d031aa49
-size 2463869968
+oid sha256:8fa2f23b6d23a8cd59d7537e70e99dba0bcf4a460159ea2239c8da03cdb4b355
+size 2463869984
diff --git a/model-00047-of-00130.safetensors b/model-00047-of-00130.safetensors
index 5bf13b20d94a1ff79489d4cdf9756d9acc948664..92bdf647ded33d191fc717a43bbe308fd0983078 100644
--- a/model-00047-of-00130.safetensors
+++ b/model-00047-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:38785114c81c6545b8ddefde004e154bd75a0095de6d1f59cb8e5b36d209d069
-size 1208321688
+oid sha256:4bb44e3a00a144df08f6cb7f486af9aaaebd2d6b1d14d1f0af2bb2c2d6ac257a
+size 1208321704
diff --git a/model-00048-of-00130.safetensors b/model-00048-of-00130.safetensors
index e79a6922bc89ad6e8c019219f6a51a164127f725..01f645ed7cc656caa93e629310ab4a3973009937 100644
--- a/model-00048-of-00130.safetensors
+++ b/model-00048-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:59c01cf8b22f7fd42acd0c8302f3a8c1d657491d0940a33c7aa8ec4c98190dc4
-size 2463869968
+oid sha256:77d90b8ffebccfb85d4a331bf42defc113daf21852998534bbdb0cbb365cdd67
+size 2463869984
diff --git a/model-00049-of-00130.safetensors b/model-00049-of-00130.safetensors
index eca69cc2168af32f419ef45490fc92bf54abe3a7..03e1dc194c82d99ae2cf44b8a0afe2136d7d77d1 100644
--- a/model-00049-of-00130.safetensors
+++ b/model-00049-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:bbc2141546a281debcfa24080b2851d3f79b9123da5ba552adbf6e9d888b8d14
-size 1208321688
+oid sha256:8b5f293f072a8cc158c6afa7890c7d29c06dc8d69370634e852e7c577318c8ed
+size 1208321704
diff --git a/model-00050-of-00130.safetensors b/model-00050-of-00130.safetensors
index 02351321b435bdd7e21ffd959c6ff1f67bde5bf4..c3878403a11ced397754fc2c0165bc1b46b65cb2 100644
--- a/model-00050-of-00130.safetensors
+++ b/model-00050-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e6c1dfceca0259ac2d38bff5fdc0e98bebc964c69b2624724e371e7e42c7be09
-size 2463869968
+oid sha256:56465fcf91b6f750b78ad82f64cec306416fdda16a35a4cf1ab98cd8040a2dea
+size 2463869984
diff --git a/model-00051-of-00130.safetensors b/model-00051-of-00130.safetensors
index cab49fe089c4c4bbf93c35ac0eaccb42ec9c9d8a..309a48e11afac28dce853d32a718119faa265aae 100644
--- a/model-00051-of-00130.safetensors
+++ b/model-00051-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:bc4209a8554b3d344e2afe9aefbcc7cd192b480b496d215b9026d0d966f5fb90
-size 1208321688
+oid sha256:013f0a79ef1e565dd47c7956eab6d534141234fac65832d52864849e313cc2bf
+size 1208321704
diff --git a/model-00052-of-00130.safetensors b/model-00052-of-00130.safetensors
index 93e064a911489f00f071c28cfcab3b4d7ac57549..5182b460d2f8ba0cb1b366fe810ad8721f530559 100644
--- a/model-00052-of-00130.safetensors
+++ b/model-00052-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e1ad313b24dccbdbef60fac452a080233f1b87eaa56d8a875c7c0c5f5272c5b8
-size 2463869968
+oid sha256:3d35192b4238e0b1bb40fdfffa87e98215677caedf3c77b4a3e00a1f5907c16d
+size 2463869984
diff --git a/model-00053-of-00130.safetensors b/model-00053-of-00130.safetensors
index 8acc452ac24c10d7868c4ec4812733ac3aec1530..6c94f8034466b02a2572004c662327525d225785 100644
--- a/model-00053-of-00130.safetensors
+++ b/model-00053-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:84f5bb1d8a740b89b24b59fd6d607d198099e480cf67e52dc2c8b49deb9b3fdf
-size 1208321688
+oid sha256:bd9e2290acd77a17c415124af372799398cec5335c67034eea48ffdcca64bbc3
+size 1208321704
diff --git a/model-00054-of-00130.safetensors b/model-00054-of-00130.safetensors
index 4b8c8c3b45fdae686d4bdfedda940b9be3cac702..c2f03880de6d3caa3db954438b6e83972b29d798 100644
--- a/model-00054-of-00130.safetensors
+++ b/model-00054-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a001ec5d2dd12f6a87c558766b0fc24aee042775a6806d37da459cf3e838e579
-size 2463869968
+oid sha256:f5f7720f95bf51c58cd2954a0eb41755bc165dfd723fe6f8eb688f6b14e910e7
+size 2463869984
diff --git a/model-00055-of-00130.safetensors b/model-00055-of-00130.safetensors
index 5acdf2d90da13e63234300e94b04ccd314ca694c..71a171a2ff32758ee410aaaf0d08749fc776e5e9 100644
--- a/model-00055-of-00130.safetensors
+++ b/model-00055-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:da2a90dda71ac298bcda0d6ef83dc28a129fe66ecefe27a064d3637c4f3f723d
-size 1208321688
+oid sha256:a575ca32b8b05436ec890f6e99111aba2dc8d4dcd2f4ba51e9933c93d7625bef
+size 1208321704
diff --git a/model-00056-of-00130.safetensors b/model-00056-of-00130.safetensors
index 17add31e2059eeb5342f1bf64cd64401a4fd1960..84d87c9fc00d92e912923db7a0b1dc802c617ed7 100644
--- a/model-00056-of-00130.safetensors
+++ b/model-00056-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9fe32d8911b7fb9857170ee26b9f330b1674e2c1f78cb0ef749cce9d6ec06c0a
-size 2463869968
+oid sha256:1bcbe082d00e1a7f9a2a3601f885cb03de48c146c16720f7a24da27000c52bcc
+size 2463869984
diff --git a/model-00057-of-00130.safetensors b/model-00057-of-00130.safetensors
index 7cead9dbaad4aab8e67007f5bfd690f79df6267c..0fcbbf57b2a94845ab5943fe7436407d6ffcfa10 100644
--- a/model-00057-of-00130.safetensors
+++ b/model-00057-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:1e8d73847187dc7d4da9a41ed3f5e7fd8f324d14eb107845188138b464299eb8
-size 1208321688
+oid sha256:1b71271c3c85735c62278080e022c3a7609b70b8649792d0962c03a2375bdddb
+size 1208321704
diff --git a/model-00058-of-00130.safetensors b/model-00058-of-00130.safetensors
index 93204ba0ad0131f364b3e693c91ed29e6aa42483..75a90877d634e0bd8bcd513e7c24c15d34970cc0 100644
--- a/model-00058-of-00130.safetensors
+++ b/model-00058-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:61ae96c272433d211c12be3ec81471dd21868f6b79e326023a5f687cb0edc77f
-size 2463869968
+oid sha256:c8dc50834d0c87cbafe7576ed3c6d6f5b24ba93f76afcd7f3d4663fa30e9bdb6
+size 2463869984
diff --git a/model-00059-of-00130.safetensors b/model-00059-of-00130.safetensors
index 397d8ba29ae12eea475e39cbbdf653a9c4d3491f..1c076d9e483729d9b286efdbeeb47f2dce7590fd 100644
--- a/model-00059-of-00130.safetensors
+++ b/model-00059-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2f19cb6bc24a9937faffa46939c209f5ef790825e964cb6a2b86ab56719bfe2b
-size 1208321688
+oid sha256:740568378867dfc6c6ad03b2b9f3fb94278ad17db0402d9517638e58d2119ef2
+size 1208321704
diff --git a/model-00060-of-00130.safetensors b/model-00060-of-00130.safetensors
index 09efbd5b9dba8a7b42d4096cc673bd12fa200160..0d4b30ece26d710ec6a028ae4c7dc26c0f897e92 100644
--- a/model-00060-of-00130.safetensors
+++ b/model-00060-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3b43a6164a7654e0820410d279cee374ac3b64266dd95fca228417156ff93f2f
-size 2463869968
+oid sha256:b6d194c823c68c8f1c35df8aff3e5cf1d0a794d4ff83bbbe4402f88e674466df
+size 2463869984
diff --git a/model-00061-of-00130.safetensors b/model-00061-of-00130.safetensors
index 3b550897b04744fd99f6f2bceee2e83eb8a1617e..b3d8122ace8ef277dec4ec32ec238a725ee49994 100644
--- a/model-00061-of-00130.safetensors
+++ b/model-00061-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0ed80e6f71a57a8d74ffcb046d39c836441efb2d2bbe542299550b929a2d6ceb
-size 1208321688
+oid sha256:37073ca7f0d5286e7f0f2b444d9da166a41daa50fbfedff413d90b6ab194ee90
+size 1208321704
diff --git a/model-00062-of-00130.safetensors b/model-00062-of-00130.safetensors
index b536def12714131ae651f0405f2dffafa28af95f..4a63db751b00809bcd48e81a1ff3cb8b334b54c3 100644
--- a/model-00062-of-00130.safetensors
+++ b/model-00062-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b89ff1d3c45edc5d06652d2dbde36657f0c327e57e04558b0b0e46793857f4a4
-size 2463869968
+oid sha256:471276d00ebd1bf22bb32a4f02859f75e0d329fcf968858d086a1c71431b5ec0
+size 2463869984
diff --git a/model-00063-of-00130.safetensors b/model-00063-of-00130.safetensors
index 1774eeea772f5590288a21b2c6fd1b1fd178f528..0342e6ab8692a0308ecfb47c257e121d55e0d768 100644
--- a/model-00063-of-00130.safetensors
+++ b/model-00063-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:8585c7cd94187eebfe4b64a25f13125add8dbc9932fee3a2af96cbc3e0cdbf9f
-size 1208321688
+oid sha256:ec4977ca868f31d64ffdae7b463ffd5456d1c391c1677d77b49e3a2684f53d3f
+size 1208321704
diff --git a/model-00064-of-00130.safetensors b/model-00064-of-00130.safetensors
index e644e98b21b177dbf954f88b60f07acd3089bad4..df6373fc64d48152c372129fdafab1697b0adc52 100644
--- a/model-00064-of-00130.safetensors
+++ b/model-00064-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:593a3e7a56cf130c7382de6a03d702be6ef279d887e7236d9b4fbd2bbd3d24ba
-size 2463869968
+oid sha256:307af83d7fc8becde1225b4b940cf0c078264241e9f2160bcd936ee7ee3eb513
+size 2463869984
diff --git a/model-00065-of-00130.safetensors b/model-00065-of-00130.safetensors
index f1873e036e3b49c3e4bacd7e7d3665e022f437bd..30ee0aaee0253da5c71d4c0c60d341d9ba445fdb 100644
--- a/model-00065-of-00130.safetensors
+++ b/model-00065-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:75849a0106d8bc2f1b20aef71eeb58cb3077c7e2951cf3e09788234def0c9927
-size 1208321688
+oid sha256:dab848ac603729e199e581d602cff9b746fb34afa0d3246749231591428aca7d
+size 1208321704
diff --git a/model-00066-of-00130.safetensors b/model-00066-of-00130.safetensors
index 565350f8285833e446755e316203834341e17745..bc702fc4359e61c2e186493511194f375e505112 100644
--- a/model-00066-of-00130.safetensors
+++ b/model-00066-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:00408d15935315da1a7bcbc23eee9aa4ee4563a4c14618b101dd33658960edf0
-size 2463869968
+oid sha256:d88e6be0b5ce61cc100aeb8488fcc52639f20e7168d8efdf510a3dca020de2fb
+size 2463869984
diff --git a/model-00067-of-00130.safetensors b/model-00067-of-00130.safetensors
index 4c63f2b39f0bb475fb92ed3debeeac6ed8c131b2..4da08c81e8b4d8ddd2691073c184d76a8af6a797 100644
--- a/model-00067-of-00130.safetensors
+++ b/model-00067-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4bface08c504ab1bf82e693c360accc76e49e579908e9b59dbd730ba9b8d756a
-size 1208321688
+oid sha256:d270184c361d815d80fc48ab6c6f83ee46768b3c4f1d4b27b0527c437e881bca
+size 1208321704
diff --git a/model-00068-of-00130.safetensors b/model-00068-of-00130.safetensors
index ab6b080bbdff2294fd7bffe8ec18934e11e83c39..afa43c470c5d73240eab9bb9b12d8ca51f6bff8d 100644
--- a/model-00068-of-00130.safetensors
+++ b/model-00068-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4a168b285a43f7ca03835b8c2ac472a5dfea4b01589a450040298a35d24092f8
-size 2463869968
+oid sha256:575b78cfd2bc412c7819764f13ac5bfe417eb34ca6663a5ae85254c716aec326
+size 2463869984
diff --git a/model-00069-of-00130.safetensors b/model-00069-of-00130.safetensors
index 1680e257bc0cd7ace6516d79b4ee2d9f5277db19..002371160aa73ea0a39f4a74504ad52f2d4bfa51 100644
--- a/model-00069-of-00130.safetensors
+++ b/model-00069-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:246239f37d0a7ac21cb105235861fbe48945361dbd5091d5cf1cffa5d5d24e14
-size 1208321688
+oid sha256:db4c3167a4096f7936b97f8f91694fa3350d7a003924957dff95c8184f7eddde
+size 1208321704
diff --git a/model-00070-of-00130.safetensors b/model-00070-of-00130.safetensors
index a72465815a066c26f413628d7fd70d132c6a93b6..65acacb5dd7fe7fe8b5268a46c5165a166e143a9 100644
--- a/model-00070-of-00130.safetensors
+++ b/model-00070-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:080f36a819d8014d93b3ff55ce5ca9e898322c721439f149505f7837ec8324be
-size 2463869968
+oid sha256:6b751cd22520a3901bcebff6cf1ac1c9361b69211b3f65e48e8a7f5ecbacae14
+size 2463869984
diff --git a/model-00071-of-00130.safetensors b/model-00071-of-00130.safetensors
index 7a877f95d536c4766375d1d811541473a6b6acbf..66fadf1581999b1c524d3c39898d8e675c004cca 100644
--- a/model-00071-of-00130.safetensors
+++ b/model-00071-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9e1a3b6d59ca4dcf99af877931f96cee754eb5019648f10b0fe01803c57a53b2
-size 1208321688
+oid sha256:8a13ff186cc4005f9347ba10f367f8b095cc6925e23c7d7cd8c287c3c8494cae
+size 1208321704
diff --git a/model-00072-of-00130.safetensors b/model-00072-of-00130.safetensors
index 2db83ed763e8602f2be2387aa53b1e7f036e82b8..86c19e001a5a9535128dd0cd5f2c51e909a331c8 100644
--- a/model-00072-of-00130.safetensors
+++ b/model-00072-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3702d9c9f31f088bc10d0b86c458fcf37245d066b6db9cc4d8e3b256e7c4be5e
-size 2463869968
+oid sha256:0217b335e2aeb9c5f3ce97a90786cb8fc4a719bee224d57760c5ee322f566b2c
+size 2463869984
diff --git a/model-00073-of-00130.safetensors b/model-00073-of-00130.safetensors
index b99689793d35a35e1bfcf3c7a86c2690e07c70d0..c2c7010e8a8082686de55aa94bf1bfa5fa9d59d7 100644
--- a/model-00073-of-00130.safetensors
+++ b/model-00073-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:c71864e0febd666681bd413d2deaa82103227eaf4a77a42c00ca5b9f363c969d
-size 1208321688
+oid sha256:be3e34df5603e5c54543f8f3f1c0577439ea5d1da56d92aac284a79dfb1d5a10
+size 1208321704
diff --git a/model-00074-of-00130.safetensors b/model-00074-of-00130.safetensors
index f2d8e6d03fa3aefbbd35bc500c71db136bf1fe1e..7469b51f385aefe893dae9ab9423cbeafc306d7e 100644
--- a/model-00074-of-00130.safetensors
+++ b/model-00074-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:07e6d2b9d5cf7e361328896bd44f001c924cea3a3d139d31455a095d31f71e49
-size 2463869968
+oid sha256:26b5ca15031d1ad287d6b2eea514b758f33c5967e011fa3ee91c42878f5d28a5
+size 2463869984
diff --git a/model-00075-of-00130.safetensors b/model-00075-of-00130.safetensors
index 2ccca589034f1c4070e1b2608f3bcbca2f59b68d..7d7a2cb82d761e1f94709b6bbfaf9d5e7fd599de 100644
--- a/model-00075-of-00130.safetensors
+++ b/model-00075-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:674d2be3b866d45ea6d84c68fe2d7256167597fe19f016c5a5d89351c579d382
-size 1208321688
+oid sha256:3456597f9c157dca04a36e392ce7d6d90055a33f584aae355c3adc176f172fd8
+size 1208321704
diff --git a/model-00076-of-00130.safetensors b/model-00076-of-00130.safetensors
index 8d9e038f72eb31cee492eed3493e01b526d886a8..615f96ba3dc6623f33b87e98586c8b61c8043f87 100644
--- a/model-00076-of-00130.safetensors
+++ b/model-00076-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:160a131a07cbbe229190595ee4ac88a04c663a72ecdcdf316eb4d46e3654fcf2
-size 2463869968
+oid sha256:c60bb67369fabd9c63b32e8db14aaf23c017b6bbcaa004a950fbbc825fb91ec2
+size 2463869984
diff --git a/model-00077-of-00130.safetensors b/model-00077-of-00130.safetensors
index 263abd798b6dba35eb2bf613aa4a2fe93f3df560..cff06d9712f7682e4348817e2cedf8b350947507 100644
--- a/model-00077-of-00130.safetensors
+++ b/model-00077-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2a2a1eee70e8b1fc35d179fb05f83cb1d5f11765cf9b854425f2f973c379c26a
-size 1208321688
+oid sha256:26619e4beb5d05b3fe8c15f608fd4caa7ab7f2f6fc1ea53d9e8f0cc76f06db79
+size 1208321704
diff --git a/model-00078-of-00130.safetensors b/model-00078-of-00130.safetensors
index 2af9b063ef98b2dac796addd080a108129ffbbb1..16b21d920589fe6f143e4f311b7fd67a289ddfdf 100644
--- a/model-00078-of-00130.safetensors
+++ b/model-00078-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:799eaaf53b6fa6e4a367e56333f8496df3791e009471791ce21ab655b5f7e132
-size 2463869968
+oid sha256:f74cb7e92d96eb05f0bd712b2ad3417e62e62c1850171391ccd16ba89a194954
+size 2463869984
diff --git a/model-00079-of-00130.safetensors b/model-00079-of-00130.safetensors
index 4a7317528c9beba59cbb53ec1e9aa2048e1fd549..73c469282479454735011391a39d56230d34fd5c 100644
--- a/model-00079-of-00130.safetensors
+++ b/model-00079-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5bf243b4004996bdbf7119bb4f43b5d8159b2f70412715058cd964e88c1607e9
-size 1208321688
+oid sha256:b5467aee8bbdb0ec4ceb53c0bdfd5ae4f3cb4c1f11706c2b967eaae0ad55abae
+size 1208321704
diff --git a/model-00080-of-00130.safetensors b/model-00080-of-00130.safetensors
index 094b751a5e04ebc5d812f75f22d0bfd2f263afa5..eb3f860f1a71e5a917dd08730608f4d32a6e6b48 100644
--- a/model-00080-of-00130.safetensors
+++ b/model-00080-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9f97809043caa0d67ebf635c6f585cebba6264a50e5c160e5b600d4f23aacbf4
-size 2463869968
+oid sha256:9286309d1a4fcfbd073aa9f984f8268f6980c576e2b7d8e89eb56a01d1dbae85
+size 2463869984
diff --git a/model-00081-of-00130.safetensors b/model-00081-of-00130.safetensors
index 756367ce993afee9f2ca25f356796a52aed6d76f..9a93f77300a3f66af98be248c18ba61c53421f46 100644
--- a/model-00081-of-00130.safetensors
+++ b/model-00081-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:8129bb648b2bd7d503df489b6260b0c902f892735bbb4d656f59e3d3a93e45b2
-size 1208321688
+oid sha256:20082af5e0887d614e89610fc53bfbe904be28091a2b81888b64c760e8581a7f
+size 1208321704
diff --git a/model-00082-of-00130.safetensors b/model-00082-of-00130.safetensors
index f08a9efcdc5436e36a7aa3fd293aadc870bf0846..f126e06e2134edf392ba3bdaa2033e9f9bc89617 100644
--- a/model-00082-of-00130.safetensors
+++ b/model-00082-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4a581d6de6af239880bbbb4cd875954edf0c95ad14b43fdd1094871386704dd5
-size 2463869968
+oid sha256:88fc35e7132aae27fde38421a2f845536b7e2561e826a64cfb1fa50724b8f648
+size 2463869984
diff --git a/model-00083-of-00130.safetensors b/model-00083-of-00130.safetensors
index 08a2e41a97eca2d3b8cd9ec28c1df4ca7076ba4b..169b415ec31fa238af249f11cf5fb96414ad5f0c 100644
--- a/model-00083-of-00130.safetensors
+++ b/model-00083-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:246f02a0e29120dcef28ea85a0eacd8d5a5722d0f0b165f61fef821f700f9d9a
-size 1208321688
+oid sha256:7e7b4d1f2e311d55f966342f75addcc725de48c0f5502902d01883bc870c7988
+size 1208321704
diff --git a/model-00084-of-00130.safetensors b/model-00084-of-00130.safetensors
index 151b1efeaa43ec88db479a561319d3be297b5df2..6f39dfa1667f94a66ccb05928271d07dc1214ba3 100644
--- a/model-00084-of-00130.safetensors
+++ b/model-00084-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4083ad0a522bc60a977253d091f496865f75f0be4d6ece2b975113a30007127a
-size 2463869968
+oid sha256:21f50eaeadaffa7c8ba11803f913a33a9326735f005048a83dfcb5bae8664991
+size 2463869984
diff --git a/model-00085-of-00130.safetensors b/model-00085-of-00130.safetensors
index 5a36e63bf7b49ed65ebf03c93cdec2af0e747bea..cc6ee902982843ee2bcd137ff9b4f052636e2d1f 100644
--- a/model-00085-of-00130.safetensors
+++ b/model-00085-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:35ea447ad683c811138d91696d8fda8008293a785518b7b86b1aa6c9ddc209b9
-size 1208321688
+oid sha256:6bc79d060033c27e1eef0ead16980e2ca552dd7ae32c3c4aeb2da11599aee4c4
+size 1208321704
diff --git a/model-00086-of-00130.safetensors b/model-00086-of-00130.safetensors
index 1485d5420b3d2ad51357b78dd6810569f86b125d..b6d0db16e3dd3ed6d32e0c2f0d9382a08fb7909c 100644
--- a/model-00086-of-00130.safetensors
+++ b/model-00086-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:257545b54e89ceed10803953ccc19db9f723916eae82f62293b244af9ff18773
-size 2463869968
+oid sha256:f906955440093248ccfad2994d3d4609d8925c27eae4aea2c9ab8fda6b21a2c0
+size 2463869984
diff --git a/model-00087-of-00130.safetensors b/model-00087-of-00130.safetensors
index 5f64b2e957c60995a5c342ad0ae5f674439290d4..ef3954a16c86ad14ea3c773c212d7da88bcd1889 100644
--- a/model-00087-of-00130.safetensors
+++ b/model-00087-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0f9a088db1323c4b7f2278201665b8d829cce886267b069659b88fbe3b38b0db
-size 1208321688
+oid sha256:4ba86c862516a7978c441305b48db43baebc5bea6e3af7d7779b617b0bc05088
+size 1208321704
diff --git a/model-00088-of-00130.safetensors b/model-00088-of-00130.safetensors
index 07952818ab6fbddbbab4516fbbd27a53e70b7834..78c3c4ecc482ed8116427955d9e0bfc3fde38757 100644
--- a/model-00088-of-00130.safetensors
+++ b/model-00088-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:55ec4e69a22dd99aaaf394a95d830a7deca496acba7870509d6e70b084bce6e8
-size 2463869968
+oid sha256:ca347c28b286dda5a691745bfba88f995441983e8c9791903baae7e467a8405d
+size 2463869984
diff --git a/model-00089-of-00130.safetensors b/model-00089-of-00130.safetensors
index 07d8a3a8e133029a086b3210ccfdf6de8091aaaf..806120dcd0e3b1a29c77d9f39981a7fa0170e78b 100644
--- a/model-00089-of-00130.safetensors
+++ b/model-00089-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e86e9b192f490993592b1c331726b32d3f9bdf80f2d6abe893d20cb70e51760a
-size 1208321688
+oid sha256:a0735a130b3ad68cc48c297a29b86dddbe828d9eb94c7530b3387b8c783444d7
+size 1208321704
diff --git a/model-00090-of-00130.safetensors b/model-00090-of-00130.safetensors
index a997f1abfe2c5600671cebb6bd0a79fee729c902..fd2e3dc9fa6de5326488d53ed08a2b44672a25ae 100644
--- a/model-00090-of-00130.safetensors
+++ b/model-00090-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b57125eec75a1b0cb31d3a8401d6a231359419e549e20072bcc39709423b129f
-size 2463869968
+oid sha256:925e556dbdf2afad4acb90fb199c609dab09cbc318ac058757592750cafbfaf8
+size 2463869984
diff --git a/model-00091-of-00130.safetensors b/model-00091-of-00130.safetensors
index 72978b3ed4923e95a2275ff4e86345c0a2893519..5ebeb21b86f4ebd9b44f01342624910dd40adba8 100644
--- a/model-00091-of-00130.safetensors
+++ b/model-00091-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0c613cdacd627e2fc3de08194efe1607aa06bdd386e1ccac1c7c133f4b5a2e8f
-size 1208321688
+oid sha256:bd424c50b1d72779a1726bb60232bb6cae97c26d878706c829e1484d65c85c7c
+size 1208321704
diff --git a/model-00092-of-00130.safetensors b/model-00092-of-00130.safetensors
index 70865c5f5d0e494c9374212f0e4b3c928e0e4813..31fd28f3b8d265cb26c93c73587c5653ed6e8a95 100644
--- a/model-00092-of-00130.safetensors
+++ b/model-00092-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a406fcc45a8a785e366d68ef9b222940d480c788a176ff26c74d7287051554e2
-size 2463869968
+oid sha256:56c2b84dc66d9b6e2b1ec46c579fd4bb6697606587926f75770fd50ab11b9f94
+size 2463869984
diff --git a/model-00093-of-00130.safetensors b/model-00093-of-00130.safetensors
index 67d49217407988dde62de78ac81510ab902d9bc3..44b420240a2528b8c17e7b7ede5b17126b5c2983 100644
--- a/model-00093-of-00130.safetensors
+++ b/model-00093-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:27f4f5084a432f77340599da368f6fbd7be38f07380a8ea87b39807a67198365
-size 1208321688
+oid sha256:e1c0568aa013b2712520a408290ccf1a54bef1bd4f4af8ee02d6029fb974efc2
+size 1208321704
diff --git a/model-00094-of-00130.safetensors b/model-00094-of-00130.safetensors
index ba2e7f032b89d7119e9480929ce09f8cb4fa39bf..9126679073a6c2b4380fa86465f3037222b9d123 100644
--- a/model-00094-of-00130.safetensors
+++ b/model-00094-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:d6db2523f161c686a3ae2dbd7b09aac6a6f0b0d5304805876385ab7c4bc0b5c7
-size 2463869968
+oid sha256:5e57a6198c8ae05d3f6d2d701085f6c3c7053195fca9f0be3d4395e45f75e4b2
+size 2463869984
diff --git a/model-00095-of-00130.safetensors b/model-00095-of-00130.safetensors
index b7ae61c3ae8c22617653b214b6fab00b18bb778a..3194ff05c47aba4daa261d38f0733ec2e0473607 100644
--- a/model-00095-of-00130.safetensors
+++ b/model-00095-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a8480d9cc9216c650a30cd7168244b84aa6762c7835a92600ce198da2d15fbb1
-size 1208321688
+oid sha256:c35c3e96eaab7420f2c1f78f7784c6b077b6c4f158f2279f6f62e28b26c396eb
+size 1208321704
diff --git a/model-00096-of-00130.safetensors b/model-00096-of-00130.safetensors
index 0cf06537026d731d449cefa5155ad307d1b57647..3401e39e35415c79467b3b6d49fd95a7ab716907 100644
--- a/model-00096-of-00130.safetensors
+++ b/model-00096-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:0123cfef652f44b2c6dbfcc47ede03762d4a572236367eee32a677d43d9a4dca
-size 2463869968
+oid sha256:d5986ad365b5c92b39c53cae7d8091250f86d800eb9f0b85f5c92e46b0023299
+size 2463869984
diff --git a/model-00097-of-00130.safetensors b/model-00097-of-00130.safetensors
index 22b3dc84b20f4deee20c8e326d5be9437b9b6484..4f6ec33a96ffd39764ce97c9765c077a3ec29ec8 100644
--- a/model-00097-of-00130.safetensors
+++ b/model-00097-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:181466337b86afbc94dfae30196ca15a27ff01b35c5cf3939682032c5c0469c3
-size 1208321688
+oid sha256:b21f35ffe104867870a986950155892fcac9affb7b0bc42680807c375f84dcb8
+size 1208321704
diff --git a/model-00098-of-00130.safetensors b/model-00098-of-00130.safetensors
index cf3a85dfa6b0b245395f80785f4d626becd1cfd6..f0198b1725bd71a8625a591612b4e40e2acfb88b 100644
--- a/model-00098-of-00130.safetensors
+++ b/model-00098-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:cb371f55564ec7a0ceb55bdf314c56b61385acfd7d59422e6b3a7efc75dd125a
-size 2463869968
+oid sha256:8bb54c5eaf81beba692c4f618331d9c710c64fc6e0d3aa76f7495b37d555890e
+size 2463869984
diff --git a/model-00099-of-00130.safetensors b/model-00099-of-00130.safetensors
index 7613dc406dcd2cc80f63f103794ec120eff2f898..950d991a362e8f6a5ae56fc8f45a533f7527216a 100644
--- a/model-00099-of-00130.safetensors
+++ b/model-00099-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:9f0f0bd9e07f7097693bfb58da9c73e35bf1e39eff80f0fba8f46ecde511cf63
-size 1208321688
+oid sha256:88c8ed89e176df2c57a2b541dd546f4860195ec55d89d5d242559a5e05b3923a
+size 1208321704
diff --git a/model-00100-of-00130.safetensors b/model-00100-of-00130.safetensors
index b4da88bffed506d100c3a6234632846751765676..80f24a84be2fdd0fc0231093d002b8d1d690d581 100644
--- a/model-00100-of-00130.safetensors
+++ b/model-00100-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:45fd433c26aab73e4a6b4d4566f5511c4376549df1ed9c4257493b1c72710fa9
-size 2463869968
+oid sha256:c462e7176c100ee61e19b9d983fd2fcd623765627082c889793e9d4f549f1ebd
+size 2463869984
diff --git a/model-00101-of-00130.safetensors b/model-00101-of-00130.safetensors
index 3923c3bed5990ee26deacd1824e399fbdcc42c4a..5462ae21fbc9c180aa464d03e065802c7b950433 100644
--- a/model-00101-of-00130.safetensors
+++ b/model-00101-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:82ed509a2950aacc0a217e61fd8ca43bf06cbd5c6fa734c33bb7e6baec4a85cb
-size 1208321688
+oid sha256:072e923c77c6d78e6c7e8e88f14a2942ce5b9923a1b4defcd2ad0eafeeed18fe
+size 1208321704
diff --git a/model-00102-of-00130.safetensors b/model-00102-of-00130.safetensors
index 07cad3d80a73117eee3fa7b81c7719ee58fa4e53..d4481dfcfb93fadbfa2463e85a33e337ca59b94d 100644
--- a/model-00102-of-00130.safetensors
+++ b/model-00102-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:af2c3743f4034f012b1855bca20bdfe2b081dd864a2bdc7064e9c1ea9a09f94c
-size 2463869968
+oid sha256:5aab62a54bb84e1471070b07068a5b6e0a98827e2d42486aa5d11904a49adff5
+size 2463869984
diff --git a/model-00103-of-00130.safetensors b/model-00103-of-00130.safetensors
index df5245f3a3f003a6d492a419c9a1e4a6ac62bac1..eb98a041a0dfac7ded2cee2f79471b3862ad48f8 100644
--- a/model-00103-of-00130.safetensors
+++ b/model-00103-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:4ac162b3348bc1ed712146b4d2a3bf443250c2268bceaca15c8cdce38a7fca7c
-size 1208321688
+oid sha256:de918f153ee3f6930a6377b9be4570e17cf1b5e15e9649fe153271de2a77f2fa
+size 1208321704
diff --git a/model-00104-of-00130.safetensors b/model-00104-of-00130.safetensors
index ab77252be678a414d0ba73857b9e0ca5e3f8ad89..47685b3d063cfedff7a3a819993efad4f6590dca 100644
--- a/model-00104-of-00130.safetensors
+++ b/model-00104-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2f21dabd4f4214b13c4803104783e5a3ad5af9838bcc849d1606c0e1f096a946
-size 2463869968
+oid sha256:4d07ec351c1cc965dd4b1f0809f35cd3e75c6a12a2aba2302f1e186e038c6e42
+size 2463869984
diff --git a/model-00105-of-00130.safetensors b/model-00105-of-00130.safetensors
index 45210b1ff615fcdf4c5f7e85253d4c2605f645a3..023ab000e09c4422ce493746f6eb6dbdded7a53c 100644
--- a/model-00105-of-00130.safetensors
+++ b/model-00105-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:97dd9fc182eb0583291bd29226ef3cf41319fab78295a910470fae7ea49339ae
-size 1208321688
+oid sha256:d2eed881777af73df8d435f7ee40853ca0e96a5c49fe522ec8f1697043943421
+size 1208321704
diff --git a/model-00106-of-00130.safetensors b/model-00106-of-00130.safetensors
index 86d8c835bbc5b26c7df2a4d497f4a1ad11c69fe5..2d093f8e31bef671a2ee8fae6f735be87517de7e 100644
--- a/model-00106-of-00130.safetensors
+++ b/model-00106-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:55f519426d248d7c57a147a1b82d819900788e43a62b6972c2148586f10f05f0
-size 2463869968
+oid sha256:205cb5c7e241b11d58c2994212570ded889f94ac2d0589799650afc1ebd66197
+size 2463869984
diff --git a/model-00107-of-00130.safetensors b/model-00107-of-00130.safetensors
index a8acf560e26d6401d28a80b706344a56bc715d6b..71598507ba7253c0418afcf5f27a1696807f75f4 100644
--- a/model-00107-of-00130.safetensors
+++ b/model-00107-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b50498a9bf402bdb82bebf103685634c37334609f9efcecf54babe7f9b5baf65
-size 1208321688
+oid sha256:f81ccc9301fb0b0019f3bc8fc7c11b2ca947e6b184ed6f10394002159089b59b
+size 1208321704
diff --git a/model-00108-of-00130.safetensors b/model-00108-of-00130.safetensors
index cf261e862f2fffe47dbf9891f769a265a593ab9b..309fff956c02994e3f4cc6ed66465137b2d750a9 100644
--- a/model-00108-of-00130.safetensors
+++ b/model-00108-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:502bdd08025b8d357717bdd305200df326f5f8c0e7ec6f7ce2c82d115cdf7e75
-size 2463869968
+oid sha256:d0ad55be87bca5e9b773db48f76bfd66ede0a53057d1d787d7323142a9690f35
+size 2463869984
diff --git a/model-00109-of-00130.safetensors b/model-00109-of-00130.safetensors
index dcd65a71881b9186d4e3c5b0f80482ebc36c5793..dd750502ddecb6b81084f2ab8b993e206203abf7 100644
--- a/model-00109-of-00130.safetensors
+++ b/model-00109-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5a528b71b52c2211e6f91deb829d11cb22b655dd57dc251d84ba4fe521e47ba2
-size 1208321688
+oid sha256:dbce9cf061e37baa89ae57c1ade0ea4d605b4d19cf7a5d048a176248196102ee
+size 1208321704
diff --git a/model-00110-of-00130.safetensors b/model-00110-of-00130.safetensors
index 4ab388f0f8dcd7c3e3c3752565806610513ac6dd..15c7fcdb46fc3da93292bbb1c427e402435b3872 100644
--- a/model-00110-of-00130.safetensors
+++ b/model-00110-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:20d1fa5b16599eee4fa39118f73508b579190a374f70f6c1bf83018c60a9d7be
-size 2463869968
+oid sha256:412b58ef7c3ac38758ad75e14a9c4976ab032556a2a0b4924494d0eed2116653
+size 2463869984
diff --git a/model-00111-of-00130.safetensors b/model-00111-of-00130.safetensors
index 8b8d0aed910716e757e208d0538aa6c499c8f579..b179d68003cd4c002530a1a26b185051565e4a0a 100644
--- a/model-00111-of-00130.safetensors
+++ b/model-00111-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:f63b6c84659c71d9d253bf5c22237c562d3a3fb44c70fd54cca9d7993c35ea04
-size 1208321688
+oid sha256:9b7a752507c7ec34b57abf1db86fb039291a73f5d3ec137b7cbd84793089fb85
+size 1208321704
diff --git a/model-00112-of-00130.safetensors b/model-00112-of-00130.safetensors
index 346df719ba428ce5a981cd8e5aaae2f28eb616c7..3a28a2ec90fde21795b161093e8958eefc45bd3c 100644
--- a/model-00112-of-00130.safetensors
+++ b/model-00112-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:c4e0b5428019c75f894907107d85da010697f4ecc333b244c6cfb4aea0e3c440
-size 2463869968
+oid sha256:bf588bc965737e3ad9f27812955675d24df46f5ebd899840f481886884a3bfac
+size 2463869984
diff --git a/model-00113-of-00130.safetensors b/model-00113-of-00130.safetensors
index feb96566abdb29f11cba1e5df257366f670536c8..49190a539189de643662dfcdb8d2bf7ad22e0827 100644
--- a/model-00113-of-00130.safetensors
+++ b/model-00113-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:e48bfe3f2a384aebf1038c14c651c69c64a8fac061e5b9547fb7d67da9ee5029
-size 1208321688
+oid sha256:06880550004cf06ee29a23adc8fe896368df9442dd7e44779a9b773423ffa396
+size 1208321704
diff --git a/model-00114-of-00130.safetensors b/model-00114-of-00130.safetensors
index be0a3b12b05b32d5a51512e9079af002eb95064c..ce863c137373fbaa76544489ec2a661d695417bd 100644
--- a/model-00114-of-00130.safetensors
+++ b/model-00114-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:08fb5a9fd03254204848af6413c7bf68876bee74f6bb37247d05dd2fc7480a84
-size 2463869968
+oid sha256:042131124955b2fd10e42f88d248cac356a1ae54f4f338ddf67b332fee82f1a4
+size 2463869984
diff --git a/model-00115-of-00130.safetensors b/model-00115-of-00130.safetensors
index 4854afadea7dfa58e5b57295d5c31a8e1bd0bbcd..33de24e38c9638bd6d43ca76e7d411a733879372 100644
--- a/model-00115-of-00130.safetensors
+++ b/model-00115-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:adf4ab941b453ba215787230e4a4f001623a5f06180deb3c5bed050160f463d7
-size 1208321688
+oid sha256:fd78f64fbc5d8fc9943ad03ac888d590351ac67367a4605f541567075c2a90a9
+size 1208321704
diff --git a/model-00116-of-00130.safetensors b/model-00116-of-00130.safetensors
index d5e956684a2110d4a775c7363b7acd63fc1608f0..a842a7a9f272737db0892fd51e66a5870f2a8fa7 100644
--- a/model-00116-of-00130.safetensors
+++ b/model-00116-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:45ba476d7a50d28db1380179ec3f2d3c274d35a362e2a6b680a6ab653aba88d1
-size 2463869968
+oid sha256:4857fbd1bb9738fbd98b8fe9700c055c8bf9c099874931a9e59a4f796f95a9c1
+size 2463869984
diff --git a/model-00117-of-00130.safetensors b/model-00117-of-00130.safetensors
index cda5818d88196aca09ceb373fc78bd171fac297e..ecd61eb2f1990b5e8013168cd4191dcccbae3c78 100644
--- a/model-00117-of-00130.safetensors
+++ b/model-00117-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:7f2a7438c4f6c66ac95eaaaba65c1935bfcb917884e021c30e588c74ac189fc5
-size 1208321688
+oid sha256:becc0b4f32f7d0d4de8d124ec62bb95b5f57936e63f6bfe8874d59bfc1d7edc1
+size 1208321704
diff --git a/model-00118-of-00130.safetensors b/model-00118-of-00130.safetensors
index 3ec47c04235ef03580f4c9a02ec263085dcd3577..9f708b9c323544f7e661be4a2d2c48f26e79afd7 100644
--- a/model-00118-of-00130.safetensors
+++ b/model-00118-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:a2eac3b06b70ff4f38c8166038b87e4e010e80fdb0c7fc32ff04b669b79bb390
-size 2463869968
+oid sha256:6b27be267068a02a5a33ec666d64ffb3548d05c1091f9e85b2aa227c643cf3a0
+size 2463869984
diff --git a/model-00119-of-00130.safetensors b/model-00119-of-00130.safetensors
index 9eef34827a70af39dc3bc0b28f2eff32c1e22854..77774f46d3c3e4fed387010fcea7d8c8957b874e 100644
--- a/model-00119-of-00130.safetensors
+++ b/model-00119-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:5cf0b16b764dd9303984a467fe3ad8a04b2b3908e230fc902425ec8746df804e
-size 1208321688
+oid sha256:666681aaed291bc9298bb4a688b2c801dc3bb2fc796d51f07f5d5a72797b8658
+size 1208321704
diff --git a/model-00120-of-00130.safetensors b/model-00120-of-00130.safetensors
index 4779b42c8d5104e8c41e536601c5e326063b3bad..d6756dfbe1a91e0d349ae09085564c19b5f636c8 100644
--- a/model-00120-of-00130.safetensors
+++ b/model-00120-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:16a1f3697a6913aecb34e5d880c42a38d067a5172d52eb44f4fb1de914fa879b
-size 2463869968
+oid sha256:9b0d00dbb435dea822d9afa5feca103b96f4ed36bca8eb4f1820b8702421e816
+size 2463869984
diff --git a/model-00121-of-00130.safetensors b/model-00121-of-00130.safetensors
index 61b53e232155d41c21581e9c10998f54c2f4ac00..fdb87c910b52d9d095bdaf91fcab1b3280d43a00 100644
--- a/model-00121-of-00130.safetensors
+++ b/model-00121-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:b2eabbce05904ab80f1919c0e74052810493c344eaa120dfc2b1bf46e195b230
-size 1208321688
+oid sha256:dbddd32ac1c6d80b443380a880ef4c435f10708eaae864cf745cbf76981cbf5b
+size 1208321704
diff --git a/model-00122-of-00130.safetensors b/model-00122-of-00130.safetensors
index 3a24937431600cd750f4e73ebc37e6279faf63eb..08bdc134314965dcec948e8f4aa57028ff7a080d 100644
--- a/model-00122-of-00130.safetensors
+++ b/model-00122-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2d4060a5e532922a3d5dae24262c08c21acd1a029e06650f806f9f3a111bcbfb
-size 2463869968
+oid sha256:ac4d80cc6e5c9a20c7ab4a0010f14f1a313f195b031aefb84d83ac6c607cb102
+size 2463869984
diff --git a/model-00123-of-00130.safetensors b/model-00123-of-00130.safetensors
index 2c3a3861da43e43da4924538a6ee77d1db0b38ed..ff32f4ccb6fd81b6c4194c3eb5fcb40e9b68c3d8 100644
--- a/model-00123-of-00130.safetensors
+++ b/model-00123-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:2779fd92da6eb6c42edaf3b1e9cdcc5b7a501b5c9a25cfb3c210baf0f42d837a
-size 1208321688
+oid sha256:3776600e4fea8e0d7b3c4c2667b0cdae07d4f5a7e7b30ce913b0c30c3a8ea0d8
+size 1208321704
diff --git a/model-00124-of-00130.safetensors b/model-00124-of-00130.safetensors
index cf1adeeb58c0df852b1212f26d17cf76e616e11f..2c3f206da4a4c1a9138243d4108caabaaab187b5 100644
--- a/model-00124-of-00130.safetensors
+++ b/model-00124-of-00130.safetensors
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:3439acf43dfe9db0ea78c681acccd0ee9b80d7c63b5865755921a1f1244a1a9c
-size 1229199552
+oid sha256:3543ef495910c94d69b4707153646bb2d55588fef092d3450ac03e3179db11d9
+size 1229199568
diff --git a/modeling_list_ultra.py b/modeling_list_ultra.py
new file mode 100644
index 0000000000000000000000000000000000000000..8846d38acc932d1dcb0302bb719296313f5225a8
--- /dev/null
+++ b/modeling_list_ultra.py
@@ -0,0 +1,706 @@
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py.
+# Do NOT edit this file manually, as any edits will be overwritten when the file is
+# regenerated from the modular source. If a change is needed, please apply it to the
+# modular_minimax_m2.py file directly. One of our CI checks enforces this.
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# coding=utf-8
+# Copyright 2025 the HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from collections.abc import Callable
+from typing import Optional, Union, Unpack
+
+import torch
+from torch import nn
+
+from transformers.activations import ACT2FN
+from transformers.cache_utils import Cache, DynamicCache
+from transformers.generation import GenerationMixin
+from transformers.integrations import use_kernel_forward_from_hub
+from transformers.masking_utils import create_causal_mask, create_sliding_window_causal_mask
+from transformers.modeling_flash_attention_utils import FlashAttentionKwargs
+from transformers.modeling_layers import (
+ GenericForQuestionAnswering,
+ GenericForSequenceClassification,
+ GenericForTokenClassification,
+ GradientCheckpointingLayer,
+)
+from transformers.modeling_outputs import MoeCausalLMOutputWithPast, MoeModelOutputWithPast
+from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
+from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
+from transformers.utils import TransformersKwargs, auto_docstring, can_return_tuple
+from transformers.utils.deprecation import deprecate_kwarg
+from transformers.utils.generic import OutputRecorder, check_model_inputs
+from .configuration_minimax_m2 import MiniMaxM2Config
+
+
+class MiniMaxM2MLP(nn.Module):
+ def __init__(self, config: MiniMaxM2Config):
+ super().__init__()
+ self.ffn_dim = config.intermediate_size
+ self.hidden_dim = config.hidden_size
+
+ self.w1 = nn.Linear(self.hidden_dim, self.ffn_dim, bias=False)
+ self.w2 = nn.Linear(self.ffn_dim, self.hidden_dim, bias=False)
+ self.w3 = nn.Linear(self.hidden_dim, self.ffn_dim, bias=False)
+
+ self.act_fn = ACT2FN[config.hidden_act]
+
+ def forward(self, hidden_states):
+ current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
+ current_hidden_states = self.w2(current_hidden_states)
+ return current_hidden_states
+
+
+class MiniMaxM2Experts(nn.ModuleList):
+ """
+ ModuleList of experts.
+ """
+
+ def __init__(self, config: MiniMaxM2Config):
+ super().__init__()
+ self.top_k = config.num_experts_per_tok
+ self.num_experts = config.num_local_experts
+ for _ in range(self.num_experts):
+ self.append(MiniMaxM2MLP(config))
+
+ def forward(
+ self, hidden_states: torch.Tensor, top_k_index: torch.Tensor, top_k_weights: torch.Tensor
+ ) -> torch.Tensor:
+        """
+        Args:
+            hidden_states: (batch_size * sequence_length, hidden_dim)
+            top_k_index: (batch_size * sequence_length, top_k)
+            top_k_weights: (batch_size * sequence_length, top_k)
+        Returns:
+            (batch_size * sequence_length, hidden_dim)
+        """
+ final_hidden_states = torch.zeros_like(hidden_states)
+ expert_mask = torch.nn.functional.one_hot(top_k_index, num_classes=self.num_experts).permute(2, 1, 0)
+
+ expert_hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()
+ for expert_idx in expert_hit:
+ idx, top_x = torch.where(expert_mask[expert_idx].squeeze(0))
+ current_state = hidden_states[None, top_x].reshape(-1, hidden_states.shape[-1])
+ current_hidden_states = self[expert_idx](current_state) * top_k_weights[top_x, idx, None]
+ final_hidden_states.index_add_(0, top_x, current_hidden_states.to(hidden_states.dtype))
+ return final_hidden_states
+
+
+class MiniMaxM2SparseMoeBlock(nn.Module):
+ def __init__(self, config):
+ super().__init__()
+ self.top_k = config.num_experts_per_tok
+ self.jitter_noise = config.router_jitter_noise
+ self.gate = nn.Linear(config.hidden_size, config.num_local_experts, bias=False)
+ self.experts = MiniMaxM2Experts(config)
+ self.register_buffer("e_score_correction_bias", torch.zeros(config.num_local_experts))
+
+ def route_tokens_to_experts(self, router_logits):
+ routing_weights = torch.nn.functional.sigmoid(router_logits.float())
+ scores_for_choice = routing_weights + self.e_score_correction_bias
+ _, top_k_index = torch.topk(scores_for_choice, self.top_k, dim=-1, sorted=False)
+ top_k_weights = routing_weights.gather(1, top_k_index)
+ top_k_weights /= top_k_weights.sum(dim=-1, keepdim=True)
+ return top_k_index, top_k_weights.to(router_logits.dtype)
+
+ def forward(self, hidden_states: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
+ batch_size, sequence_length, hidden_dim = hidden_states.shape
+ if self.training and self.jitter_noise > 0:
+ hidden_states *= torch.empty_like(hidden_states).uniform_(1.0 - self.jitter_noise, 1.0 + self.jitter_noise)
+ hidden_states = hidden_states.view(-1, hidden_states.shape[-1])
+ router_logits = self.gate(hidden_states)
+ top_k_index, top_k_weights = self.route_tokens_to_experts(router_logits)
+ hidden_states = self.experts(hidden_states, top_k_index, top_k_weights.to(hidden_states.dtype))
+ hidden_states = hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+ return hidden_states, router_logits
+
+
+@use_kernel_forward_from_hub("RMSNorm")
+class MiniMaxM2RMSNorm(nn.Module):
+ def __init__(self, hidden_size, eps=1e-6):
+ """
+ MiniMaxM2RMSNorm is equivalent to T5LayerNorm
+ """
+ super().__init__()
+ self.weight = nn.Parameter(torch.ones(hidden_size))
+ self.variance_epsilon = eps
+
+ def forward(self, hidden_states):
+ input_dtype = hidden_states.dtype
+ hidden_states = hidden_states.to(torch.float32)
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
+ return self.weight * hidden_states.to(input_dtype)
+
+ def extra_repr(self):
+ return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"
+
+
+def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
+ """
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
+ """
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
+ if n_rep == 1:
+ return hidden_states
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
+
+
+def eager_attention_forward(
+ module: nn.Module,
+ query: torch.Tensor,
+ key: torch.Tensor,
+ value: torch.Tensor,
+ attention_mask: Optional[torch.Tensor],
+ scaling: float,
+ dropout: float = 0.0,
+ **kwargs: Unpack[TransformersKwargs],
+):
+ key_states = repeat_kv(key, module.num_key_value_groups)
+ value_states = repeat_kv(value, module.num_key_value_groups)
+
+ attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
+ if attention_mask is not None:
+ causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+ attn_weights = attn_weights + causal_mask
+
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
+ attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
+ attn_output = torch.matmul(attn_weights, value_states)
+ attn_output = attn_output.transpose(1, 2).contiguous()
+
+ return attn_output, attn_weights
+
+
+def rotate_half(x):
+ """Rotates half the hidden dims of the input."""
+ x1 = x[..., : x.shape[-1] // 2]
+ x2 = x[..., x.shape[-1] // 2 :]
+ return torch.cat((-x2, x1), dim=-1)
+
+
+def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
+ """Applies Rotary Position Embedding to the query and key tensors.
+
+ Args:
+ q (`torch.Tensor`): The query tensor.
+ k (`torch.Tensor`): The key tensor.
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
+ position_ids (`torch.Tensor`, *optional*):
+ Deprecated and unused.
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
+ Returns:
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
+ """
+ cos = cos.unsqueeze(unsqueeze_dim)
+ sin = sin.unsqueeze(unsqueeze_dim)
+
+ # Keep half or full tensor for later concatenation
+ rotary_dim = cos.shape[-1]
+ q_rot, q_pass = q[..., :rotary_dim], q[..., rotary_dim:]
+ k_rot, k_pass = k[..., :rotary_dim], k[..., rotary_dim:]
+
+ # Apply rotary embeddings on the first half or full tensor
+ q_embed = (q_rot * cos) + (rotate_half(q_rot) * sin)
+ k_embed = (k_rot * cos) + (rotate_half(k_rot) * sin)
+
+ # Concatenate back to full shape
+ q_embed = torch.cat([q_embed, q_pass], dim=-1)
+ k_embed = torch.cat([k_embed, k_pass], dim=-1)
+ return q_embed, k_embed
+
+
+class MiniMaxM2Attention(nn.Module):
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
+
+ def __init__(self, config: MiniMaxM2Config, layer_idx: int):
+ super().__init__()
+ self.config = config
+ self.layer_idx = layer_idx
+ self.head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
+ self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
+ self.scaling = self.head_dim**-0.5
+ self.attention_dropout = config.attention_dropout
+ self.is_causal = True
+ self.q_proj = nn.Linear(config.hidden_size, config.num_attention_heads * self.head_dim, bias=False)
+ self.k_proj = nn.Linear(config.hidden_size, config.num_key_value_heads * self.head_dim, bias=False)
+ self.v_proj = nn.Linear(config.hidden_size, config.num_key_value_heads * self.head_dim, bias=False)
+ self.o_proj = nn.Linear(config.num_attention_heads * self.head_dim, config.hidden_size, bias=False)
+
+ self.use_qk_norm = config.use_qk_norm
+ if self.use_qk_norm:
+ self.q_norm = MiniMaxM2RMSNorm(self.head_dim * config.num_attention_heads, eps=config.rms_norm_eps)
+ self.k_norm = MiniMaxM2RMSNorm(self.head_dim * config.num_key_value_heads, eps=config.rms_norm_eps)
+
+ @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
+ def forward(
+ self,
+ hidden_states: torch.Tensor,
+ position_embeddings: tuple[torch.Tensor, torch.Tensor],
+ attention_mask: Optional[torch.Tensor],
+ past_key_values: Optional[Cache] = None,
+ cache_position: Optional[torch.LongTensor] = None,
+ **kwargs: Unpack[FlashAttentionKwargs],
+ ) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
+ input_shape = hidden_states.shape[:-1]
+ hidden_shape = (*input_shape, -1, self.head_dim)
+
+ query_states = self.q_proj(hidden_states)
+ key_states = self.k_proj(hidden_states)
+ value_states = self.v_proj(hidden_states)
+
+ if self.use_qk_norm: # main diff from Llama
+ query_states = self.q_norm(query_states)
+ key_states = self.k_norm(key_states)
+
+ key_states = key_states.view(hidden_shape)
+ query_states = query_states.view(hidden_shape)
+ value_states = value_states.view(hidden_shape)
+
+ query_states = query_states.transpose(1, 2)
+ key_states = key_states.transpose(1, 2)
+ value_states = value_states.transpose(1, 2)
+
+ cos, sin = position_embeddings
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
+
+ if past_key_values is not None:
+ # sin and cos are specific to RoPE models; position_ids needed for the static cache
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+ key_states, value_states = past_key_values.update(key_states, value_states, self.layer_idx, cache_kwargs)
+
+ attention_interface: Callable = eager_attention_forward
+ if self.config._attn_implementation != "eager":
+ attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
+
+ attn_output, attn_weights = attention_interface(
+ self,
+ query_states,
+ key_states,
+ value_states,
+ attention_mask,
+ dropout=0.0 if not self.training else self.attention_dropout,
+ scaling=self.scaling,
+ **kwargs,
+ )
+
+ attn_output = attn_output.reshape(*input_shape, -1).contiguous()
+ attn_output = self.o_proj(attn_output)
+ return attn_output, attn_weights
+
+
+class MiniMaxM2DecoderLayer(GradientCheckpointingLayer):
+ def __init__(self, config: MiniMaxM2Config, layer_idx: int):
+ super().__init__()
+ self.hidden_size = config.hidden_size
+
+ self.self_attn = MiniMaxM2Attention(config, layer_idx)
+
+ self.block_sparse_moe = MiniMaxM2SparseMoeBlock(config)
+ self.input_layernorm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
+ self.post_attention_layernorm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
+
+ @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
+ def forward(
+ self,
+ hidden_states: torch.Tensor,
+ position_embeddings: tuple[torch.Tensor, torch.Tensor],
+ attention_mask: Optional[torch.Tensor] = None,
+ position_ids: Optional[torch.LongTensor] = None,
+ past_key_values: Optional[Cache] = None,
+ cache_position: Optional[torch.LongTensor] = None,
+ **kwargs: Unpack[TransformersKwargs],
+ ) -> torch.FloatTensor:
+ residual = hidden_states
+
+ hidden_states = self.input_layernorm(hidden_states)
+
+ # Self Attention
+ hidden_states, _ = self.self_attn(
+ hidden_states=hidden_states,
+ position_embeddings=position_embeddings,
+ attention_mask=attention_mask,
+ position_ids=position_ids,
+ past_key_values=past_key_values,
+ cache_position=cache_position,
+ **kwargs,
+ )
+ hidden_states = residual + hidden_states
+
+ # Fully Connected
+ residual = hidden_states
+ hidden_states = self.post_attention_layernorm(hidden_states)
+ hidden_states, _ = self.block_sparse_moe(hidden_states)
+ hidden_states = residual + hidden_states
+
+ return hidden_states
+
+
+class MiniMaxM2RotaryEmbedding(nn.Module):
+ inv_freq: torch.Tensor # fix linting for `register_buffer`
+
+ def __init__(self, config: MiniMaxM2Config, device=None):
+ super().__init__()
+ # BC: "rope_type" was originally "type"
+ if hasattr(config, "rope_scaling") and isinstance(config.rope_scaling, dict):
+ self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type"))
+ else:
+ self.rope_type = "default"
+ self.max_seq_len_cached = config.max_position_embeddings
+ self.original_max_seq_len = config.max_position_embeddings
+
+ self.config = config
+ self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
+
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
+ self.original_inv_freq = self.inv_freq
+
+ @torch.no_grad()
+ @dynamic_rope_update # power user: used with advanced RoPE types (e.g. dynamic rope)
+ def forward(self, x, position_ids):
+ inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1).to(x.device)
+ position_ids_expanded = position_ids[:, None, :].float()
+
+ device_type = x.device.type if isinstance(x.device.type, str) and x.device.type != "mps" else "cpu"
+ with torch.autocast(device_type=device_type, enabled=False): # Force float32
+ freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
+ emb = torch.cat((freqs, freqs), dim=-1)
+ cos = emb.cos() * self.attention_scaling
+ sin = emb.sin() * self.attention_scaling
+
+ return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
+
+
+@auto_docstring
+class MiniMaxM2PreTrainedModel(PreTrainedModel):
+ config: MiniMaxM2Config
+ base_model_prefix = "model"
+ supports_gradient_checkpointing = True
+ _no_split_modules = ["MiniMaxM2DecoderLayer"]
+ _skip_keys_device_placement = ["past_key_values"]
+ _supports_flash_attn = True
+ _supports_sdpa = True
+ _supports_flex_attn = True
+ _can_compile_fullgraph = False # MoE models don't work with torch.compile (`torch.where(condition)` not supported)
+ _supports_attention_backend = True
+ _can_record_outputs = {
+ "router_logits": OutputRecorder(MiniMaxM2SparseMoeBlock, index=1),
+ "hidden_states": MiniMaxM2DecoderLayer,
+ "attentions": MiniMaxM2Attention,
+ }
+
+
+@auto_docstring
+class MiniMaxM2Model(MiniMaxM2PreTrainedModel):
+ def __init__(self, config: MiniMaxM2Config):
+ super().__init__(config)
+ self.padding_idx = config.pad_token_id
+ self.vocab_size = config.vocab_size
+
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
+ self.layers = nn.ModuleList(
+ [MiniMaxM2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
+ )
+ self.norm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
+ self.rotary_emb = MiniMaxM2RotaryEmbedding(config=config)
+ self.gradient_checkpointing = False
+
+ # Initialize weights and apply final processing
+ self.post_init()
+
+ @check_model_inputs
+ @auto_docstring
+ def forward(
+ self,
+ input_ids: Optional[torch.LongTensor] = None,
+ attention_mask: Optional[torch.Tensor] = None,
+ position_ids: Optional[torch.LongTensor] = None,
+ past_key_values: Optional[Cache] = None,
+ inputs_embeds: Optional[torch.FloatTensor] = None,
+ use_cache: Optional[bool] = None,
+ cache_position: Optional[torch.LongTensor] = None,
+ **kwargs: Unpack[TransformersKwargs],
+ ) -> MoeModelOutputWithPast:
+ if (input_ids is None) ^ (inputs_embeds is not None):
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
+
+ if use_cache and past_key_values is None:
+ past_key_values = DynamicCache(config=self.config)
+
+ if inputs_embeds is None:
+ inputs_embeds = self.embed_tokens(input_ids)
+
+ if cache_position is None:
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
+ cache_position = torch.arange(
+ past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device
+ )
+ if position_ids is None:
+ position_ids = cache_position.unsqueeze(0)
+
+ mask_function = create_causal_mask if self.config.sliding_window is None else create_sliding_window_causal_mask
+ causal_mask = mask_function(
+ config=self.config,
+ input_embeds=inputs_embeds,
+ attention_mask=attention_mask,
+ cache_position=cache_position,
+ past_key_values=past_key_values,
+ position_ids=position_ids,
+ )
+
+ hidden_states = inputs_embeds
+
+ # create position embeddings to be shared across the decoder layers
+ position_embeddings = self.rotary_emb(hidden_states, position_ids)
+
+ for decoder_layer in self.layers[: self.config.num_hidden_layers]:
+ hidden_states = decoder_layer(
+ hidden_states,
+ position_embeddings=position_embeddings,
+ attention_mask=causal_mask,
+ position_ids=position_ids,
+ past_key_values=past_key_values,
+ use_cache=use_cache,
+ cache_position=cache_position,
+ **kwargs,
+ )
+
+ hidden_states = self.norm(hidden_states)
+
+ return MoeModelOutputWithPast( # only diff with Mistral is the output type, we need MoE
+ last_hidden_state=hidden_states,
+ past_key_values=past_key_values,
+ )
+
+
+def load_balancing_loss_func(
+ gate_logits: Union[torch.Tensor, tuple[torch.Tensor], None],
+ num_experts: Optional[int] = None,
+ top_k=2,
+ attention_mask: Optional[torch.Tensor] = None,
+) -> Union[torch.Tensor, int]:
+ r"""
+ Computes auxiliary load balancing loss as in Switch Transformer - implemented in Pytorch.
+
+ See Switch Transformer (https://huggingface.co/papers/2101.03961) for more details. This function implements the loss
+ function presented in equations (4) - (6) of the paper. It aims at penalizing cases where the routing between
+ experts is too unbalanced.
+
+ Args:
+ gate_logits:
+ Logits from the `gate`, should be a tuple of model.config.num_hidden_layers tensors of
+ shape [batch_size X sequence_length, num_experts].
+ num_experts:
+ Number of experts
+ top_k:
+ The number of experts to route per-token, can be also interpreted as the `top-k` routing
+ parameter.
+ attention_mask (`torch.Tensor`, *optional*):
+ The attention_mask used in forward function
+ shape [batch_size X sequence_length] if not None.
+
+ Returns:
+ The auxiliary loss.
+ """
+ if gate_logits is None or not isinstance(gate_logits, tuple):
+ return 0
+
+ compute_device = gate_logits[0].device
+ concatenated_gate_logits = torch.cat([layer_gate.to(compute_device) for layer_gate in gate_logits], dim=0)
+
+ routing_weights = torch.nn.functional.softmax(concatenated_gate_logits, dim=-1)
+
+ _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
+
+ expert_mask = torch.nn.functional.one_hot(selected_experts, num_experts)
+
+ if attention_mask is None:
+ # Compute the percentage of tokens routed to each expert
+ tokens_per_expert = torch.mean(expert_mask.float(), dim=0)
+
+ # Compute the average probability of routing to these experts
+ router_prob_per_expert = torch.mean(routing_weights, dim=0)
+ else:
+ batch_size, sequence_length = attention_mask.shape
+ num_hidden_layers = concatenated_gate_logits.shape[0] // (batch_size * sequence_length)
+
+ # Compute the mask that masks all padding tokens as 0 with the same shape of expert_mask
+ expert_attention_mask = (
+ attention_mask[None, :, :, None, None]
+ .expand((num_hidden_layers, batch_size, sequence_length, top_k, num_experts))
+ .reshape(-1, top_k, num_experts)
+ .to(compute_device)
+ )
+
+ # Compute the percentage of tokens routed to each expert
+ tokens_per_expert = torch.sum(expert_mask.float() * expert_attention_mask, dim=0) / torch.sum(
+ expert_attention_mask, dim=0
+ )
+
+ # Compute the mask that masks all padding tokens as 0 with the same shape of tokens_per_expert
+ router_per_expert_attention_mask = (
+ attention_mask[None, :, :, None]
+ .expand((num_hidden_layers, batch_size, sequence_length, num_experts))
+ .reshape(-1, num_experts)
+ .to(compute_device)
+ )
+
+ # Compute the average probability of routing to these experts
+ router_prob_per_expert = torch.sum(routing_weights * router_per_expert_attention_mask, dim=0) / torch.sum(
+ router_per_expert_attention_mask, dim=0
+ )
+
+ overall_loss = torch.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0))
+ return overall_loss * num_experts
+
+
+@auto_docstring
+class MiniMaxM2ForCausalLM(MiniMaxM2PreTrainedModel, GenerationMixin):
+ _tied_weights_keys = ["lm_head.weight"]
+ _tp_plan = {"lm_head": "colwise_rep"}
+ _pp_plan = {"lm_head": (["hidden_states"], ["logits"])}
+
+ def __init__(self, config):
+ super().__init__(config)
+ self.model = MiniMaxM2Model(config)
+ self.vocab_size = config.vocab_size
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
+ self.router_aux_loss_coef = config.router_aux_loss_coef
+ self.num_experts = config.num_local_experts
+ self.num_experts_per_tok = config.num_experts_per_tok
+
+ # Initialize weights and apply final processing
+ self.post_init()
+
+ @can_return_tuple
+ @auto_docstring
+ def forward(
+ self,
+ input_ids: Optional[torch.LongTensor] = None,
+ attention_mask: Optional[torch.Tensor] = None,
+ position_ids: Optional[torch.LongTensor] = None,
+ past_key_values: Optional[Cache] = None,
+ inputs_embeds: Optional[torch.FloatTensor] = None,
+ labels: Optional[torch.LongTensor] = None,
+ use_cache: Optional[bool] = None,
+ output_router_logits: Optional[bool] = None,
+ cache_position: Optional[torch.LongTensor] = None,
+ logits_to_keep: Union[int, torch.Tensor] = 0,
+ **kwargs: Unpack[TransformersKwargs],
+ ) -> MoeCausalLMOutputWithPast:
+ r"""
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+ Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
+ config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
+ (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
+
+ Example:
+
+ ```python
+ >>> from transformers import AutoTokenizer, MiniMaxM2ForCausalLM
+
+ >>> model = MiniMaxM2ForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2")
+
+ >>> prompt = "Hey, are you conscious? Can you talk to me?"
+ >>> inputs = tokenizer(prompt, return_tensors="pt")
+
+ >>> # Generate
+ >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
+ >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+ "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+ ```"""
+
+ output_router_logits = (
+ output_router_logits if output_router_logits is not None else self.config.output_router_logits
+ )
+
+ # decoder outputs consist of (dec_features, layer_state, dec_hidden, dec_attn)
+ outputs: MoeModelOutputWithPast = self.model(
+ input_ids=input_ids,
+ attention_mask=attention_mask,
+ position_ids=position_ids,
+ past_key_values=past_key_values,
+ inputs_embeds=inputs_embeds,
+ use_cache=use_cache,
+ output_router_logits=output_router_logits,
+ cache_position=cache_position,
+ **kwargs,
+ )
+
+ hidden_states = outputs.last_hidden_state
+ # Only compute necessary logits, and do not upcast them to float if we are not computing the loss
+ slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
+ logits = self.lm_head(hidden_states[:, slice_indices, :])
+
+ loss = None
+ if labels is not None:
+ loss = self.loss_function(logits, labels, self.vocab_size, **kwargs)
+
+ aux_loss = None
+ if output_router_logits:
+ aux_loss = load_balancing_loss_func(
+ outputs.router_logits,
+ self.num_experts,
+ self.num_experts_per_tok,
+ attention_mask,
+ )
+ if labels is not None:
+ loss += self.router_aux_loss_coef * aux_loss.to(loss.device) # make sure to reside in the same device
+
+ return MoeCausalLMOutputWithPast(
+ loss=loss,
+ aux_loss=aux_loss,
+ logits=logits,
+ past_key_values=outputs.past_key_values,
+ hidden_states=outputs.hidden_states,
+ attentions=outputs.attentions,
+ router_logits=outputs.router_logits,
+ )
+
+
+class MiniMaxM2ForSequenceClassification(GenericForSequenceClassification, MiniMaxM2PreTrainedModel):
+ pass
+
+
+class MiniMaxM2ForTokenClassification(GenericForTokenClassification, MiniMaxM2PreTrainedModel):
+ pass
+
+
+class MiniMaxM2ForQuestionAnswering(GenericForQuestionAnswering, MiniMaxM2PreTrainedModel):
+ pass
+
+
+__all__ = [
+ "MiniMaxM2ForCausalLM",
+ "MiniMaxM2ForQuestionAnswering",
+ "MiniMaxM2Model",
+ "MiniMaxM2PreTrainedModel",
+ "MiniMaxM2ForSequenceClassification",
+ "MiniMaxM2ForTokenClassification",
+]
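For reference, the sigmoid-plus-bias routing implemented by `MiniMaxM2SparseMoeBlock.route_tokens_to_experts` above can be sketched for a single token in plain Python. This is illustrative only; the real code operates on batched tensors, and the function and example values here are made up for the sketch.

```python
import math

def route_tokens_to_experts(router_logits, bias, top_k):
    # Sigmoid score per expert; the correction bias influences only
    # which experts are *selected*, not the final mixing weights.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in router_logits]
    choice = [s + b for s, b in zip(scores, bias)]
    top = sorted(range(len(scores)), key=lambda i: choice[i], reverse=True)[:top_k]
    # Renormalize the unbiased scores of the chosen experts so they sum to 1.
    total = sum(scores[i] for i in top)
    return top, {i: scores[i] / total for i in top}

top, weights = route_tokens_to_experts([2.0, 0.0, -2.0, 1.0], [0.0] * 4, top_k=2)
# top == [0, 3]: the two largest sigmoid scores win, and their weights sum to 1
```

Note the asymmetry mirrored from the module: `scores_for_choice` (score plus `e_score_correction_bias`) decides *which* experts fire, but the gathered `top_k_weights` are the raw sigmoid scores, renormalized over the selected experts.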
diff --git a/subir_huggingface.py b/subir_huggingface.py
new file mode 100644
index 0000000000000000000000000000000000000000..d2acdf9f55e8ad8dee4908da7579e19241c312da
--- /dev/null
+++ b/subir_huggingface.py
@@ -0,0 +1,19 @@
+from huggingface_hub import HfApi
+
+api = HfApi()
+
+# The name of your repository on HF
+repo_id = "List-cloud/List-3.0-Ultra-Coder-Brain"
+
+print("Starting upload to Hugging Face... This can take a while depending on your connection.")
+
+# Upload the entire folder, replacing the old files on HF
+api.upload_folder(
+ folder_path=r"K:\List-3.0-Ultra-Coder\List-3.0-Ultra-Coder-Brain",
+ repo_id=repo_id,
+ repo_type="model",
+ # Skip the automation scripts you don't want to upload
+ ignore_patterns=["*.pyc", "update_model_hashes.py", "boost_downloads.py", "upload_model.py"]
+)
+
+print("Upload completed successfully!")
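The `ignore_patterns` argument above excludes files whose names match glob-style patterns before upload. A minimal sketch of that filtering behavior (a hypothetical helper for illustration, not `huggingface_hub`'s actual implementation):

```python
import fnmatch

def filter_upload_files(paths, ignore_patterns):
    # Keep only paths that match none of the glob-style ignore patterns.
    return [
        p for p in paths
        if not any(fnmatch.fnmatch(p, pattern) for pattern in ignore_patterns)
    ]

files = ["model.safetensors", "config.json", "cache.pyc", "upload_model.py"]
kept = filter_upload_files(files, ["*.pyc", "upload_model.py"])
# kept == ["model.safetensors", "config.json"]
```

Patterns like `"*.pyc"` match by extension anywhere, while literal names like `"upload_model.py"` exclude a single file, which is why the script lists both forms.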
diff --git a/tokenizer_config.json b/tokenizer_config.json
index ff8e2ebcbdb03324603c0a734e459ec9968096ae..4801e04325e0078db80978918a3f3a0ad8fc09f6 100644
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -1,495 +1,496 @@
-{
- "added_tokens_decoder": {
- "200000": {
- "content": "]!p~[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200001": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200002": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200003": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200004": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200005": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200006": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200007": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200008": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200009": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200010": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200011": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200012": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200013": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200014": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200015": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200016": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200017": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200018": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200019": {
- "content": "]~b]",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200020": {
- "content": "[e~[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200021": {
- "content": "]!d~[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200022": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200023": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200024": {
- "content": "]<]speech[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200025": {
- "content": "]<]image[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200026": {
- "content": "]<]video[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200027": {
- "content": "]<]start of speech[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200028": {
- "content": "]<]end of speech[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200029": {
- "content": "]<]start of image[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200030": {
- "content": "]<]end of image[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200031": {
- "content": "]<]start of video[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200032": {
- "content": "]<]end of video[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200033": {
- "content": "]<]vision pad[>[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200034": {
- "content": "]~!b[",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200035": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200036": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200037": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200038": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200039": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200040": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200041": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200042": {
- "content": "",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "200043": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200044": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200045": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200046": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200047": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200048": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200049": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": true
- },
- "200050": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": false
- },
- "200051": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": false
- },
- "200052": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": false
- },
- "200053": {
- "content": "",
- "single_word": false,
- "lstrip": false,
- "rstrip": false,
- "normalized": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "]<]speech[>[",
- "]<]image[>[",
- "]<]video[>[",
- "]<]start of speech[>[",
- "]<]end of speech[>[",
- "]<]start of image[>[",
- "]<]end of image[>[",
- "]<]start of video[>[",
- "]<]end of video[>[",
- "]<]vision pad[>[",
- "]~!b[",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "[e~[",
- "]!d~[",
- "]!p~[",
- "]~b]",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- ""
- ],
- "add_prefix_space": false,
- "bos_token": "]~!b[",
- "clean_up_tokenization_spaces": false,
- "eos_token": "[e~[",
- "model_max_length": 40960000,
- "tokenizer_class": "GPT2Tokenizer",
- "unk_token": "]!d~["
-}
+{
+ "added_tokens_decoder": {
+ "200000": {
+ "content": "]!p~[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200001": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200002": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200003": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200004": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200005": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200006": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200007": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200008": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200009": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200010": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200011": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200012": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200013": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200014": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200015": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200016": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200017": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200018": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200019": {
+ "content": "]~b]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200020": {
+ "content": "[e~[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200021": {
+ "content": "]!d~[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200022": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200023": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200024": {
+ "content": "]<]speech[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200025": {
+ "content": "]<]image[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200026": {
+ "content": "]<]video[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200027": {
+ "content": "]<]start of speech[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200028": {
+ "content": "]<]end of speech[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200029": {
+ "content": "]<]start of image[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200030": {
+ "content": "]<]end of image[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200031": {
+ "content": "]<]start of video[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200032": {
+ "content": "]<]end of video[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200033": {
+ "content": "]<]vision pad[>[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200034": {
+ "content": "]~!b[",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200035": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200036": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200037": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200038": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200039": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200040": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200041": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200042": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200043": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200044": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200045": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200046": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200047": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200048": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200049": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200050": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "200051": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "200052": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "200053": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "]<]speech[>[",
+ "]<]image[>[",
+ "]<]video[>[",
+ "]<]start of speech[>[",
+ "]<]end of speech[>[",
+ "]<]start of image[>[",
+ "]<]end of image[>[",
+ "]<]start of video[>[",
+ "]<]end of video[>[",
+ "]<]vision pad[>[",
+ "]~!b[",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "[e~[",
+ "]!d~[",
+ "]!p~[",
+ "]~b]",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ "",
+ ""
+ ],
+ "add_prefix_space": false,
+ "bos_token": "]~!b[",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "[e~[",
+ "model_max_length": 40960000,
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "]!d~[",
+ "model_creator": "List Cloud"
+}
\ No newline at end of file
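For reference, each `added_tokens_decoder` entry in the config above maps a token ID to its content string plus the `lstrip`/`rstrip`/`normalized`/`single_word`/`special` flags. A minimal sketch of how such a map can be read, using an excerpt that copies a few IDs and flags from the patch (many token contents are empty in the source and stay empty here):

```python
import json

# Excerpt mirroring the structure of the tokenizer_config.json in this
# patch; IDs, token strings, and flags are copied from the diff above.
config_excerpt = json.loads("""
{
  "added_tokens_decoder": {
    "200020": {"content": "[e~[", "lstrip": false, "normalized": false,
               "rstrip": false, "single_word": false, "special": true},
    "200021": {"content": "]!d~[", "lstrip": false, "normalized": false,
               "rstrip": false, "single_word": false, "special": true},
    "200050": {"content": "", "lstrip": false, "normalized": false,
               "rstrip": false, "single_word": false, "special": false}
  },
  "bos_token": "]~!b[",
  "eos_token": "[e~[",
  "unk_token": "]!d~["
}
""")

# Collect the token IDs flagged as special, as a tokenizer loader would.
special_ids = sorted(
    int(token_id)
    for token_id, entry in config_excerpt["added_tokens_decoder"].items()
    if entry["special"]
)
print(special_ids)  # [200020, 200021]
```

Note that JSON object keys are strings, so the numeric IDs must be cast back to `int` before sorting or indexing.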