List-cloud committed on
Commit f3225a5 · verified · 1 Parent(s): 9bcf76a

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes. See raw diff
Files changed (50)
  1. README.md +191 -190
  2. config.json +116 -115
  3. configuration_list_ultra.py +200 -0
  4. generation_config.json +10 -9
  5. model-00000-of-00130.safetensors +2 -2
  6. model-00001-of-00130.safetensors +2 -2
  7. model-00002-of-00130.safetensors +2 -2
  8. model-00003-of-00130.safetensors +2 -2
  9. model-00004-of-00130.safetensors +2 -2
  10. model-00005-of-00130.safetensors +2 -2
  11. model-00006-of-00130.safetensors +2 -2
  12. model-00007-of-00130.safetensors +2 -2
  13. model-00008-of-00130.safetensors +2 -2
  14. model-00009-of-00130.safetensors +2 -2
  15. model-00010-of-00130.safetensors +2 -2
  16. model-00011-of-00130.safetensors +2 -2
  17. model-00012-of-00130.safetensors +2 -2
  18. model-00013-of-00130.safetensors +2 -2
  19. model-00014-of-00130.safetensors +2 -2
  20. model-00015-of-00130.safetensors +2 -2
  21. model-00016-of-00130.safetensors +2 -2
  22. model-00017-of-00130.safetensors +2 -2
  23. model-00018-of-00130.safetensors +2 -2
  24. model-00019-of-00130.safetensors +2 -2
  25. model-00020-of-00130.safetensors +2 -2
  26. model-00021-of-00130.safetensors +2 -2
  27. model-00022-of-00130.safetensors +2 -2
  28. model-00023-of-00130.safetensors +2 -2
  29. model-00024-of-00130.safetensors +2 -2
  30. model-00025-of-00130.safetensors +2 -2
  31. model-00026-of-00130.safetensors +2 -2
  32. model-00027-of-00130.safetensors +2 -2
  33. model-00028-of-00130.safetensors +2 -2
  34. model-00029-of-00130.safetensors +2 -2
  35. model-00030-of-00130.safetensors +2 -2
  36. model-00031-of-00130.safetensors +2 -2
  37. model-00032-of-00130.safetensors +2 -2
  38. model-00033-of-00130.safetensors +2 -2
  39. model-00034-of-00130.safetensors +2 -2
  40. model-00035-of-00130.safetensors +2 -2
  41. model-00036-of-00130.safetensors +2 -2
  42. model-00037-of-00130.safetensors +2 -2
  43. model-00038-of-00130.safetensors +2 -2
  44. model-00039-of-00130.safetensors +2 -2
  45. model-00040-of-00130.safetensors +2 -2
  46. model-00041-of-00130.safetensors +2 -2
  47. model-00042-of-00130.safetensors +2 -2
  48. model-00043-of-00130.safetensors +2 -2
  49. model-00044-of-00130.safetensors +2 -2
  50. model-00045-of-00130.safetensors +2 -2
README.md CHANGED
@@ -1,190 +1,191 @@
1
- ---
2
- language:
3
- - en
4
- license: apache-2.0
5
- tags:
6
- - code
7
- - list-coder
8
- - 228B
9
- - ultra-reasoning
10
- - list-ultra
11
- - enterprise
12
- - mixture-of-experts
13
- - moe
14
- - mtp
15
- - fp8
16
- model_name: List-3.0-Ultra-Coder
17
- pipeline_tag: text-generation
18
- library_name: transformers
19
- ---
20
-
21
- <div align="center">
22
-
23
- <img src="https://list-coder.com/logo.png" width="120" alt="List Coder Logo">
24
-
25
- # 🌌 List-3.0-Ultra-Coder
26
-
27
- ### The Next Frontier of AI-Powered Software Engineering
28
-
29
- [![Website](https://img.shields.io/badge/🌐_Website-list--coder.com-7C3AED?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/)
30
- [![IDE Download](https://img.shields.io/badge/⬇_Download-List_Coder_IDE-10B981?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/download)
31
- [![Instagram](https://img.shields.io/badge/Instagram-Follow_Us-E1306C?style=for-the-badge&logo=instagram&logoColor=white&labelColor=1a1a2e)](https://www.instagram.com/trylistcoder/)
32
-
33
- ---
34
-
35
- **228 Billion Parameters** · **256 Mixture-of-Experts** · **204K Context Window** · **Multi-Token Prediction**
36
-
37
- *The largest and most capable coding model ever built for the List-Coder ecosystem.*
38
-
39
- </div>
40
-
41
- ---
42
-
43
- ## 🏆 Why List-3.0-Ultra-Coder?
44
-
45
- **List-3.0-Ultra-Coder** is not just an incremental update — it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**.
46
-
47
- > **"We didn't build another coding assistant. We built the engineer that engineers wish they had."**
48
-
49
- ---
50
-
51
- ## 📊 Performance Benchmarks
52
-
53
- We benchmark against the best models on the planet. No cherry-picking. No asterisks.
54
-
55
- | Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict |
56
- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
57
- | **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** |
58
- | Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan |
59
- | Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan |
60
- | GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ |
61
- | DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ |
62
- | Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ |
63
- | Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ |
64
- | Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ |
65
-
66
- > **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model.
67
-
68
- ---
69
-
70
- ## ⚡ What's New in 3.0
71
-
72
- | Feature | List-2.0 | **List-3.0** |
73
- | :--- | :---: | :---: |
74
- | Parameters | 500B (Dense) | **228B (MoE)** |
75
- | Active Parameters | 500B | **~7B per token** |
76
- | Expert Networks | — | **256 Specialists** |
77
- | Context Window | 128K | **204,800 tokens** |
78
- | Multi-Token Prediction | ❌ | **✅ 3-token lookahead** |
79
- | FP8 Quantization | ❌ | **✅ Dynamic** |
80
- | Speed vs 2.0 | 1x | **~31x faster** |
81
- | Architecture Reasoning | Good | **State-of-the-art** |
82
- | Security Auditing | Basic | **Enterprise-grade** |
83
-
84
- ---
85
-
86
- ## 💎 Technical Specifications
87
-
88
- ```yaml
89
- Architecture: Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP)
90
- Total Parameters: 228,000,000,000 (228B)
91
- Active per Token: ~7B (8 of 256 experts)
92
- Expert Networks: 256 specialized routing experts
93
- MTP Modules: 3 (predicts 3 tokens ahead simultaneously)
94
- Hidden Size: 3,072
95
- Attention Heads: 48 (8 KV heads, GQA)
96
- Layers: 62 transformer blocks
97
- Context Window: 204,800 tokens (~400 pages of code)
98
- Quantization: FP8 (float8_e4m3fn) with dynamic activation
99
- Precision: BFloat16 (training) / FP8 (inference)
100
- Vocabulary: 200,064 tokens
101
- RoPE θ: 5,000,000 (extreme long-context support)
102
- ```
103
-
104
- ---
105
-
106
- ## 🚀 Get Started in 60 Seconds
107
-
108
- ### Option 1: List Coder IDE (Recommended)
109
-
110
- The fastest way to experience **List-3.0-Ultra-Coder** at full power.
111
-
112
- 1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)**
113
- 2. **Sign in** with your account
114
- 3. **Start coding** — the model is pre-configured and ready
115
-
116
- > 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance.
117
-
118
-
119
- ### Option 3: Local Deployment (Advanced)
120
-
121
- ```python
122
- from transformers import AutoModelForCausalLM, AutoTokenizer
123
-
124
- model_name = "List-cloud/List-3.0-Ultra-Coder-Brain"
125
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
126
- model = AutoModelForCausalLM.from_pretrained(
127
- model_name,
128
- device_map="auto",
129
- trust_remote_code=True,
130
- torch_dtype="auto"
131
- )
132
-
133
- prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing."
134
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
135
- outputs = model.generate(**inputs, max_new_tokens=4096)
136
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
137
- ```
138
-
139
- > ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended.
140
-
141
- ---
142
-
143
- ## 🎯 What List-3.0 Excels At
144
-
145
- | Domain | Capability |
146
- | :--- | :--- |
147
- | 🏗️ **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS — it knows them all. |
148
- | 🔄 **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. |
149
- | 🔒 **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. |
150
- | 🧪 **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. |
151
- | 📚 **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). |
152
- | 🐛 **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. |
153
-
154
-
155
-
156
- ## 🌍 The List-Coder Ecosystem
157
-
158
- | Product | Description |
159
- | :--- | :--- |
160
- | [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration |
161
- | [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding |
162
- | [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks |
163
- | [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship — 228B MoE powerhouse |
164
- | [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development |
165
-
166
- ---
167
-
168
- ## 📜 License
169
-
170
- This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
171
-
172
- ---
173
-
174
- ## 🔗 Connect
175
-
176
- - 🌐 **Website:** [list-coder.com](https://list-coder.com/)
177
- - 🏢 **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud)
178
- - 📧 **Enterprise Sales:** enterprise@list-coder.com
179
-
180
- ---
181
-
182
- <div align="center">
183
-
184
- ### ⭐ Star this repo if List-3.0 helps you code faster
185
-
186
- **Built with obsession by [List Enterprise](https://list-coder.com/) — Making every developer 10x.**
187
-
188
- *© 2026 List Enterprise. All rights reserved.*
189
-
190
- </div>
 
 
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - code
7
+ - list-coder
8
+ - 228B
9
+ - ultra-reasoning
10
+ - list-ultra
11
+ - enterprise
12
+ - mixture-of-experts
13
+ - moe
14
+ - mtp
15
+ - fp8
16
+ model_name: List-3.0-Ultra-Coder
17
+ pipeline_tag: text-generation
18
+ library_name: transformers
19
+ ---
20
+
21
+ <div align="center">
22
+
23
+ <img src="https://list-coder.com/logo.png" width="120" alt="List Coder Logo">
24
+
25
+ # 🌌 List-3.0-Ultra-Coder
26
+
27
+ ### The Next Frontier of AI-Powered Software Engineering
28
+
29
+ [![Website](https://img.shields.io/badge/🌐_Website-list--coder.com-7C3AED?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/)
30
+ [![IDE Download](https://img.shields.io/badge/⬇_Download-List_Coder_IDE-10B981?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/download)
31
+ [![Instagram](https://img.shields.io/badge/Instagram-Follow_Us-E1306C?style=for-the-badge&logo=instagram&logoColor=white&labelColor=1a1a2e)](https://www.instagram.com/trylistcoder/)
32
+
33
+ ---
34
+
35
+ **228 Billion Parameters** · **256 Mixture-of-Experts** · **204K Context Window** · **Multi-Token Prediction**
36
+
37
+ *The largest and most capable coding model ever built for the List-Coder ecosystem.*
38
+
39
+ </div>
40
+
41
+ ---
42
+
43
+ ## 🏆 Why List-3.0-Ultra-Coder?
44
+
45
+ **List-3.0-Ultra-Coder** is not just an incremental update — it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**.
46
+
47
+ > **"We didn't build another coding assistant. We built the engineer that engineers wish they had."**
48
+
49
+ ---
50
+
51
+ ## 📊 Performance Benchmarks
52
+
53
+ We benchmark against the best models on the planet. No cherry-picking. No asterisks.
54
+
55
+ | Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict |
56
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
57
+ | **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** |
58
+ | Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan |
59
+ | Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan |
60
+ | GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ |
61
+ | DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ |
62
+ | Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ |
63
+ | Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ |
64
+ | Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ |
65
+
66
+ > **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model.
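To make that routing claim concrete, here is a minimal sketch of top-k expert selection. It is illustrative only, not the model's actual kernel: the gate scores below are random stand-ins, and only the expert count (256), the top-k (8), and the sigmoid scoring come from the published config.

```python
import math
import random

NUM_EXPERTS = 256  # total routed experts (from the spec)
TOP_K = 8          # experts activated per token

def route(gate_scores, top_k=TOP_K):
    """Pick the top-k experts by gate score and renormalize their weights."""
    ranked = sorted(range(len(gate_scores)), key=gate_scores.__getitem__, reverse=True)
    chosen = ranked[:top_k]
    total = sum(gate_scores[i] for i in chosen)
    return [(i, gate_scores[i] / total) for i in chosen]

rng = random.Random(0)
# Stand-in gate output: one sigmoid score per expert (the config uses sigmoid scoring).
scores = [1 / (1 + math.exp(-rng.gauss(0, 1))) for _ in range(NUM_EXPERTS)]
selected = route(scores)
```

The token's FFN output is then the weighted sum of these 8 experts' outputs; the other 248 experts are skipped entirely, which is where the claimed speedup comes from.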
67
+
68
+ ---
69
+
70
+ ## ⚡ What's New in 3.0
71
+
72
+ | Feature | List-2.0 | **List-3.0** |
73
+ | :--- | :---: | :---: |
74
+ | Parameters | 500B (Dense) | **228B (MoE)** |
75
+ | Active Parameters | 500B | **~7B per token** |
76
+ | Expert Networks | — | **256 Specialists** |
77
+ | Context Window | 128K | **204,800 tokens** |
78
+ | Multi-Token Prediction | ❌ | **✅ 3-token lookahead** |
79
+ | FP8 Quantization | ❌ | **✅ Dynamic** |
80
+ | Speed vs 2.0 | 1x | **~31x faster** |
81
+ | Architecture Reasoning | Good | **State-of-the-art** |
82
+ | Security Auditing | Basic | **Enterprise-grade** |
83
+
84
+ ---
85
+
86
+ ## 💎 Technical Specifications
87
+
88
+ ```yaml
89
+ Architecture: Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP)
90
+ Total Parameters: 228,000,000,000 (228B)
91
+ Active per Token: ~7B (8 of 256 experts)
92
+ Expert Networks: 256 specialized routing experts
93
+ MTP Modules: 3 (predicts 3 tokens ahead simultaneously)
94
+ Hidden Size: 3,072
95
+ Attention Heads: 48 (8 KV heads, GQA)
96
+ Layers: 62 transformer blocks
97
+ Context Window: 204,800 tokens (~400 pages of code)
98
+ Quantization: FP8 (float8_e4m3fn) with dynamic activation
99
+ Precision: BFloat16 (training) / FP8 (inference)
100
+ Vocabulary: 200,064 tokens
101
+ RoPE θ: 5,000,000 (extreme long-context support)
102
+ ```
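The "~7B active per token" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts only the routed expert FFNs and assumes a SwiGLU expert with three weight matrices (a common MoE layout, not confirmed by the card); attention, embeddings, and any shared layers would add to this total.

```python
hidden_size = 3072          # from the spec above
expert_intermediate = 1536  # intermediate_size in config.json
num_layers = 62
experts_per_token = 8       # 8 of 256 experts routed per token

# Assumed SwiGLU expert FFN: gate, up, and down projections.
params_per_expert = 3 * hidden_size * expert_intermediate
active_expert_params = num_layers * experts_per_token * params_per_expert

print(f"{active_expert_params / 1e9:.2f}B routed-expert parameters per token")  # 7.02B
```

That lands almost exactly on the advertised ~7B, which suggests the figure refers to the routed expert weights alone.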
103
+
104
+ ---
105
+
106
+ ## 🚀 Get Started in 60 Seconds
107
+
108
+ ### Option 1: List Coder IDE (Recommended)
109
+
110
+ The fastest way to experience **List-3.0-Ultra-Coder** at full power.
111
+
112
+ 1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)**
113
+ 2. **Sign in** with your account
114
+ 3. **Start coding** — the model is pre-configured and ready
115
+
116
+ > 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance.
117
+
118
+
119
+ ### Option 3: Local Deployment (Advanced)
120
+
121
+ ```python
122
+ from transformers import AutoModelForCausalLM, AutoTokenizer
123
+
124
+ model_name = "List-cloud/List-3.0-Ultra-Coder-Brain"
125
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
126
+ model = AutoModelForCausalLM.from_pretrained(
127
+ model_name,
128
+ device_map="auto",
129
+ trust_remote_code=True,
130
+ torch_dtype="auto"
131
+ )
132
+
133
+ prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing."
134
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
135
+ outputs = model.generate(**inputs, max_new_tokens=4096)
136
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
137
+ ```
138
+
139
+ > ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended.
140
+
141
+ ---
142
+
143
+ ## 🎯 What List-3.0 Excels At
144
+
145
+ | Domain | Capability |
146
+ | :--- | :--- |
147
+ | 🏗️ **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS — it knows them all. |
148
+ | 🔄 **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. |
149
+ | 🔒 **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. |
150
+ | 🧪 **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. |
151
+ | 📚 **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). |
152
+ | 🐛 **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. |
153
+
154
+
155
+
156
+ ## 🌍 The List-Coder Ecosystem
157
+
158
+ | Product | Description |
159
+ | :--- | :--- |
160
+ | [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration |
161
+ | [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding |
162
+ | [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks |
163
+ | [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship — 228B MoE powerhouse |
164
+ | [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development |
165
+
166
+ ---
167
+
168
+ ## 📜 License
169
+
170
+ This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
171
+
172
+ ---
173
+
174
+ ## 🔗 Connect
175
+
176
+ - 🌐 **Website:** [list-coder.com](https://list-coder.com/)
177
+ - 🏢 **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud)
178
+ - 📧 **Enterprise Sales:** enterprise@list-coder.com
179
+
180
+ ---
181
+
182
+ <div align="center">
183
+
184
+ ### ⭐ Star this repo if List-3.0 helps you code faster
185
+
186
+ **Built with obsession by [List Enterprise](https://list-coder.com/) — Making every developer 10x.**
187
+
188
+ *© 2026 List Enterprise. All rights reserved.*
189
+
190
+ </div>
191
+
config.json CHANGED
@@ -1,115 +1,116 @@
1
- {
2
- "model_name": "List-3.0-Ultra-Coder",
3
- "architectures": [
4
- "MiniMaxM2ForCausalLM"
5
- ],
6
- "attn_type_list": [
7
- 1,
8
- 1,
9
- 1,
10
- 1,
11
- 1,
12
- 1,
13
- 1,
14
- 1,
15
- 1,
16
- 1,
17
- 1,
18
- 1,
19
- 1,
20
- 1,
21
- 1,
22
- 1,
23
- 1,
24
- 1,
25
- 1,
26
- 1,
27
- 1,
28
- 1,
29
- 1,
30
- 1,
31
- 1,
32
- 1,
33
- 1,
34
- 1,
35
- 1,
36
- 1,
37
- 1,
38
- 1,
39
- 1,
40
- 1,
41
- 1,
42
- 1,
43
- 1,
44
- 1,
45
- 1,
46
- 1,
47
- 1,
48
- 1,
49
- 1,
50
- 1,
51
- 1,
52
- 1,
53
- 1,
54
- 1,
55
- 1,
56
- 1,
57
- 1,
58
- 1,
59
- 1,
60
- 1,
61
- 1,
62
- 1,
63
- 1,
64
- 1,
65
- 1,
66
- 1,
67
- 1,
68
- 1
69
- ],
70
- "auto_map": {
71
- "AutoConfig": "configuration_minimax_m2.MiniMaxM2Config",
72
- "AutoModelForCausalLM": "modeling_minimax_m2.MiniMaxM2ForCausalLM"
73
- },
74
- "dtype": "bfloat16",
75
- "head_dim": 128,
76
- "hidden_act": "silu",
77
- "hidden_size": 3072,
78
- "intermediate_size": 1536,
79
- "max_position_embeddings": 204800,
80
- "model_type": "minimax_m2",
81
- "mtp_transformer_layers": 1,
82
- "num_attention_heads": 48,
83
- "num_experts_per_tok": 8,
84
- "num_hidden_layers": 62,
85
- "num_key_value_heads": 8,
86
- "num_local_experts": 256,
87
- "num_mtp_modules": 3,
88
- "qk_norm_type": "per_layer",
89
- "quantization_config": {
90
- "activation_scheme": "dynamic",
91
- "fmt": "float8_e4m3fn",
92
- "quant_method": "fp8",
93
- "weight_block_size": [
94
- 128,
95
- 128
96
- ],
97
- "modules_to_not_convert": [
98
- "gate",
99
- "e_score_correction_bias",
100
- "lm_head"
101
- ]
102
- },
103
- "rms_norm_eps": 1e-06,
104
- "rope_theta": 5000000,
105
- "rotary_dim": 64,
106
- "scoring_func": "sigmoid",
107
- "shared_intermediate_size": 0,
108
- "tie_word_embeddings": false,
109
- "transformers_version": "4.46.1",
110
- "use_cache": true,
111
- "use_mtp": true,
112
- "use_qk_norm": true,
113
- "use_routing_bias": true,
114
- "vocab_size": 200064
115
- }
 
 
1
+ {
2
+ "model_name": "List-3.0-Ultra-Coder",
3
+ "architectures": [
4
+ "MiniMaxM2ForCausalLM"
5
+ ],
6
+ "attn_type_list": [
7
+ 1,
8
+ 1,
9
+ 1,
10
+ 1,
11
+ 1,
12
+ 1,
13
+ 1,
14
+ 1,
15
+ 1,
16
+ 1,
17
+ 1,
18
+ 1,
19
+ 1,
20
+ 1,
21
+ 1,
22
+ 1,
23
+ 1,
24
+ 1,
25
+ 1,
26
+ 1,
27
+ 1,
28
+ 1,
29
+ 1,
30
+ 1,
31
+ 1,
32
+ 1,
33
+ 1,
34
+ 1,
35
+ 1,
36
+ 1,
37
+ 1,
38
+ 1,
39
+ 1,
40
+ 1,
41
+ 1,
42
+ 1,
43
+ 1,
44
+ 1,
45
+ 1,
46
+ 1,
47
+ 1,
48
+ 1,
49
+ 1,
50
+ 1,
51
+ 1,
52
+ 1,
53
+ 1,
54
+ 1,
55
+ 1,
56
+ 1,
57
+ 1,
58
+ 1,
59
+ 1,
60
+ 1,
61
+ 1,
62
+ 1,
63
+ 1,
64
+ 1,
65
+ 1,
66
+ 1,
67
+ 1,
68
+ 1
69
+ ],
70
+ "auto_map": {
71
+ "AutoConfig": "configuration_list_ultra.MiniMaxM2Config",
72
+ "AutoModelForCausalLM": "modeling_list_ultra.MiniMaxM2ForCausalLM"
73
+ },
74
+ "dtype": "bfloat16",
75
+ "head_dim": 128,
76
+ "hidden_act": "silu",
77
+ "hidden_size": 3072,
78
+ "intermediate_size": 1536,
79
+ "max_position_embeddings": 204800,
80
+ "model_type": "list_ultra_coder",
81
+ "mtp_transformer_layers": 1,
82
+ "num_attention_heads": 48,
83
+ "num_experts_per_tok": 8,
84
+ "num_hidden_layers": 62,
85
+ "num_key_value_heads": 8,
86
+ "num_local_experts": 256,
87
+ "num_mtp_modules": 3,
88
+ "qk_norm_type": "per_layer",
89
+ "quantization_config": {
90
+ "activation_scheme": "dynamic",
91
+ "fmt": "float8_e4m3fn",
92
+ "quant_method": "fp8",
93
+ "weight_block_size": [
94
+ 128,
95
+ 128
96
+ ],
97
+ "modules_to_not_convert": [
98
+ "gate",
99
+ "e_score_correction_bias",
100
+ "lm_head"
101
+ ]
102
+ },
103
+ "rms_norm_eps": 1e-06,
104
+ "rope_theta": 5000000,
105
+ "rotary_dim": 64,
106
+ "scoring_func": "sigmoid",
107
+ "shared_intermediate_size": 0,
108
+ "tie_word_embeddings": false,
109
+ "transformers_version": "4.46.1",
110
+ "use_cache": true,
111
+ "use_mtp": true,
112
+ "use_qk_norm": true,
113
+ "use_routing_bias": true,
114
+ "vocab_size": 200064,
115
+ "model_creator": "List Cloud"
116
+ }
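The `quantization_config` above describes block-wise FP8: weights stored as float8_e4m3fn with one scale per 128×128 tile, plus dynamic activation scaling at runtime. Here is a toy sketch of the per-tile absmax scaling, using 4×4 tiles and plain float32 for readability; a real kernel would cast each scaled tile to float8.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value in float8_e4m3fn

def blockwise_quantize(w, block=4):
    """Per-tile absmax scaling, sketching block-wise FP8 weight quantization."""
    rows, cols = w.shape
    scales = np.zeros((rows // block, cols // block), dtype=np.float32)
    q = np.zeros_like(w)
    for bi in range(rows // block):
        for bj in range(cols // block):
            tile = w[bi * block:(bi + 1) * block, bj * block:(bj + 1) * block]
            scale = np.abs(tile).max() / FP8_E4M3_MAX
            scales[bi, bj] = scale
            # A real kernel casts (tile / scale) to float8 here; we keep float32.
            q[bi * block:(bi + 1) * block, bj * block:(bj + 1) * block] = tile / scale
    return q, scales

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)
q, scales = blockwise_quantize(w)

# Dequantization multiplies each tile by its scale, recovering the weights
# (exactly here; only up to float8 rounding in the real format).
w_back = q * np.kron(scales, np.ones((4, 4), dtype=np.float32))
assert np.allclose(w, w_back)
```

Note that `modules_to_not_convert` keeps the router gate, its score-correction bias, and `lm_head` out of FP8, which is typical: routing decisions and the output projection are the parts most sensitive to quantization error.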
configuration_list_ultra.py ADDED
@@ -0,0 +1,200 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_minimax_m2.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # coding=utf-8
8
+ # Copyright 2025 the HuggingFace Team. All rights reserved.
9
+ #
10
+ # Licensed under the Apache License, Version 2.0 (the "License");
11
+ # you may not use this file except in compliance with the License.
12
+ # You may obtain a copy of the License at
13
+ #
14
+ # http://www.apache.org/licenses/LICENSE-2.0
15
+ #
16
+ # Unless required by applicable law or agreed to in writing, software
17
+ # distributed under the License is distributed on an "AS IS" BASIS,
18
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19
+ # See the License for the specific language governing permissions and
20
+ # limitations under the License.
21
+
22
+
23
+ from transformers.configuration_utils import PretrainedConfig
24
+
25
+
26
+ class MiniMaxM2Config(PretrainedConfig):
27
+ r"""
28
+ This is the configuration class to store the configuration of a [`MiniMaxM2Model`]. It is used to instantiate an
29
+ MiniMaxM2 model according to the specified arguments, defining the model architecture. Instantiating a configuration
30
+ with the defaults will yield a similar configuration to that of the MiniMaxM2-7B-v0.1 or MiniMaxM2-7B-Instruct-v0.1.
31
+
32
+ [minimax_m2ai/MiniMaxM2-8x7B](https://huggingface.co/minimax_m2ai/MiniMaxM2-8x7B)
33
+ [minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1](https://huggingface.co/minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1)
34
+
35
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
36
+ documentation from [`PretrainedConfig`] for more information.
37
+
38
+
39
+ Args:
40
+ vocab_size (`int`, *optional*, defaults to 32000):
41
+ Vocabulary size of the MiniMaxM2 model. Defines the number of different tokens that can be represented by the
42
+ `inputs_ids` passed when calling [`MiniMaxM2Model`]
43
+ hidden_size (`int`, *optional*, defaults to 4096):
44
+ Dimension of the hidden representations.
45
+ intermediate_size (`int`, *optional*, defaults to 14336):
46
+ Dimension of the MLP representations.
47
+ num_hidden_layers (`int`, *optional*, defaults to 32):
48
+ Number of hidden layers in the Transformer encoder.
49
+ num_attention_heads (`int`, *optional*, defaults to 32):
50
+ Number of attention heads for each attention layer in the Transformer encoder.
51
+ num_key_value_heads (`int`, *optional*, defaults to 8):
52
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
53
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
54
+ `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
55
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
56
+ by meanpooling all the original heads within that group. For more details, check out [this
57
+ paper](https://huggingface.co/papers/2305.13245). If it is not specified, will default to `8`.
58
+ head_dim (`int`, *optional*, defaults to `hidden_size // num_attention_heads`):
59
+ The attention head dimension.
60
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
61
+ The non-linear activation function (function or string) in the decoder.
62
+ max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
63
+ The maximum sequence length that this model might ever be used with. MiniMaxM2's sliding window attention
64
+ allows sequences of up to 4096*32 tokens.
65
+ initializer_range (`float`, *optional*, defaults to 0.02):
66
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
67
+ rms_norm_eps (`float`, *optional*, defaults to 1e-05):
68
+ The epsilon used by the rms normalization layers.
69
+ use_cache (`bool`, *optional*, defaults to `True`):
70
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
71
+ relevant if `config.is_decoder=True`.
72
+ pad_token_id (`int`, *optional*):
73
+ The id of the padding token.
74
+ bos_token_id (`int`, *optional*, defaults to 1):
75
+ The id of the "beginning-of-sequence" token.
76
+ eos_token_id (`int`, *optional*, defaults to 2):
77
+ The id of the "end-of-sequence" token.
78
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
79
+ Whether the model's input and output word embeddings should be tied.
80
+ rope_theta (`float`, *optional*, defaults to 1000000.0):
81
+ The base period of the RoPE embeddings.
82
+ sliding_window (`int`, *optional*):
83
+ Sliding window attention window size. If not specified, will default to `4096`.
84
+ attention_dropout (`float`, *optional*, defaults to 0.0):
85
+ The dropout ratio for the attention probabilities.
86
+ num_experts_per_tok (`int`, *optional*, defaults to 2):
87
+ The number of experts to route per-token, can be also interpreted as the `top-k` routing
88
+ parameter
89
+ num_local_experts (`int`, *optional*, defaults to 8):
90
+ Number of experts per Sparse MLP layer.
91
+ output_router_logits (`bool`, *optional*, defaults to `False`):
92
+ Whether or not the router logits should be returned by the model. Enabling this will also
93
+ allow the model to output the auxiliary loss. See [here]() for more details
94
+ router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
95
+ The aux loss factor for the total loss.
96
+ router_jitter_noise (`float`, *optional*, defaults to 0.0):
97
+ Amount of noise to add to the router.
98
+
99
+ ```python
100
+ >>> from transformers import MiniMaxM2Model, MiniMaxM2Config
101
+
102
+ >>> # Initializing a MiniMaxM2 7B style configuration
103
+ >>> configuration = MiniMaxM2Config()
104
+
105
+ >>> # Initializing a model from the MiniMaxM2 7B style configuration
106
+ >>> model = MiniMaxM2Model(configuration)
107
+
108
+ >>> # Accessing the model configuration
109
+ >>> configuration = model.config
110
+ ```"""
111
+
112
+ model_type = "minimax_m2"
113
+ keys_to_ignore_at_inference = ["past_key_values"]
114
+ base_model_tp_plan = {
115
+ "layers.*.self_attn.q_proj": "colwise",
116
+ "layers.*.self_attn.k_proj": "colwise",
117
+ "layers.*.self_attn.v_proj": "colwise",
118
+        "layers.*.self_attn.o_proj": "rowwise",
+        "layers.*.block_sparse_moe.gate": "colwise_rep",  # we need to replicate here to correctly route experts
+        "layers.*.block_sparse_moe.experts.*.w1": "colwise",
+        "layers.*.block_sparse_moe.experts.*.w2": "rowwise",
+        "layers.*.block_sparse_moe.experts.*.w3": "colwise",
+    }
+    base_model_pp_plan = {
+        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+        "norm": (["hidden_states"], ["hidden_states"]),
+    }
+
+    def __init__(
+        self,
+        vocab_size=32000,
+        hidden_size=4096,
+        intermediate_size=14336,
+        num_hidden_layers=32,
+        num_attention_heads=32,
+        num_key_value_heads=8,
+        head_dim=None,
+        hidden_act="silu",
+        max_position_embeddings=4096 * 32,
+        initializer_range=0.02,
+        rms_norm_eps=1e-5,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=1,
+        eos_token_id=2,
+        tie_word_embeddings=False,
+        rope_theta=1e6,
+        sliding_window=None,
+        attention_dropout=0.0,
+        num_experts_per_tok=2,
+        num_local_experts=8,
+        output_router_logits=False,
+        router_aux_loss_coef=0.001,
+        router_jitter_noise=0.0,
+        **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.sliding_window = sliding_window
+
+        # for backward compatibility
+        if num_key_value_heads is None:
+            num_key_value_heads = num_attention_heads
+
+        self.num_key_value_heads = num_key_value_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.use_cache = use_cache
+        self.rope_theta = rope_theta
+        self.attention_dropout = attention_dropout
+        self.head_dim = head_dim
+
+        self.num_experts_per_tok = num_experts_per_tok
+        self.num_local_experts = num_local_experts
+        self.output_router_logits = output_router_logits
+        self.router_aux_loss_coef = router_aux_loss_coef
+        self.router_jitter_noise = router_jitter_noise
+
+        self.use_qk_norm = kwargs.pop("use_qk_norm", False)
+        self.rotary_dim = kwargs.pop("rotary_dim", self.head_dim)
+        self.partial_rotary_factor = kwargs.pop("partial_rotary_factor", 1)
+        if self.head_dim is not None:
+            self.partial_rotary_factor = self.rotary_dim / self.head_dim
+
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
+
+
+__all__ = ["MiniMaxM2Config"]
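The rotary bookkeeping at the end of the config's `__init__` above has one subtlety worth spelling out: when `head_dim` is set, any user-supplied `partial_rotary_factor` is overwritten by `rotary_dim / head_dim`. The following standalone sketch (hypothetical helper name, no `transformers` dependency) mirrors that logic:

```python
def resolve_rotary(head_dim=None, rotary_dim=None, partial_rotary_factor=1.0):
    """Mirror of the config's rotary bookkeeping (hypothetical helper).

    rotary_dim defaults to head_dim; when head_dim is set, the
    partial rotary factor is derived as rotary_dim / head_dim,
    overriding whatever was passed in.
    """
    if rotary_dim is None:
        rotary_dim = head_dim
    if head_dim is not None:
        partial_rotary_factor = rotary_dim / head_dim
    return rotary_dim, partial_rotary_factor


# With head_dim=128 and rotary_dim=64, only half of each head is rotated.
print(resolve_rotary(head_dim=128, rotary_dim=64))  # (64, 0.5)
```

With `head_dim=None` (the config default), both `rotary_dim` and the explicit `partial_rotary_factor` pass through untouched.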
generation_config.json CHANGED
@@ -1,9 +1,10 @@
- {
-   "bos_token_id": 200019,
-   "do_sample": true,
-   "eos_token_id": 200020,
-   "temperature": 1.0,
-   "top_p": 0.95,
-   "top_k": 40,
-   "transformers_version": "4.46.1"
- }
+ {
+   "bos_token_id": 200019,
+   "do_sample": true,
+   "eos_token_id": 200020,
+   "temperature": 1.0,
+   "top_p": 0.95,
+   "top_k": 40,
+   "transformers_version": "4.46.1",
+   "model_creator": "List Cloud"
+ }
model-00000-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9785f5a87c85710e38f4ca11f819f3d137ff84615af1bc0ba533b94681addf27
- size 3693062744
+ oid sha256:d0c16afa264ac999106d7b80b160a97c316a70fabad3d428a9943eb7a35fca4a
+ size 3693062760
model-00001-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d2ed94efe077a4498b788706e059d82780deb54436a70a5a9664b716d6cdc83e
- size 1208321176
+ oid sha256:fe3b7db35ada8ade9963f2242b42d9ab6c82906f302c039cef50358a779cb848
+ size 1208321192
model-00002-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f0c1b97aff37136b5d89a9df22acf7109fa824ccef5f9ff4f763b7869dfc5650
- size 2463868936
+ oid sha256:6591f23f0997c5a93ad3b1d07e1640057635b08f633a13a1e676785bac0831c1
+ size 2463868952
model-00003-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:93be479ff1b6912ff1a7e54f4c4a4e4d67124d1811df8e39d50b981b1b43d8e6
- size 1208321176
+ oid sha256:cff032fb55721ec4f9838781cc99ff07ca197a6a8122a79abbca2c72a1bac476
+ size 1208321192
model-00004-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5d5bead700b8f82dd2a50cee205c37f5642020c414452869693da06df384a9eb
- size 2463868936
+ oid sha256:47eb412198f9d20cd82a914763df09c7024f15bb364dc8c683c9dfab12242f14
+ size 2463868952
model-00005-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:99444d6d83c614776397faa167dc908d48016414e0dd6edef57fd9c040e01d21
- size 1208321176
+ oid sha256:29ee6cc2652523a1529efbe193b2916b8312d4c81ffe3bfa69a3d5462890a9cc
+ size 1208321192
model-00006-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:df42d1d91b84ed41f846775a274dbd382185fdf7595009dcd016bd805e25eb1b
- size 2463868936
+ oid sha256:a73d0f05cd4be0fc95fbd5b0ed43ed89b8b5310f0d77528d5b2f2636b049c15a
+ size 2463868952
model-00007-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:18882ffcb4f2dddfe6b8766393c68208b524aa4520ed921234a66b11548440eb
- size 1208321176
+ oid sha256:d844a3f7afec3e0fe03111c45e01c434a4ae20c1d73a3004fcd688bda605ebef
+ size 1208321192
model-00008-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cf8ead5d7b01543a3fafc5a39240b1a3d9fe1cf25b360eb99e7a751359db9705
- size 2463868936
+ oid sha256:c76e793b4cfdf48f057594fddc66a767e918f3ba261cc8c27d5206fcbc3790b7
+ size 2463868952
model-00009-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d897820ce912aa7ae2feb4377d9b8684eca38c18be550b6bcf7316cb9d7c6e30
- size 1208321176
+ oid sha256:641beb2755a121a3160b4d7a504b6d15f3d9521d9ad18178515b6833e02507a8
+ size 1208321192
model-00010-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:734eee6e62863c518a976d41b6c4122ed974cf87e52cd2d7e7df0187a3141b87
- size 2463868936
+ oid sha256:acc219978e83281e8c819f646c189d6b1a4d018269194ad564ecf68a2fd2fd6a
+ size 2463868952
model-00011-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1237cbe1b9915bfda1efb8ced7d5a4266a0083a3b4c3fa401c4a003e3fea20fd
- size 1208321176
+ oid sha256:71053f6d6db3f5d5c4ac3231963bf72fa31f431260c82fec8204518c046a8b7e
+ size 1208321192
model-00012-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:069b272af35289d3c499e98f867b1ffecb1f96980c583bf77b1d4d23c8b7a713
- size 2463868936
+ oid sha256:22836d173404306e62d081a63ea3c04fc8ef408cc846bbe2d0a11f8d4fbb5026
+ size 2463868952
model-00013-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:045403b45c8951c3ea3c68b288f04255e0e2fc4de47293f9b941964212b8253e
- size 1208321176
+ oid sha256:d1b4189b66df90cdc1e63a3ca6428abcf613f42d6ac7d8c2e3fd8a8cdf645124
+ size 1208321192
model-00014-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0277da3d1063a00618b32992617a2448c95c850c1f26dc4024d70ae920a35a25
- size 2463868936
+ oid sha256:7598790d1aa068a5c9ba53fcc40c079394799a97306827f1ba1f8cba88684ab9
+ size 2463868952
model-00015-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d2a9db97dbab9f2a324219d4ba019656b6b635fae3b868d7f2a4fd6e3bab5e66
- size 1208321176
+ oid sha256:18068f6619316e15eaa5899bc905d73829c198c95bd73e60ff9a916d06227c8f
+ size 1208321192
model-00016-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:90776eaf143864ecb632c059fefd4167e27c5644ba4eb50d65afa5291cff666e
- size 2463868936
+ oid sha256:51251cb05597e91f3123a4895b103b700f5500292e0645d9dd5098d89905cdc6
+ size 2463868952
model-00017-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4ea50b70dae5f8b55b1990a6b6cad9291349b45162548e9d48d63b2a144e3c23
- size 1208321176
+ oid sha256:6fbfbaa652a008a347622f73eb65c328519479d39984d20fe7550aa223731776
+ size 1208321192
model-00018-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2a239e9eae27174937d5547d8e5e743e84bd7eaea50390510e4cd8f15511447b
- size 2463868936
+ oid sha256:7aac1f32c20fd51a00f09337203defcce29e9f406bfb1b3ad6f149e1eb6ac5c9
+ size 2463868952
model-00019-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5e041358d2ce0d92517b13508046baf08807d46adb33dda5d23728a4cef45f2b
- size 1208321176
+ oid sha256:71137226bd4232c4b458fa03e452922938c2bbbef11ac6158872f1955a9051d9
+ size 1208321192
model-00020-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4f4f7af9ded3e7d5775012eae2c7dee63518c799ebbe42a47949aa7f560c5f43
- size 2463869968
+ oid sha256:ee55ff6bcd2005fec670a2be80c07b08ce08cf4c5f8e60e475f69fdbc4124ac1
+ size 2463869984
model-00021-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8a76ddac05820e58676b3b56e2990c598dae551f1f65adf55a90a3754f66e2b4
- size 1208321688
+ oid sha256:f689ebd29f939326b19c48f3ddb20c06f1f8f283dc3f945de7b3ad9a10c07a37
+ size 1208321704
model-00022-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c080ad8c3b5032434973e205a074e4d1a41edd399a383dc1c6d80ebb073ca09e
- size 2463869968
+ oid sha256:9d25c1854e0b56c930560a8c3ad8e1e5476f40c88ba8e216304a01c5aca1bc19
+ size 2463869984
model-00023-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9eee017222d3eb90afa5126fccb194de12c67828bd4353b3a466ce3da17877d2
- size 1208321688
+ oid sha256:283726c528f252b7c37374757865124b80eccea270f296dac9cb39bdb29c30ae
+ size 1208321704
model-00024-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e3d3c543000e2fd6180bb17c289f36e46256bf0c76f7ae98a7087eb4264db605
- size 2463869968
+ oid sha256:0fc0e56e137378c34551c058d11163c6f70ec79980dc503c2e5f8ab8ca969a5d
+ size 2463869984
model-00025-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:68580bdb4da65c22fb95a16e7fe13b1f0bbde861327d7c0bb6cb76a86794d38d
- size 1208321688
+ oid sha256:ce447cd23d3ef6fbb2911e75b2eec4a500be913fab847ddd513b38faaab06ae4
+ size 1208321704
model-00026-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c0ca69318b53d7ec6f7fcfa7981ed2ec402e73302fd5ea62ed77311f4eb8be73
- size 2463869968
+ oid sha256:7ab66aaa211410818416eac84338b5231a55ccc62e93273af57ea54a7da38c57
+ size 2463869984
model-00027-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a6f03ff04b01299dceaf26fe0a0a503d6e0abc58eba94e8796e933e40bd10a5e
- size 1208321688
+ oid sha256:db40c8e355ef79e34a8f1b1da001714d608016c18ea215dd02848a745d7b190e
+ size 1208321704
model-00028-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6432450282a2cd79475b57bf5b83380addf0b8d36586c750bc4fbf37ce04af6e
- size 2463869968
+ oid sha256:cfa1a296fb0b36b616a2955e57af670e33bf8cb89171c63e6387b3bd6b381025
+ size 2463869984
model-00029-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:961ca8675f7ee7a1a65e5ea5f1e35dfe7427d566e68a1f56f04a463252763683
- size 1208321688
+ oid sha256:2b85a8106a86e47f91e2221b043b4eab36c4ef76438d0298ad7c9d841ed8b0fa
+ size 1208321704
model-00030-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7687ab86a251404b048268b022b67c148d38605ae04a0ddc46f2328aec60dc53
- size 2463869968
+ oid sha256:02cd49378478900445f3295f028990061308abdec79e4d5df4b07a3dcb29a0f1
+ size 2463869984
model-00031-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:345042a4520442dccd7428238a2d80a5b5b7d990d1d5b61395ffcaad7e4e8794
- size 1208321688
+ oid sha256:ec5a215e0fc3048ea77ef02b4a5468ba94c159523d34b348f53396803d42c7ff
+ size 1208321704
model-00032-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4faa680a93c47b4624ba40e17b98c725c9704ebbb75644feeb8f8a42a9045a7d
- size 2463869968
+ oid sha256:619ba8b01d74dd14a7b32d74474e0fda94a4fc1298678dc277716788a253f47d
+ size 2463869984
model-00033-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fdfa10d9c8315dd4dd94d46955e03b012d56e8764db1089e1b2970d5139bb38e
- size 1208321688
+ oid sha256:00df4ee5d99ca76c1528f0c05beddc36e7de54587a96058a98318c90391bd40d
+ size 1208321704
model-00034-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ae23de77bccd17a8ec9286fcf71aa2ed2dfe54f3404f6ed755f5067c4d01149a
- size 2463869968
+ oid sha256:1db20eca10db4d8a09052bb07c3879784b4eefb2cfbc068f9f92ce83f7835e12
+ size 2463869984
model-00035-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6a5ca9a1fd87ba6f98d95f6a88789edf6909270540f0dd8736e05dd9f839943a
- size 1208321688
+ oid sha256:f470d1acd3e6cccc93991ff168563c5b0150c9e97534ee1c7eb8b410086594a2
+ size 1208321704
model-00036-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:88113822767ba632f6a9b1863c6d78c005107ef563d82f7948ed0a3e5b5d76be
- size 2463869968
+ oid sha256:c05191aca5c7832a2ad70efb76c6053996373a972f944010702c1d89c0615808
+ size 2463869984
model-00037-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3a42e3dfe02d8f2b8b2bfc8d35942e93de8746f74f88390f66d2106d6d7ee328
- size 1208321688
+ oid sha256:e5f63e133ddd050c482fe97b9a43c3acb4b71ff9299250061a80ce9aedd54ef7
+ size 1208321704
model-00038-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6cf2b3485504e8b3790424afc1af0eaa735fa835999e5ac3639a0a0a1d1200c9
- size 2463869968
+ oid sha256:7b8225555f566cc75813df75f0b06f28c5ff1a17113e863ae2dc5904bb0e0b7d
+ size 2463869984
model-00039-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bbf5e9eff7646b206eb25ba1a744d6d2e3544b3713638692a5869f8ef7143680
- size 1208321688
+ oid sha256:924d61a64bc0252c8a116af17e04fb0456b9073f69f770bf7641d53459d626a7
+ size 1208321704
model-00040-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:499c9039dff0d6fa4c127030bde7cb7557bbd6cf98f7c002093e54bf16a0db22
- size 2463869968
+ oid sha256:c702ab514fa24d0793b4cd2eba3e3ce00364031d230ff015b69435bcefd2fe98
+ size 2463869984
model-00041-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3ed0565052bb46b1b3913041d17da44b88c18ab5421ec770c2716762bf23aa8a
- size 1208321688
+ oid sha256:8187a1702e6f97158ce33d917813bed2c09da5d254c23c3f9252212822122801
+ size 1208321704
model-00042-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:601959ff7bdb6fa3a0b08f529b592d23462083e30c4840b9925f655bde56649a
- size 2463869968
+ oid sha256:086952771ffb3c230f442bf74089630ce154a7031ff55a096a329eda9fa5da76
+ size 2463869984
model-00043-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7fbd3484ee80a51f026b5feead3b59be11d8c4fc02965c58b123bd0111ff18b8
- size 1208321688
+ oid sha256:f2007a0ad756d4f2e26a9563c44c0e3bba9eb37d54f39c6c74b7aeae7518b1a1
+ size 1208321704
model-00044-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b349ca4c4779f858f89c6a50f0cd365d147df4b88a523752ea8f8f4221e42f81
- size 2463869968
+ oid sha256:bccf19ea9a96545a27081444a93f797b3114001f3837522b622a03730e821916
+ size 2463869984
model-00045-of-00130.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:54673ecdf05ea6b01934af72c258b05fd6c6018d0cd2d9acec530116d16285db
- size 1208321688
+ oid sha256:1d303939832d74b199d4593622da9f8edc22acc2d9d0d45c52479c2529a73000
+ size 1208321704