List-cloud committed 5 days ago

Commit f3225a5 (verified) · 1 parent: 9bcf76a

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.

Files changed (50)

README.md +191 -190
config.json +116 -115
configuration_list_ultra.py +200 -0
generation_config.json +10 -9
model-00000-of-00130.safetensors +2 -2
model-00001-of-00130.safetensors +2 -2
model-00002-of-00130.safetensors +2 -2
model-00003-of-00130.safetensors +2 -2
model-00004-of-00130.safetensors +2 -2
model-00005-of-00130.safetensors +2 -2
model-00006-of-00130.safetensors +2 -2
model-00007-of-00130.safetensors +2 -2
model-00008-of-00130.safetensors +2 -2
model-00009-of-00130.safetensors +2 -2
model-00010-of-00130.safetensors +2 -2
model-00011-of-00130.safetensors +2 -2
model-00012-of-00130.safetensors +2 -2
model-00013-of-00130.safetensors +2 -2
model-00014-of-00130.safetensors +2 -2
model-00015-of-00130.safetensors +2 -2
model-00016-of-00130.safetensors +2 -2
model-00017-of-00130.safetensors +2 -2
model-00018-of-00130.safetensors +2 -2
model-00019-of-00130.safetensors +2 -2
model-00020-of-00130.safetensors +2 -2
model-00021-of-00130.safetensors +2 -2
model-00022-of-00130.safetensors +2 -2
model-00023-of-00130.safetensors +2 -2
model-00024-of-00130.safetensors +2 -2
model-00025-of-00130.safetensors +2 -2
model-00026-of-00130.safetensors +2 -2
model-00027-of-00130.safetensors +2 -2
model-00028-of-00130.safetensors +2 -2
model-00029-of-00130.safetensors +2 -2
model-00030-of-00130.safetensors +2 -2
model-00031-of-00130.safetensors +2 -2
model-00032-of-00130.safetensors +2 -2
model-00033-of-00130.safetensors +2 -2
model-00034-of-00130.safetensors +2 -2
model-00035-of-00130.safetensors +2 -2
model-00036-of-00130.safetensors +2 -2
model-00037-of-00130.safetensors +2 -2
model-00038-of-00130.safetensors +2 -2
model-00039-of-00130.safetensors +2 -2
model-00040-of-00130.safetensors +2 -2
model-00041-of-00130.safetensors +2 -2
model-00042-of-00130.safetensors +2 -2
model-00043-of-00130.safetensors +2 -2
model-00044-of-00130.safetensors +2 -2
model-00045-of-00130.safetensors +2 -2

README.md CHANGED

@@ -1,190 +1,191 @@
----
-language:
-- en
-license: apache-2.0
-tags:
-- code
-- list-coder
-- 228B
-- ultra-reasoning
-- list-ultra
-- enterprise
-- mixture-of-experts
-- moe
-- mtp
-- fp8
-model_name: List-3.0-Ultra-Coder
-pipeline_tag: text-generation
-library_name: transformers
----
-<div align="center">
-<img src="https://list-coder.com/logo.png" width="120" alt="List Coder Logo">
-# 🌌 List-3.0-Ultra-Coder
-### The Next Frontier of AI-Powered Software Engineering
-[![Website](https://img.shields.io/badge/🌐_Website-list--coder.com-7C3AED?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/)
-[![IDE Download](https://img.shields.io/badge/⬇_Download-List_Coder_IDE-10B981?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/download)
-[![Instagram](https://img.shields.io/badge/Instagram-Follow_Us-E1306C?style=for-the-badge&logo=instagram&logoColor=white&labelColor=1a1a2e)](https://www.instagram.com/trylistcoder/)
----
-**228 Billion Parameters** · **256 Mixture-of-Experts** · **204K Context Window** · **Multi-Token Prediction**
-*The largest and most capable coding model ever built for the List-Coder ecosystem.*
-</div>
----
-## 🏆 Why List-3.0-Ultra-Coder?
-**List-3.0-Ultra-Coder** is not just an incremental update — it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**.
-> **"We didn't build another coding assistant. We built the engineer that engineers wish they had."**
----
-## 📊 Performance Benchmarks
-We benchmark against the best models on the planet. No cherry-picking. No asterisks.
-| Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict |
-| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
-| **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** |
-| Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan |
-| Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan |
-| GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ |
-| DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ |
-| Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ |
-| Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ |
-| Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ |
-> **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model.
----
-## ⚡ What's New in 3.0
-| Feature | List-2.0 | **List-3.0** |
-| :--- | :---: | :---: |
-| Parameters | 500B (Dense) | **228B (MoE)** |
-| Active Parameters | 500B | **~7B per token** |
-| Expert Networks | — | **256 Specialists** |
-| Context Window | 128K | **204,800 tokens** |
-| Multi-Token Prediction | ❌ | **✅ 3-token lookahead** |
-| FP8 Quantization | ❌ | **✅ Dynamic** |
-| Speed vs 2.0 | 1x | **~31x faster** |
-| Architecture Reasoning | Good | **State-of-the-art** |
-| Security Auditing | Basic | **Enterprise-grade** |
----
-## 💎 Technical Specifications
-```yaml
-Architecture:         Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP)
-Total Parameters:     228,000,000,000 (228B)
-Active per Token:     ~7B (8 of 256 experts)
-Expert Networks:      256 specialized routing experts
-MTP Modules:          3 (predicts 3 tokens ahead simultaneously)
-Hidden Size:          3,072
-Attention Heads:      48 (8 KV heads, GQA)
-Layers:               62 transformer blocks
-Context Window:       204,800 tokens (~400 pages of code)
-Quantization:         FP8 (float8_e4m3fn) with dynamic activation
-Precision:            BFloat16 (training) / FP8 (inference)
-Vocabulary:           200,064 tokens
-RoPE θ:               5,000,000 (extreme long-context support)
-```
----
-## 🚀 Get Started in 60 Seconds
-### Option 1: List Coder IDE (Recommended)
-The fastest way to experience **List-3.0-Ultra-Coder** at full power.
-1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)**
-2. **Sign in** with your account
-3. **Start coding** — the model is pre-configured and ready
-> 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance.
-### Option 3: Local Deployment (Advanced)
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_name = "List-cloud/List-3.0-Ultra-Coder-Brain"
-tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    device_map="auto",
-    trust_remote_code=True,
-    torch_dtype="auto"
-)
-prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing."
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=4096)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```
-> ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended.
----
-## 🎯 What List-3.0 Excels At
-| Domain | Capability |
-| :--- | :--- |
-| 🏗️ **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS — it knows them all. |
-| 🔄 **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. |
-| 🔒 **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. |
-| 🧪 **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. |
-| 📚 **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). |
-| 🐛 **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. |
-## 🌍 The List-Coder Ecosystem
-| Product | Description |
-| :--- | :--- |
-| [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration |
-| [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding |
-| [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks |
-| [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship — 228B MoE powerhouse |
-| [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development |
----
-## 📜 License
-This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
----
-## 🔗 Connect
-- 🌐 **Website:** [list-coder.com](https://list-coder.com/)
-- 🏢 **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud)
-- 📧 **Enterprise Sales:** enterprise@list-coder.com
----
-<div align="center">
-### ⭐ Star this repo if List-3.0 helps you code faster
-**Built with obsession by [List Enterprise](https://list-coder.com/) — Making every developer 10x.**
-*© 2026 List Enterprise. All rights reserved.*
-</div>

+---
+language:
+- en
+license: apache-2.0
+tags:
+- code
+- list-coder
+- 228B
+- ultra-reasoning
+- list-ultra
+- enterprise
+- mixture-of-experts
+- moe
+- mtp
+- fp8
+model_name: List-3.0-Ultra-Coder
+pipeline_tag: text-generation
+library_name: transformers
+---
+<div align="center">
+<img src="https://list-coder.com/logo.png" width="120" alt="List Coder Logo">
+# 🌌 List-3.0-Ultra-Coder
+### The Next Frontier of AI-Powered Software Engineering
+[![Website](https://img.shields.io/badge/🌐_Website-list--coder.com-7C3AED?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/)
+[![IDE Download](https://img.shields.io/badge/⬇_Download-List_Coder_IDE-10B981?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/download)
+[![Instagram](https://img.shields.io/badge/Instagram-Follow_Us-E1306C?style=for-the-badge&logo=instagram&logoColor=white&labelColor=1a1a2e)](https://www.instagram.com/trylistcoder/)
+---
+**228 Billion Parameters** · **256 Mixture-of-Experts** · **204K Context Window** · **Multi-Token Prediction**
+*The largest and most capable coding model ever built for the List-Coder ecosystem.*
+</div>
+---
+## 🏆 Why List-3.0-Ultra-Coder?
+**List-3.0-Ultra-Coder** is not just an incremental update — it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**.
+> **"We didn't build another coding assistant. We built the engineer that engineers wish they had."**
+---
+## 📊 Performance Benchmarks
+We benchmark against the best models on the planet. No cherry-picking. No asterisks.
+| Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
+| **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** |
+| Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan |
+| Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan |
+| GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ |
+| DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ |
+| Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ |
+| Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ |
+| Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ |
+> **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model.
+---
+## ⚡ What's New in 3.0
+| Feature | List-2.0 | **List-3.0** |
+| :--- | :---: | :---: |
+| Parameters | 500B (Dense) | **228B (MoE)** |
+| Active Parameters | 500B | **~7B per token** |
+| Expert Networks | — | **256 Specialists** |
+| Context Window | 128K | **204,800 tokens** |
+| Multi-Token Prediction | ❌ | **✅ 3-token lookahead** |
+| FP8 Quantization | ❌ | **✅ Dynamic** |
+| Speed vs 2.0 | 1x | **~31x faster** |
+| Architecture Reasoning | Good | **State-of-the-art** |
+| Security Auditing | Basic | **Enterprise-grade** |
+---
+## 💎 Technical Specifications
+```yaml
+Architecture:         Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP)
+Total Parameters:     228,000,000,000 (228B)
+Active per Token:     ~7B (8 of 256 experts)
+Expert Networks:      256 specialized routing experts
+MTP Modules:          3 (predicts 3 tokens ahead simultaneously)
+Hidden Size:          3,072
+Attention Heads:      48 (8 KV heads, GQA)
+Layers:               62 transformer blocks
+Context Window:       204,800 tokens (~400 pages of code)
+Quantization:         FP8 (float8_e4m3fn) with dynamic activation
+Precision:            BFloat16 (training) / FP8 (inference)
+Vocabulary:           200,064 tokens
+RoPE θ:               5,000,000 (extreme long-context support)
+```
+---
+## 🚀 Get Started in 60 Seconds
+### Option 1: List Coder IDE (Recommended)
+The fastest way to experience **List-3.0-Ultra-Coder** at full power.
+1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)**
+2. **Sign in** with your account
+3. **Start coding** — the model is pre-configured and ready
+> 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance.
+### Option 3: Local Deployment (Advanced)
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "List-cloud/List-3.0-Ultra-Coder-Brain"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    trust_remote_code=True,
+    torch_dtype="auto"
+)
+prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing."
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=4096)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+> ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended.
+---
+## 🎯 What List-3.0 Excels At
+| Domain | Capability |
+| :--- | :--- |
+| 🏗️ **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS — it knows them all. |
+| 🔄 **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. |
+| 🔒 **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. |
+| 🧪 **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. |
+| 📚 **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). |
+| 🐛 **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. |
+## 🌍 The List-Coder Ecosystem
+| Product | Description |
+| :--- | :--- |
+| [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration |
+| [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding |
+| [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks |
+| [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship — 228B MoE powerhouse |
+| [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development |
+---
+## 📜 License
+This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
+---
+## 🔗 Connect
+- 🌐 **Website:** [list-coder.com](https://list-coder.com/)
+- 🏢 **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud)
+- 📧 **Enterprise Sales:** enterprise@list-coder.com
+---
+<div align="center">
+### ⭐ Star this repo if List-3.0 helps you code faster
+**Built with obsession by [List Enterprise](https://list-coder.com/) — Making every developer 10x.**
+*© 2026 List Enterprise. All rights reserved.*
+</div>

config.json CHANGED

@@ -1,115 +1,116 @@
-{
-  "model_name": "List-3.0-Ultra-Coder",
-  "architectures": [
-    "MiniMaxM2ForCausalLM"
-  ],
-  "attn_type_list": [
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1,
-    1
-  ],
-  "auto_map": {
-    "AutoConfig": "configuration_minimax_m2.MiniMaxM2Config",
-    "AutoModelForCausalLM": "modeling_minimax_m2.MiniMaxM2ForCausalLM"
-  },
-  "dtype": "bfloat16",
-  "head_dim": 128,
-  "hidden_act": "silu",
-  "hidden_size": 3072,
-  "intermediate_size": 1536,
-  "max_position_embeddings": 204800,
-  "model_type": "minimax_m2",
-  "mtp_transformer_layers": 1,
-  "num_attention_heads": 48,
-  "num_experts_per_tok": 8,
-  "num_hidden_layers": 62,
-  "num_key_value_heads": 8,
-  "num_local_experts": 256,
-  "num_mtp_modules": 3,
-  "qk_norm_type": "per_layer",
-  "quantization_config": {
-    "activation_scheme": "dynamic",
-    "fmt": "float8_e4m3fn",
-    "quant_method": "fp8",
-    "weight_block_size": [
-      128,
-      128
-    ],
-    "modules_to_not_convert": [
-      "gate",
-      "e_score_correction_bias",
-      "lm_head"
-    ]
-  },
-  "rms_norm_eps": 1e-06,
-  "rope_theta": 5000000,
-  "rotary_dim": 64,
-  "scoring_func": "sigmoid",
-  "shared_intermediate_size": 0,
-  "tie_word_embeddings": false,
-  "transformers_version": "4.46.1",
-  "use_cache": true,
-  "use_mtp": true,
-  "use_qk_norm": true,
-  "use_routing_bias": true,
-  "vocab_size": 200064
-}

+{
+  "model_name": "List-3.0-Ultra-Coder",
+  "architectures": [
+    "MiniMaxM2ForCausalLM"
+  ],
+  "attn_type_list": [
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1,
+    1
+  ],
+  "auto_map": {
+    "AutoConfig": "configuration_list_ultra.MiniMaxM2Config",
+    "AutoModelForCausalLM": "modeling_list_ultra.MiniMaxM2ForCausalLM"
+  },
+  "dtype": "bfloat16",
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 3072,
+  "intermediate_size": 1536,
+  "max_position_embeddings": 204800,
+  "model_type": "list_ultra_coder",
+  "mtp_transformer_layers": 1,
+  "num_attention_heads": 48,
+  "num_experts_per_tok": 8,
+  "num_hidden_layers": 62,
+  "num_key_value_heads": 8,
+  "num_local_experts": 256,
+  "num_mtp_modules": 3,
+  "qk_norm_type": "per_layer",
+  "quantization_config": {
+    "activation_scheme": "dynamic",
+    "fmt": "float8_e4m3fn",
+    "quant_method": "fp8",
+    "weight_block_size": [
+      128,
+      128
+    ],
+    "modules_to_not_convert": [
+      "gate",
+      "e_score_correction_bias",
+      "lm_head"
+    ]
+  },
+  "rms_norm_eps": 1e-06,
+  "rope_theta": 5000000,
+  "rotary_dim": 64,
+  "scoring_func": "sigmoid",
+  "shared_intermediate_size": 0,
+  "tie_word_embeddings": false,
+  "transformers_version": "4.46.1",
+  "use_cache": true,
+  "use_mtp": true,
+  "use_qk_norm": true,
+  "use_routing_bias": true,
+  "vocab_size": 200064,
+  "model_creator": "List Cloud"
+}
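
As a rough sanity check on the parameter figures quoted in the README, the sketch below estimates weight counts from the config.json values above. It rests on several assumptions that the diff does not confirm: each expert is taken to hold three `hidden_size` x `intermediate_size` matrices (the `w1`, `w2`, `w3` names in the tensor-parallel plan of configuration_list_ultra.py), attention is taken to use separate q/k/v/o projections, and router gates, norms, biases, and the MTP modules are ignored. The function name `estimate_params` is hypothetical.

```python
# Rough parameter estimate from config.json values; an approximation, not the authors' accounting.
cfg = {
    "hidden_size": 3072,
    "intermediate_size": 1536,
    "num_hidden_layers": 62,
    "num_attention_heads": 48,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "num_local_experts": 256,
    "num_experts_per_tok": 8,
    "vocab_size": 200064,
}


def estimate_params(c):
    """Return (total, active_per_token, routed_experts_only) weight counts."""
    h = c["hidden_size"]
    layers = c["num_hidden_layers"]
    # Assumption: each expert is a gated MLP with three h x intermediate matrices.
    per_expert = 3 * h * c["intermediate_size"]
    q_dim = c["num_attention_heads"] * c["head_dim"]
    kv_dim = c["num_key_value_heads"] * c["head_dim"]
    # Assumption: q and o projections are h x q_dim; k and v are h x kv_dim (GQA).
    attn = 2 * h * q_dim + 2 * h * kv_dim
    # tie_word_embeddings is false, so embeddings and lm_head are counted separately.
    embed = 2 * c["vocab_size"] * h
    total = layers * (c["num_local_experts"] * per_expert + attn) + embed
    active = layers * (c["num_experts_per_tok"] * per_expert + attn) + embed
    experts_only = layers * c["num_experts_per_tok"] * per_expert
    return total, active, experts_only


total, active, experts_only = estimate_params(cfg)
print(f"total           ~ {total / 1e9:.1f}B")
print(f"active          ~ {active / 1e9:.1f}B per token")
print(f"routed experts  ~ {experts_only / 1e9:.1f}B per token")
```

Under these assumptions the total comes out near 228.6B, close to the README's 228B. The per-token active count comes out near 11.0B when attention and embeddings are included, and the routed-expert weights alone are about 7.0B, which may be what the README's "~7B" figure refers to.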

configuration_list_ultra.py ADDED

@@ -0,0 +1,200 @@

+#                🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+#           This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py.
+#               Do NOT edit this file manually as any edits will be overwritten by the generation of
+#             the file from the modular. If any change should be done, please apply the change to the
+#                          modular_minimax_m2.py file directly. One of our CI enforces this.
+#                🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# coding=utf-8
+# Copyright 2025 the HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from transformers.configuration_utils import PretrainedConfig
+class MiniMaxM2Config(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`MiniMaxM2Model`]. It is used to instantiate an
+    MiniMaxM2 model according to the specified arguments, defining the model architecture. Instantiating a configuration
+    with the defaults will yield a similar configuration to that of the MiniMaxM2-7B-v0.1 or MiniMaxM2-7B-Instruct-v0.1.
+    [minimax_m2ai/MiniMaxM2-8x7B](https://huggingface.co/minimax_m2ai/MiniMaxM2-8x7B)
+    [minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1](https://huggingface.co/minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1)
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+    Args:
+        vocab_size (`int`, *optional*, defaults to 32000):
+            Vocabulary size of the MiniMaxM2 model. Defines the number of different tokens that can be represented by the
+            `inputs_ids` passed when calling [`MiniMaxM2Model`]
+        hidden_size (`int`, *optional*, defaults to 4096):
+            Dimension of the hidden representations.
+        intermediate_size (`int`, *optional*, defaults to 14336):
+            Dimension of the MLP representations.
+        num_hidden_layers (`int`, *optional*, defaults to 32):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (`int`, *optional*, defaults to 32):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        num_key_value_heads (`int`, *optional*, defaults to 8):
+            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
+            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
+            `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
+            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
+            by meanpooling all the original heads within that group. For more details, check out [this
+            paper](https://huggingface.co/papers/2305.13245). If it is not specified, will default to `8`.
+        head_dim (`int`, *optional*, defaults to `hidden_size // num_attention_heads`):
+            The attention head dimension.
+        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
+            The non-linear activation function (function or string) in the decoder.
+        max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
+            The maximum sequence length that this model might ever be used with. MiniMaxM2's sliding window attention
+            allows sequence of up to 4096*32 tokens.
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        rms_norm_eps (`float`, *optional*, defaults to 1e-05):
+            The epsilon used by the rms normalization layers.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        pad_token_id (`int`, *optional*):
+            The id of the padding token.
+        bos_token_id (`int`, *optional*, defaults to 1):
+            The id of the "beginning-of-sequence" token.
+        eos_token_id (`int`, *optional*, defaults to 2):
+            The id of the "end-of-sequence" token.
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether the model's input and output word embeddings should be tied.
+        rope_theta (`float`, *optional*, defaults to 1000000.0):
+            The base period of the RoPE embeddings.
+        sliding_window (`int`, *optional*):
+            Sliding window attention window size. If not specified, will default to `4096`.
+        attention_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout ratio for the attention probabilities.
+        num_experts_per_tok (`int`, *optional*, defaults to 2):
+            The number of experts to route per-token, can be also interpreted as the `top-k` routing
+            parameter
+        num_local_experts (`int`, *optional*, defaults to 8):
+            Number of experts per Sparse MLP layer.
+        output_router_logits (`bool`, *optional*, defaults to `False`):
+            Whether or not the router logits should be returned by the model. Enabling this will also
+            allow the model to output the auxiliary loss. See [here]() for more details
+        router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
+            The aux loss factor for the total loss.
+        router_jitter_noise (`float`, *optional*, defaults to 0.0):
+            Amount of noise to add to the router.
+    ```python
+    >>> from transformers import MiniMaxM2Model, MiniMaxM2Config
+    >>> # Initializing a MiniMaxM2 7B style configuration
+    >>> configuration = MiniMaxM2Config()
+    >>> # Initializing a model from the MiniMaxM2 7B style configuration
+    >>> model = MiniMaxM2Model(configuration)
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+    model_type = "minimax_m2"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    base_model_tp_plan = {
+        "layers.*.self_attn.q_proj": "colwise",
+        "layers.*.self_attn.k_proj": "colwise",
+        "layers.*.self_attn.v_proj": "colwise",
+        "layers.*.self_attn.o_proj": "rowwise",
+        "layers.*.block_sparse_moe.gate": "colwise_rep",  # we need to replicate here to correctly route experts
+        "layers.*.block_sparse_moe.experts.*.w1": "colwise",
+        "layers.*.block_sparse_moe.experts.*.w2": "rowwise",
+        "layers.*.block_sparse_moe.experts.*.w3": "colwise",
+    }
+    base_model_pp_plan = {
+        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+        "norm": (["hidden_states"], ["hidden_states"]),
+    }
+    def __init__(
+        self,
+        vocab_size=32000,
+        hidden_size=4096,
+        intermediate_size=14336,
+        num_hidden_layers=32,
+        num_attention_heads=32,
+        num_key_value_heads=8,
+        head_dim=None,
+        hidden_act="silu",
+        max_position_embeddings=4096 * 32,
+        initializer_range=0.02,
+        rms_norm_eps=1e-5,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=1,
+        eos_token_id=2,
+        tie_word_embeddings=False,
+        rope_theta=1e6,
+        sliding_window=None,
+        attention_dropout=0.0,
+        num_experts_per_tok=2,
+        num_local_experts=8,
+        output_router_logits=False,
+        router_aux_loss_coef=0.001,
+        router_jitter_noise=0.0,
+        **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.sliding_window = sliding_window
+        # for backward compatibility
+        if num_key_value_heads is None:
+            num_key_value_heads = num_attention_heads
+        self.num_key_value_heads = num_key_value_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.use_cache = use_cache
+        self.rope_theta = rope_theta
+        self.attention_dropout = attention_dropout
+        self.head_dim = head_dim
+        self.num_experts_per_tok = num_experts_per_tok
+        self.num_local_experts = num_local_experts
+        self.output_router_logits = output_router_logits
+        self.router_aux_loss_coef = router_aux_loss_coef
+        self.router_jitter_noise = router_jitter_noise
+        self.use_qk_norm = kwargs.pop("use_qk_norm", False)
+        self.rotary_dim = kwargs.pop("rotary_dim", self.head_dim)
+        self.partial_rotary_factor = kwargs.pop("partial_rotary_factor", 1)
+        if self.head_dim is not None:
+            self.partial_rotary_factor = self.rotary_dim / self.head_dim
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
+__all__ = ["MiniMaxM2Config"]
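The rotary settings at the end of `__init__` interact in a way that is easy to miss: `rotary_dim` defaults to `head_dim`, and whenever `head_dim` is set, any `partial_rotary_factor` passed in is overwritten by `rotary_dim / head_dim`. The sketch below reproduces only that fallback logic in a standalone function so it runs without `transformers`; `resolve_rotary` is a hypothetical helper name, not part of the diff.

```python
def resolve_rotary(head_dim=None, **kwargs):
    """Standalone copy of the rotary fallback logic in MiniMaxM2Config.__init__.

    Hypothetical helper for illustration; returns (rotary_dim, partial_rotary_factor).
    """
    # rotary_dim falls back to head_dim when the caller does not pass it
    rotary_dim = kwargs.pop("rotary_dim", head_dim)
    # partial_rotary_factor defaults to 1 (rotate the full head)
    partial_rotary_factor = kwargs.pop("partial_rotary_factor", 1)
    # When head_dim is known, the factor is always recomputed from the two dims
    if head_dim is not None:
        partial_rotary_factor = rotary_dim / head_dim
    return rotary_dim, partial_rotary_factor


no_head_dim = resolve_rotary()
full_rotary = resolve_rotary(head_dim=128)
half_rotary = resolve_rotary(head_dim=128, rotary_dim=64)
# An explicit factor is ignored once head_dim is set
overridden = resolve_rotary(head_dim=128, rotary_dim=64, partial_rotary_factor=0.25)
```

With `head_dim` left at `None` the default factor of 1 survives; in every other case the ratio of the two dimensions wins, so a config that wants partial rotary embeddings needs a consistent `rotary_dim` rather than only a `partial_rotary_factor`.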

generation_config.json CHANGED

@@ -1,9 +1,10 @@
-{
-  "bos_token_id": 200019,
-  "do_sample": true,
-  "eos_token_id": 200020,
-  "temperature": 1.0,
-  "top_p": 0.95,
-  "top_k": 40,
-  "transformers_version": "4.46.1"
-}

+{
+  "bos_token_id": 200019,
+  "do_sample": true,
+  "eos_token_id": 200020,
+  "temperature": 1.0,
+  "top_p": 0.95,
+  "top_k": 40,
+  "transformers_version": "4.46.1",
+  "model_creator": "List Cloud"
+}
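Comparing the two versions key by key confirms what the diff shows: the only change to `generation_config.json` is the added `model_creator` entry, and the sampling settings are untouched. A minimal check using only the standard library, with both versions copied from the diff above:

```python
import json

old = json.loads("""
{
  "bos_token_id": 200019,
  "do_sample": true,
  "eos_token_id": 200020,
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 40,
  "transformers_version": "4.46.1"
}
""")

new = json.loads("""
{
  "bos_token_id": 200019,
  "do_sample": true,
  "eos_token_id": 200020,
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 40,
  "transformers_version": "4.46.1",
  "model_creator": "List Cloud"
}
""")

# Keys present only in the new version
added = {k: new[k] for k in new.keys() - old.keys()}
# Keys present only in the old version
removed = {k: old[k] for k in old.keys() - new.keys()}
# Keys present in both whose values differ
changed = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
```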

model-00000-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9785f5a87c85710e38f4ca11f819f3d137ff84615af1bc0ba533b94681addf27
-size 3693062744

 version https://git-lfs.github.com/spec/v1
+oid sha256:d0c16afa264ac999106d7b80b160a97c316a70fabad3d428a9943eb7a35fca4a
+size 3693062760
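Each `.safetensors` entry in this diff is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below, assuming that three-line layout, parses the two pointers for `model-00000-of-00130.safetensors` and shows how a payload can be checked against a pointer; `parse_lfs_pointer` and `matches_pointer` are illustrative names, not part of any library.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


old = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:9785f5a87c85710e38f4ca11f819f3d137ff84615af1bc0ba533b94681addf27
size 3693062744
""")
new = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:d0c16afa264ac999106d7b80b160a97c316a70fabad3d428a9943eb7a35fca4a
size 3693062760
""")

# Difference in recorded file size between the two revisions of this shard
size_delta = new["size"] - old["size"]

# Demonstrate the check on a tiny in-memory payload (hypothetical data, not the real shard)
payload = b"example"
demo_pointer = {
    "version": old["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
payload_ok = matches_pointer(payload, demo_pointer)
```

For this shard `size_delta` is 16 bytes, and the other shards listed in this section show the same 16-byte growth. The pointers record only the hash and size, so they do not say what those bytes contain.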

model-00001-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2ed94efe077a4498b788706e059d82780deb54436a70a5a9664b716d6cdc83e
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:fe3b7db35ada8ade9963f2242b42d9ab6c82906f302c039cef50358a779cb848
+size 1208321192

model-00002-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f0c1b97aff37136b5d89a9df22acf7109fa824ccef5f9ff4f763b7869dfc5650
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:6591f23f0997c5a93ad3b1d07e1640057635b08f633a13a1e676785bac0831c1
+size 2463868952

model-00003-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:93be479ff1b6912ff1a7e54f4c4a4e4d67124d1811df8e39d50b981b1b43d8e6
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:cff032fb55721ec4f9838781cc99ff07ca197a6a8122a79abbca2c72a1bac476
+size 1208321192

model-00004-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5d5bead700b8f82dd2a50cee205c37f5642020c414452869693da06df384a9eb
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:47eb412198f9d20cd82a914763df09c7024f15bb364dc8c683c9dfab12242f14
+size 2463868952

model-00005-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:99444d6d83c614776397faa167dc908d48016414e0dd6edef57fd9c040e01d21
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:29ee6cc2652523a1529efbe193b2916b8312d4c81ffe3bfa69a3d5462890a9cc
+size 1208321192

model-00006-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:df42d1d91b84ed41f846775a274dbd382185fdf7595009dcd016bd805e25eb1b
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:a73d0f05cd4be0fc95fbd5b0ed43ed89b8b5310f0d77528d5b2f2636b049c15a
+size 2463868952

model-00007-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:18882ffcb4f2dddfe6b8766393c68208b524aa4520ed921234a66b11548440eb
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:d844a3f7afec3e0fe03111c45e01c434a4ae20c1d73a3004fcd688bda605ebef
+size 1208321192

model-00008-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cf8ead5d7b01543a3fafc5a39240b1a3d9fe1cf25b360eb99e7a751359db9705
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:c76e793b4cfdf48f057594fddc66a767e918f3ba261cc8c27d5206fcbc3790b7
+size 2463868952

model-00009-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d897820ce912aa7ae2feb4377d9b8684eca38c18be550b6bcf7316cb9d7c6e30
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:641beb2755a121a3160b4d7a504b6d15f3d9521d9ad18178515b6833e02507a8
+size 1208321192

model-00010-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:734eee6e62863c518a976d41b6c4122ed974cf87e52cd2d7e7df0187a3141b87
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:acc219978e83281e8c819f646c189d6b1a4d018269194ad564ecf68a2fd2fd6a
+size 2463868952

model-00011-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1237cbe1b9915bfda1efb8ced7d5a4266a0083a3b4c3fa401c4a003e3fea20fd
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:71053f6d6db3f5d5c4ac3231963bf72fa31f431260c82fec8204518c046a8b7e
+size 1208321192

model-00012-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:069b272af35289d3c499e98f867b1ffecb1f96980c583bf77b1d4d23c8b7a713
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:22836d173404306e62d081a63ea3c04fc8ef408cc846bbe2d0a11f8d4fbb5026
+size 2463868952

model-00013-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:045403b45c8951c3ea3c68b288f04255e0e2fc4de47293f9b941964212b8253e
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:d1b4189b66df90cdc1e63a3ca6428abcf613f42d6ac7d8c2e3fd8a8cdf645124
+size 1208321192

model-00014-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0277da3d1063a00618b32992617a2448c95c850c1f26dc4024d70ae920a35a25
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:7598790d1aa068a5c9ba53fcc40c079394799a97306827f1ba1f8cba88684ab9
+size 2463868952

model-00015-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2a9db97dbab9f2a324219d4ba019656b6b635fae3b868d7f2a4fd6e3bab5e66
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:18068f6619316e15eaa5899bc905d73829c198c95bd73e60ff9a916d06227c8f
+size 1208321192

model-00016-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:90776eaf143864ecb632c059fefd4167e27c5644ba4eb50d65afa5291cff666e
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:51251cb05597e91f3123a4895b103b700f5500292e0645d9dd5098d89905cdc6
+size 2463868952

model-00017-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4ea50b70dae5f8b55b1990a6b6cad9291349b45162548e9d48d63b2a144e3c23
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:6fbfbaa652a008a347622f73eb65c328519479d39984d20fe7550aa223731776
+size 1208321192

model-00018-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2a239e9eae27174937d5547d8e5e743e84bd7eaea50390510e4cd8f15511447b
-size 2463868936

 version https://git-lfs.github.com/spec/v1
+oid sha256:7aac1f32c20fd51a00f09337203defcce29e9f406bfb1b3ad6f149e1eb6ac5c9
+size 2463868952

model-00019-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5e041358d2ce0d92517b13508046baf08807d46adb33dda5d23728a4cef45f2b
-size 1208321176

 version https://git-lfs.github.com/spec/v1
+oid sha256:71137226bd4232c4b458fa03e452922938c2bbbef11ac6158872f1955a9051d9
+size 1208321192

model-00020-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4f4f7af9ded3e7d5775012eae2c7dee63518c799ebbe42a47949aa7f560c5f43
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:ee55ff6bcd2005fec670a2be80c07b08ce08cf4c5f8e60e475f69fdbc4124ac1
+size 2463869984

model-00021-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8a76ddac05820e58676b3b56e2990c598dae551f1f65adf55a90a3754f66e2b4
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:f689ebd29f939326b19c48f3ddb20c06f1f8f283dc3f945de7b3ad9a10c07a37
+size 1208321704

model-00022-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c080ad8c3b5032434973e205a074e4d1a41edd399a383dc1c6d80ebb073ca09e
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:9d25c1854e0b56c930560a8c3ad8e1e5476f40c88ba8e216304a01c5aca1bc19
+size 2463869984

model-00023-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9eee017222d3eb90afa5126fccb194de12c67828bd4353b3a466ce3da17877d2
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:283726c528f252b7c37374757865124b80eccea270f296dac9cb39bdb29c30ae
+size 1208321704

model-00024-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e3d3c543000e2fd6180bb17c289f36e46256bf0c76f7ae98a7087eb4264db605
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:0fc0e56e137378c34551c058d11163c6f70ec79980dc503c2e5f8ab8ca969a5d
+size 2463869984

model-00025-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:68580bdb4da65c22fb95a16e7fe13b1f0bbde861327d7c0bb6cb76a86794d38d
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:ce447cd23d3ef6fbb2911e75b2eec4a500be913fab847ddd513b38faaab06ae4
+size 1208321704

model-00026-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c0ca69318b53d7ec6f7fcfa7981ed2ec402e73302fd5ea62ed77311f4eb8be73
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:7ab66aaa211410818416eac84338b5231a55ccc62e93273af57ea54a7da38c57
+size 2463869984

model-00027-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a6f03ff04b01299dceaf26fe0a0a503d6e0abc58eba94e8796e933e40bd10a5e
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:db40c8e355ef79e34a8f1b1da001714d608016c18ea215dd02848a745d7b190e
+size 1208321704

model-00028-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6432450282a2cd79475b57bf5b83380addf0b8d36586c750bc4fbf37ce04af6e
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:cfa1a296fb0b36b616a2955e57af670e33bf8cb89171c63e6387b3bd6b381025
+size 2463869984

model-00029-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:961ca8675f7ee7a1a65e5ea5f1e35dfe7427d566e68a1f56f04a463252763683
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:2b85a8106a86e47f91e2221b043b4eab36c4ef76438d0298ad7c9d841ed8b0fa
+size 1208321704

model-00030-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7687ab86a251404b048268b022b67c148d38605ae04a0ddc46f2328aec60dc53
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:02cd49378478900445f3295f028990061308abdec79e4d5df4b07a3dcb29a0f1
+size 2463869984

model-00031-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:345042a4520442dccd7428238a2d80a5b5b7d990d1d5b61395ffcaad7e4e8794
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:ec5a215e0fc3048ea77ef02b4a5468ba94c159523d34b348f53396803d42c7ff
+size 1208321704

model-00032-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4faa680a93c47b4624ba40e17b98c725c9704ebbb75644feeb8f8a42a9045a7d
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:619ba8b01d74dd14a7b32d74474e0fda94a4fc1298678dc277716788a253f47d
+size 2463869984

model-00033-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fdfa10d9c8315dd4dd94d46955e03b012d56e8764db1089e1b2970d5139bb38e
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:00df4ee5d99ca76c1528f0c05beddc36e7de54587a96058a98318c90391bd40d
+size 1208321704

model-00034-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ae23de77bccd17a8ec9286fcf71aa2ed2dfe54f3404f6ed755f5067c4d01149a
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:1db20eca10db4d8a09052bb07c3879784b4eefb2cfbc068f9f92ce83f7835e12
+size 2463869984

model-00035-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6a5ca9a1fd87ba6f98d95f6a88789edf6909270540f0dd8736e05dd9f839943a
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:f470d1acd3e6cccc93991ff168563c5b0150c9e97534ee1c7eb8b410086594a2
+size 1208321704

model-00036-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:88113822767ba632f6a9b1863c6d78c005107ef563d82f7948ed0a3e5b5d76be
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:c05191aca5c7832a2ad70efb76c6053996373a972f944010702c1d89c0615808
+size 2463869984

model-00037-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3a42e3dfe02d8f2b8b2bfc8d35942e93de8746f74f88390f66d2106d6d7ee328
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:e5f63e133ddd050c482fe97b9a43c3acb4b71ff9299250061a80ce9aedd54ef7
+size 1208321704

model-00038-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6cf2b3485504e8b3790424afc1af0eaa735fa835999e5ac3639a0a0a1d1200c9
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:7b8225555f566cc75813df75f0b06f28c5ff1a17113e863ae2dc5904bb0e0b7d
+size 2463869984

model-00039-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bbf5e9eff7646b206eb25ba1a744d6d2e3544b3713638692a5869f8ef7143680
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:924d61a64bc0252c8a116af17e04fb0456b9073f69f770bf7641d53459d626a7
+size 1208321704

model-00040-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:499c9039dff0d6fa4c127030bde7cb7557bbd6cf98f7c002093e54bf16a0db22
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:c702ab514fa24d0793b4cd2eba3e3ce00364031d230ff015b69435bcefd2fe98
+size 2463869984

model-00041-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3ed0565052bb46b1b3913041d17da44b88c18ab5421ec770c2716762bf23aa8a
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:8187a1702e6f97158ce33d917813bed2c09da5d254c23c3f9252212822122801
+size 1208321704

model-00042-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:601959ff7bdb6fa3a0b08f529b592d23462083e30c4840b9925f655bde56649a
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:086952771ffb3c230f442bf74089630ce154a7031ff55a096a329eda9fa5da76
+size 2463869984

model-00043-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7fbd3484ee80a51f026b5feead3b59be11d8c4fc02965c58b123bd0111ff18b8
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:f2007a0ad756d4f2e26a9563c44c0e3bba9eb37d54f39c6c74b7aeae7518b1a1
+size 1208321704

model-00044-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b349ca4c4779f858f89c6a50f0cd365d147df4b88a523752ea8f8f4221e42f81
-size 2463869968

 version https://git-lfs.github.com/spec/v1
+oid sha256:bccf19ea9a96545a27081444a93f797b3114001f3837522b622a03730e821916
+size 2463869984

model-00045-of-00130.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:54673ecdf05ea6b01934af72c258b05fd6c6018d0cd2d9acec530116d16285db
-size 1208321688

 version https://git-lfs.github.com/spec/v1
+oid sha256:1d303939832d74b199d4593622da9f8edc22acc2d9d0d45c52479c2529a73000
+size 1208321704