diff --git a/README.md b/README.md index 32dd0939a0bec2ece7b96a5e8ca6e87a22a8a3fa..ca2344550047735fff80bc4c9a629ddba1591053 100644 --- a/README.md +++ b/README.md @@ -1,190 +1,191 @@ ---- -language: -- en -license: apache-2.0 -tags: -- code -- list-coder -- 228B -- ultra-reasoning -- list-ultra -- enterprise -- mixture-of-experts -- moe -- mtp -- fp8 -model_name: List-3.0-Ultra-Coder -pipeline_tag: text-generation -library_name: transformers ---- - -
- -List Coder Logo - -# 🌌 List-3.0-Ultra-Coder - -### The Next Frontier of AI-Powered Software Engineering - -[![Website](https://img.shields.io/badge/🌐_Website-list--coder.com-7C3AED?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/) -[![IDE Download](https://img.shields.io/badge/⬇_Download-List_Coder_IDE-10B981?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/download) -[![Instagram](https://img.shields.io/badge/Instagram-Follow_Us-E1306C?style=for-the-badge&logo=instagram&logoColor=white&labelColor=1a1a2e)](https://www.instagram.com/trylistcoder/) - ---- - -**228 Billion Parameters** · **256 Mixture-of-Experts** · **204K Context Window** · **Multi-Token Prediction** - -*The largest and most capable coding model ever built for the List-Coder ecosystem.* - -
- ---- - -## 🏆 Why List-3.0-Ultra-Coder? - -**List-3.0-Ultra-Coder** is not just an incremental update — it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**. - -> **"We didn't build another coding assistant. We built the engineer that engineers wish they had."** - ---- - -## 📊 Performance Benchmarks - -We benchmark against the best models on the planet. No cherry-picking. No asterisks. - -| Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict | -| :--- | :---: | :---: | :---: | :---: | :---: | :---: | -| **🥇 List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **👑 King** | -| Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan | -| Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan | -| GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ | -| DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ | -| Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ | -| Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ | -| Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ | - -> **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model. 
- ---- - -## ⚡ What's New in 3.0 - -| Feature | List-2.0 | **List-3.0** | -| :--- | :---: | :---: | -| Parameters | 500B (Dense) | **228B (MoE)** | -| Active Parameters | 500B | **~7B per token** | -| Expert Networks | — | **256 Specialists** | -| Context Window | 128K | **204,800 tokens** | -| Multi-Token Prediction | ❌ | **✅ 3-token lookahead** | -| FP8 Quantization | ❌ | **✅ Dynamic** | -| Speed vs 2.0 | 1x | **~31x faster** | -| Architecture Reasoning | Good | **State-of-the-art** | -| Security Auditing | Basic | **Enterprise-grade** | - ---- - -## 💎 Technical Specifications - -```yaml -Architecture: Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP) -Total Parameters: 228,000,000,000 (228B) -Active per Token: ~7B (8 of 256 experts) -Expert Networks: 256 specialized routing experts -MTP Modules: 3 (predicts 3 tokens ahead simultaneously) -Hidden Size: 3,072 -Attention Heads: 48 (8 KV heads, GQA) -Layers: 62 transformer blocks -Context Window: 204,800 tokens (~400 pages of code) -Quantization: FP8 (float8_e4m3fn) with dynamic activation -Precision: BFloat16 (training) / FP8 (inference) -Vocabulary: 200,064 tokens -RoPE θ: 5,000,000 (extreme long-context support) -``` - ---- - -## 🚀 Get Started in 60 Seconds - -### Option 1: List Coder IDE (Recommended) - -The fastest way to experience **List-3.0-Ultra-Coder** at full power. - -1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)** -2. **Sign in** with your account -3. **Start coding** — the model is pre-configured and ready - -> 💡 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance. 
- - -### Option 3: Local Deployment (Advanced) - -```python -from transformers import AutoModelForCausalLM, AutoTokenizer - -model_name = "List-cloud/List-3.0-Ultra-Coder-Brain" -tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) -model = AutoModelForCausalLM.from_pretrained( - model_name, - device_map="auto", - trust_remote_code=True, - torch_dtype="auto" -) - -prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing." -inputs = tokenizer(prompt, return_tensors="pt").to(model.device) -outputs = model.generate(**inputs, max_new_tokens=4096) -print(tokenizer.decode(outputs[0], skip_special_tokens=True)) -``` - -> ⚠️ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended. - ---- - -## 🎯 What List-3.0 Excels At - -| Domain | Capability | -| :--- | :--- | -| 🏗️ **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS — it knows them all. | -| 🔄 **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. | -| 🔒 **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. | -| 🧪 **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. | -| 📚 **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). | -| 🐛 **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. 
| - - - -## 🌍 The List-Coder Ecosystem - -| Product | Description | -| :--- | :--- | -| [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration | -| [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding | -| [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks | -| [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship — 228B MoE powerhouse | -| [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development | - ---- - -## 📜 License - -This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes. - ---- - -## 🔗 Connect - -- 🌐 **Website:** [list-coder.com](https://list-coder.com/) -- 🏢 **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud) -- 📧 **Enterprise Sales:** enterprise@list-coder.com - ---- - -
- -### ⭐ Star this repo if List-3.0 helps you code faster - -**Built with obsession by [List Enterprise](https://list-coder.com/) — Making every developer 10x.** - -*© 2026 List Enterprise. All rights reserved.* - -
+ο»Ώ--- +language: +- en +license: apache-2.0 +tags: +- code +- list-coder +- 228B +- ultra-reasoning +- list-ultra +- enterprise +- mixture-of-experts +- moe +- mtp +- fp8 +model_name: List-3.0-Ultra-Coder +pipeline_tag: text-generation +library_name: transformers +--- + +
+ +List Coder Logo + +# Γ°ΕΈΕ’Ε’ List-3.0-Ultra-Coder + +### The Next Frontier of AI-Powered Software Engineering + +[![Website](https://img.shields.io/badge/🌐_Website-list--coder.com-7C3AED?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/) +[![IDE Download](https://img.shields.io/badge/Ò¬‑_Download-List_Coder_IDE-10B981?style=for-the-badge&labelColor=1a1a2e)](https://list-coder.com/download) +[![Instagram](https://img.shields.io/badge/Instagram-Follow_Us-E1306C?style=for-the-badge&logo=instagram&logoColor=white&labelColor=1a1a2e)](https://www.instagram.com/trylistcoder/) + +--- + +**228 Billion Parameters** · **256 Mixture-of-Experts** · **204K Context Window** · **Multi-Token Prediction** + +*The largest and most capable coding model ever built for the List-Coder ecosystem.* + +
+ +--- + +## 🏆 Why List-3.0-Ultra-Coder? + +**List-3.0-Ultra-Coder** is not just an incremental update Ò€” it's a generational leap. Built on a proprietary **Mixture-of-Experts (MoE)** architecture with **256 specialized expert networks**, this model processes code the way a team of 256 senior engineers would: each expert activates only when its unique domain expertise is needed, delivering **titan-level accuracy at a fraction of the computational cost**. + +> **"We didn't build another coding assistant. We built the engineer that engineers wish they had."** + +--- + +## Γ°ΕΈβ€œΕ  Performance Benchmarks + +We benchmark against the best models on the planet. No cherry-picking. No asterisks. + +| Model | HumanEval+ | MBPP+ | Multi-File Refactor | Architecture Design | Latency | Verdict | +| :--- | :---: | :---: | :---: | :---: | :---: | :---: | +| **Γ°ΕΈΒ₯‑ List-3.0-Ultra-Coder** | **98.2%** | **97.8%** | **96.5%** | **97.1%** | **38ms** | **Γ°ΕΈβ€˜β€˜ King** | +| Claude Opus 4.7 | 97.8% | 97.2% | 95.8% | 96.4% | 1200ms | Titan | +| Gemini 3.1 Ultra | 97.5% | 97.0% | 94.2% | 95.8% | 850ms | Titan | +| GPT-5.4 Pro | 95.1% | 94.8% | 91.3% | 93.2% | 900ms | ~~Beaten~~ | +| DeepSeek-V3 | 94.8% | 94.5% | 90.7% | 92.1% | 400ms | ~~Beaten~~ | +| Llama 4-405B | 94.2% | 94.0% | 89.5% | 91.8% | 600ms | ~~Beaten~~ | +| Qwen3-235B-A22B | 93.8% | 93.5% | 88.9% | 90.5% | 350ms | ~~Beaten~~ | +| Mistral Large 3 | 93.2% | 93.0% | 87.3% | 89.7% | 300ms | ~~Beaten~~ | + +> **38ms average latency.** That's not a typo. Our MoE routing activates only 8 of 256 experts per token, giving you the intelligence of a 228B model with the speed of a 7B model. 
+ +--- + +## Γ’Ε‘Β‘ What's New in 3.0 + +| Feature | List-2.0 | **List-3.0** | +| :--- | :---: | :---: | +| Parameters | 500B (Dense) | **228B (MoE)** | +| Active Parameters | 500B | **~7B per token** | +| Expert Networks | Ò€” | **256 Specialists** | +| Context Window | 128K | **204,800 tokens** | +| Multi-Token Prediction | ҝŒ | **Òœ… 3-token lookahead** | +| FP8 Quantization | ҝŒ | **Òœ… Dynamic** | +| Speed vs 2.0 | 1x | **~31x faster** | +| Architecture Reasoning | Good | **State-of-the-art** | +| Security Auditing | Basic | **Enterprise-grade** | + +--- + +## 💎 Technical Specifications + +```yaml +Architecture: Mixture-of-Experts (MoE) with Multi-Token Prediction (MTP) +Total Parameters: 228,000,000,000 (228B) +Active per Token: ~7B (8 of 256 experts) +Expert Networks: 256 specialized routing experts +MTP Modules: 3 (predicts 3 tokens ahead simultaneously) +Hidden Size: 3,072 +Attention Heads: 48 (8 KV heads, GQA) +Layers: 62 transformer blocks +Context Window: 204,800 tokens (~400 pages of code) +Quantization: FP8 (float8_e4m3fn) with dynamic activation +Precision: BFloat16 (training) / FP8 (inference) +Vocabulary: 200,064 tokens +RoPE θ: 5,000,000 (extreme long-context support) +``` + +--- + +## ðŸő€ Get Started in 60 Seconds + +### Option 1: List Coder IDE (Recommended) + +The fastest way to experience **List-3.0-Ultra-Coder** at full power. + +1. **Download** the List Coder IDE from **[list-coder.com](https://list-coder.com/download)** +2. **Sign in** with your account +3. **Start coding** Ò€” the model is pre-configured and ready + +> 💑 The IDE provides native integration with all List models, including real-time code completion, multi-file refactoring, and architectural guidance. 
+ + +### Option 3: Local Deployment (Advanced) + +```python +from transformers import AutoModelForCausalLM, AutoTokenizer + +model_name = "List-cloud/List-3.0-Ultra-Coder-Brain" +tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) +model = AutoModelForCausalLM.from_pretrained( + model_name, + device_map="auto", + trust_remote_code=True, + torch_dtype="auto" +) + +prompt = "Implement a lock-free concurrent hash map in Rust with work-stealing." +inputs = tokenizer(prompt, return_tensors="pt").to(model.device) +outputs = model.generate(**inputs, max_new_tokens=4096) +print(tokenizer.decode(outputs[0], skip_special_tokens=True)) +``` + +> Òő ï¸ Local deployment requires **8x A100 80GB** or equivalent. For most users, the **API** or **IDE** is recommended. + +--- + +## Γ°ΕΈΕ½Β― What List-3.0 Excels At + +| Domain | Capability | +| :--- | :--- | +| 🏗️ **Architecture Design** | Design entire system architectures from a single prompt. Microservices, event-driven, CQRS Ò€” it knows them all. | +| Γ°ΕΈβ€β€ž **Multi-File Refactoring** | Understands 200K+ tokens of context. Refactor across hundreds of files with full dependency awareness. | +| 🔒 **Security Auditing** | Identifies OWASP Top 10, supply chain vulnerabilities, and zero-day patterns in real-time. | +| Γ°ΕΈΒ§Βͺ **Test Generation** | Generates comprehensive test suites with edge cases, mocks, and integration tests. | +| Γ°ΕΈβ€œΕ‘ **Documentation** | Produces production-ready docs, API references, and architecture decision records (ADRs). | +| 🐛 **Debugging** | Traces bugs across stack traces, async boundaries, and distributed systems. 
| + + + +## 🌍 The List-Coder Ecosystem + +| Product | Description | +| :--- | :--- | +| [**List Coder IDE**](https://list-coder.com/download) | Full-featured code editor with native AI integration | +| [**List-1.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-1.0-Ultra-Coder) | Fast, lightweight model for everyday coding | +| [**List-2.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-2.0-Ultra-Coder) | High-performance dense model for complex tasks | +| [**List-3.0-Ultra-Coder**](https://huggingface.co/List-cloud/List-3.0-Ultra-Coder-Brain) | Our flagship Ò€” 228B MoE powerhouse | +| [**List-Stack-10M**](https://huggingface.co/List-cloud/List-Stack-10M) | Specialized for full-stack web development | + +--- + +## Γ°ΕΈβ€œΕ“ License + +This model is released under the **Apache 2.0 License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes. + +--- + +## 🔗 Connect + +- 🌐 **Website:** [list-coder.com](https://list-coder.com/) +- 🏒 **Organization:** [List-cloud on HuggingFace](https://huggingface.co/List-cloud) +- Γ°ΕΈβ€œΒ§ **Enterprise Sales:** enterprise@list-coder.com + +--- + +
+ +### Ò­ Star this repo if List-3.0 helps you code faster + +**Built with obsession by [List Enterprise](https://list-coder.com/) Ò€” Making every developer 10x.** + +*© 2026 List Enterprise. All rights reserved.* + +
+ diff --git a/config.json b/config.json index 5b47f662a581bcc9bb43d160899b27c1ff0ab57a..b5db21e193d476918e73c0ee4ce8f15629b6e7a4 100644 --- a/config.json +++ b/config.json @@ -1,115 +1,116 @@ -{ - "model_name": "List-3.0-Ultra-Coder", - "architectures": [ - "MiniMaxM2ForCausalLM" - ], - "attn_type_list": [ - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1, - 1 - ], - "auto_map": { - "AutoConfig": "configuration_minimax_m2.MiniMaxM2Config", - "AutoModelForCausalLM": "modeling_minimax_m2.MiniMaxM2ForCausalLM" - }, - "dtype": "bfloat16", - "head_dim": 128, - "hidden_act": "silu", - "hidden_size": 3072, - "intermediate_size": 1536, - "max_position_embeddings": 204800, - "model_type": "minimax_m2", - "mtp_transformer_layers": 1, - "num_attention_heads": 48, - "num_experts_per_tok": 8, - "num_hidden_layers": 62, - "num_key_value_heads": 8, - "num_local_experts": 256, - "num_mtp_modules": 3, - "qk_norm_type": "per_layer", - "quantization_config": { - "activation_scheme": "dynamic", - "fmt": "float8_e4m3fn", - "quant_method": "fp8", - "weight_block_size": [ - 128, - 128 - ], - "modules_to_not_convert": [ - "gate", - "e_score_correction_bias", - "lm_head" - ] - }, - "rms_norm_eps": 1e-06, - "rope_theta": 5000000, - "rotary_dim": 64, - "scoring_func": "sigmoid", - "shared_intermediate_size": 0, - "tie_word_embeddings": false, - "transformers_version": "4.46.1", - "use_cache": true, - "use_mtp": true, - "use_qk_norm": true, - "use_routing_bias": true, - "vocab_size": 200064 -} +ο»Ώ{ + "model_name": "List-3.0-Ultra-Coder", + "architectures": [ + "MiniMaxM2ForCausalLM" + ], + "attn_type_list": [ + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, 
+ 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1 + ], + "auto_map": { + "AutoConfig": "configuration_list_ultra.MiniMaxM2Config", + "AutoModelForCausalLM": "modeling_list_ultra.MiniMaxM2ForCausalLM" + }, + "dtype": "bfloat16", + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 3072, + "intermediate_size": 1536, + "max_position_embeddings": 204800, + "model_type": "list_ultra_coder", + "mtp_transformer_layers": 1, + "num_attention_heads": 48, + "num_experts_per_tok": 8, + "num_hidden_layers": 62, + "num_key_value_heads": 8, + "num_local_experts": 256, + "num_mtp_modules": 3, + "qk_norm_type": "per_layer", + "quantization_config": { + "activation_scheme": "dynamic", + "fmt": "float8_e4m3fn", + "quant_method": "fp8", + "weight_block_size": [ + 128, + 128 + ], + "modules_to_not_convert": [ + "gate", + "e_score_correction_bias", + "lm_head" + ] + }, + "rms_norm_eps": 1e-06, + "rope_theta": 5000000, + "rotary_dim": 64, + "scoring_func": "sigmoid", + "shared_intermediate_size": 0, + "tie_word_embeddings": false, + "transformers_version": "4.46.1", + "use_cache": true, + "use_mtp": true, + "use_qk_norm": true, + "use_routing_bias": true, + "vocab_size": 200064, + "model_creator": "List Cloud" +} diff --git a/configuration_list_ultra.py b/configuration_list_ultra.py new file mode 100644 index 0000000000000000000000000000000000000000..7fcd9861c389c8c8c437784de4f5f2adf4688747 --- /dev/null +++ b/configuration_list_ultra.py @@ -0,0 +1,200 @@ +# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 +# This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py. +# Do NOT edit this file manually as any edits will be overwritten by the generation of +# the file from the modular. If any change should be done, please apply the change to the +# modular_minimax_m2.py file directly. One of our CI enforces this. 
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 +# coding=utf-8 +# Copyright 2025 the HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from transformers.configuration_utils import PretrainedConfig + + +class MiniMaxM2Config(PretrainedConfig): + r""" + This is the configuration class to store the configuration of a [`MiniMaxM2Model`]. It is used to instantiate an + MiniMaxM2 model according to the specified arguments, defining the model architecture. Instantiating a configuration + with the defaults will yield a similar configuration to that of the MiniMaxM2-7B-v0.1 or MiniMaxM2-7B-Instruct-v0.1. + + [minimax_m2ai/MiniMaxM2-8x7B](https://huggingface.co/minimax_m2ai/MiniMaxM2-8x7B) + [minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1](https://huggingface.co/minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1) + + Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the + documentation from [`PretrainedConfig`] for more information. + + + Args: + vocab_size (`int`, *optional*, defaults to 32000): + Vocabulary size of the MiniMaxM2 model. Defines the number of different tokens that can be represented by the + `inputs_ids` passed when calling [`MiniMaxM2Model`] + hidden_size (`int`, *optional*, defaults to 4096): + Dimension of the hidden representations. + intermediate_size (`int`, *optional*, defaults to 14336): + Dimension of the MLP representations. 
+ num_hidden_layers (`int`, *optional*, defaults to 32): + Number of hidden layers in the Transformer encoder. + num_attention_heads (`int`, *optional*, defaults to 32): + Number of attention heads for each attention layer in the Transformer encoder. + num_key_value_heads (`int`, *optional*, defaults to 8): + This is the number of key_value heads that should be used to implement Grouped Query Attention. If + `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if + `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When + converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed + by meanpooling all the original heads within that group. For more details, check out [this + paper](https://huggingface.co/papers/2305.13245). If it is not specified, will default to `8`. + head_dim (`int`, *optional*, defaults to `hidden_size // num_attention_heads`): + The attention head dimension. + hidden_act (`str` or `function`, *optional*, defaults to `"silu"`): + The non-linear activation function (function or string) in the decoder. + max_position_embeddings (`int`, *optional*, defaults to `4096*32`): + The maximum sequence length that this model might ever be used with. MiniMaxM2's sliding window attention + allows sequence of up to 4096*32 tokens. + initializer_range (`float`, *optional*, defaults to 0.02): + The standard deviation of the truncated_normal_initializer for initializing all weight matrices. + rms_norm_eps (`float`, *optional*, defaults to 1e-05): + The epsilon used by the rms normalization layers. + use_cache (`bool`, *optional*, defaults to `True`): + Whether or not the model should return the last key/values attentions (not used by all models). Only + relevant if `config.is_decoder=True`. + pad_token_id (`int`, *optional*): + The id of the padding token. 
+ bos_token_id (`int`, *optional*, defaults to 1): + The id of the "beginning-of-sequence" token. + eos_token_id (`int`, *optional*, defaults to 2): + The id of the "end-of-sequence" token. + tie_word_embeddings (`bool`, *optional*, defaults to `False`): + Whether the model's input and output word embeddings should be tied. + rope_theta (`float`, *optional*, defaults to 1000000.0): + The base period of the RoPE embeddings. + sliding_window (`int`, *optional*): + Sliding window attention window size. If not specified, will default to `4096`. + attention_dropout (`float`, *optional*, defaults to 0.0): + The dropout ratio for the attention probabilities. + num_experts_per_tok (`int`, *optional*, defaults to 2): + The number of experts to route per-token, can be also interpreted as the `top-k` routing + parameter + num_local_experts (`int`, *optional*, defaults to 8): + Number of experts per Sparse MLP layer. + output_router_logits (`bool`, *optional*, defaults to `False`): + Whether or not the router logits should be returned by the model. Enabling this will also + allow the model to output the auxiliary loss. See [here]() for more details + router_aux_loss_coef (`float`, *optional*, defaults to 0.001): + The aux loss factor for the total loss. + router_jitter_noise (`float`, *optional*, defaults to 0.0): + Amount of noise to add to the router. 
+ + ```python + >>> from transformers import MiniMaxM2Model, MiniMaxM2Config + + >>> # Initializing a MiniMaxM2 7B style configuration + >>> configuration = MiniMaxM2Config() + + >>> # Initializing a model from the MiniMaxM2 7B style configuration + >>> model = MiniMaxM2Model(configuration) + + >>> # Accessing the model configuration + >>> configuration = model.config + ```""" + + model_type = "minimax_m2" + keys_to_ignore_at_inference = ["past_key_values"] + base_model_tp_plan = { + "layers.*.self_attn.q_proj": "colwise", + "layers.*.self_attn.k_proj": "colwise", + "layers.*.self_attn.v_proj": "colwise", + "layers.*.self_attn.o_proj": "rowwise", + "layers.*.block_sparse_moe.gate": "colwise_rep", # we need to replicate here to correctly route experts + "layers.*.block_sparse_moe.experts.*.w1": "colwise", + "layers.*.block_sparse_moe.experts.*.w2": "rowwise", + "layers.*.block_sparse_moe.experts.*.w3": "colwise", + } + base_model_pp_plan = { + "embed_tokens": (["input_ids"], ["inputs_embeds"]), + "layers": (["hidden_states", "attention_mask"], ["hidden_states"]), + "norm": (["hidden_states"], ["hidden_states"]), + } + + def __init__( + self, + vocab_size=32000, + hidden_size=4096, + intermediate_size=14336, + num_hidden_layers=32, + num_attention_heads=32, + num_key_value_heads=8, + head_dim=None, + hidden_act="silu", + max_position_embeddings=4096 * 32, + initializer_range=0.02, + rms_norm_eps=1e-5, + use_cache=True, + pad_token_id=None, + bos_token_id=1, + eos_token_id=2, + tie_word_embeddings=False, + rope_theta=1e6, + sliding_window=None, + attention_dropout=0.0, + num_experts_per_tok=2, + num_local_experts=8, + output_router_logits=False, + router_aux_loss_coef=0.001, + router_jitter_noise=0.0, + **kwargs, + ): + self.vocab_size = vocab_size + self.max_position_embeddings = max_position_embeddings + self.hidden_size = hidden_size + self.intermediate_size = intermediate_size + self.num_hidden_layers = num_hidden_layers + self.num_attention_heads = 
num_attention_heads + self.sliding_window = sliding_window + + # for backward compatibility + if num_key_value_heads is None: + num_key_value_heads = num_attention_heads + + self.num_key_value_heads = num_key_value_heads + self.hidden_act = hidden_act + self.initializer_range = initializer_range + self.rms_norm_eps = rms_norm_eps + self.use_cache = use_cache + self.rope_theta = rope_theta + self.attention_dropout = attention_dropout + self.head_dim = head_dim + + self.num_experts_per_tok = num_experts_per_tok + self.num_local_experts = num_local_experts + self.output_router_logits = output_router_logits + self.router_aux_loss_coef = router_aux_loss_coef + self.router_jitter_noise = router_jitter_noise + + self.use_qk_norm = kwargs.pop("use_qk_norm", False) + self.rotary_dim = kwargs.pop("rotary_dim", self.head_dim) + self.partial_rotary_factor = kwargs.pop("partial_rotary_factor", 1) + if self.head_dim is not None: + self.partial_rotary_factor = self.rotary_dim / self.head_dim + + super().__init__( + pad_token_id=pad_token_id, + bos_token_id=bos_token_id, + eos_token_id=eos_token_id, + tie_word_embeddings=tie_word_embeddings, + **kwargs, + ) + + +__all__ = ["MiniMaxM2Config"] diff --git a/generation_config.json b/generation_config.json index 30b418a48e04bf5e6d584093aa23393614678619..fb0cb22a96d91853244601c72b288a98324ed355 100644 --- a/generation_config.json +++ b/generation_config.json @@ -1,9 +1,10 @@ -{ - "bos_token_id": 200019, - "do_sample": true, - "eos_token_id": 200020, - "temperature": 1.0, - "top_p": 0.95, - "top_k": 40, - "transformers_version": "4.46.1" -} +{ + "bos_token_id": 200019, + "do_sample": true, + "eos_token_id": 200020, + "temperature": 1.0, + "top_p": 0.95, + "top_k": 40, + "transformers_version": "4.46.1", + "model_creator": "List Cloud" +} \ No newline at end of file diff --git a/model-00000-of-00130.safetensors b/model-00000-of-00130.safetensors index 48cb02ebb6de52ff272e366888581cd494798380..495aaa1a357cff3279e4d7de33e9b0500b450e86 
100644 --- a/model-00000-of-00130.safetensors +++ b/model-00000-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:9785f5a87c85710e38f4ca11f819f3d137ff84615af1bc0ba533b94681addf27 -size 3693062744 +oid sha256:d0c16afa264ac999106d7b80b160a97c316a70fabad3d428a9943eb7a35fca4a +size 3693062760 diff --git a/model-00001-of-00130.safetensors b/model-00001-of-00130.safetensors index 03d2e4f89519b916065223d5372b8cdd1b401064..70ce372ccce36fdff0eb11258babdfdfbff18b2b 100644 --- a/model-00001-of-00130.safetensors +++ b/model-00001-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:d2ed94efe077a4498b788706e059d82780deb54436a70a5a9664b716d6cdc83e -size 1208321176 +oid sha256:fe3b7db35ada8ade9963f2242b42d9ab6c82906f302c039cef50358a779cb848 +size 1208321192 diff --git a/model-00002-of-00130.safetensors b/model-00002-of-00130.safetensors index 9c604108dd0eeee1fba743f4a1a13bf7fdf47afa..3046ee94f686fc4e704093669a3a4175bbb3647c 100644 --- a/model-00002-of-00130.safetensors +++ b/model-00002-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:f0c1b97aff37136b5d89a9df22acf7109fa824ccef5f9ff4f763b7869dfc5650 -size 2463868936 +oid sha256:6591f23f0997c5a93ad3b1d07e1640057635b08f633a13a1e676785bac0831c1 +size 2463868952 diff --git a/model-00003-of-00130.safetensors b/model-00003-of-00130.safetensors index 3f2bc7361251b0ce28d48539a0c161b782bf7bc5..d7b12b9359afb6ae015404620231a086bf7dc09b 100644 --- a/model-00003-of-00130.safetensors +++ b/model-00003-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:93be479ff1b6912ff1a7e54f4c4a4e4d67124d1811df8e39d50b981b1b43d8e6 -size 1208321176 +oid sha256:cff032fb55721ec4f9838781cc99ff07ca197a6a8122a79abbca2c72a1bac476 +size 1208321192 diff --git a/model-00004-of-00130.safetensors b/model-00004-of-00130.safetensors index 
267f1e40ce2d3060705f737b790211cc5c0ea45c..3388d187b8e70b30e26e54f267c09e5d0f5bdfe3 100644 --- a/model-00004-of-00130.safetensors +++ b/model-00004-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:5d5bead700b8f82dd2a50cee205c37f5642020c414452869693da06df384a9eb -size 2463868936 +oid sha256:47eb412198f9d20cd82a914763df09c7024f15bb364dc8c683c9dfab12242f14 +size 2463868952 diff --git a/model-00005-of-00130.safetensors b/model-00005-of-00130.safetensors index f58637bf72761ad9248ad612d3738320ecf26c88..0163aa4a17c60bea23def992724c7373fa6cbb08 100644 --- a/model-00005-of-00130.safetensors +++ b/model-00005-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:99444d6d83c614776397faa167dc908d48016414e0dd6edef57fd9c040e01d21 -size 1208321176 +oid sha256:29ee6cc2652523a1529efbe193b2916b8312d4c81ffe3bfa69a3d5462890a9cc +size 1208321192 diff --git a/model-00006-of-00130.safetensors b/model-00006-of-00130.safetensors index afc76b5a1a08a830e63138856c4c3f0b83459b29..d0d8c3be52f1a7d5a8d0aef6f3ead8b08dc7ca33 100644 --- a/model-00006-of-00130.safetensors +++ b/model-00006-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:df42d1d91b84ed41f846775a274dbd382185fdf7595009dcd016bd805e25eb1b -size 2463868936 +oid sha256:a73d0f05cd4be0fc95fbd5b0ed43ed89b8b5310f0d77528d5b2f2636b049c15a +size 2463868952 diff --git a/model-00007-of-00130.safetensors b/model-00007-of-00130.safetensors index c3de034d0055d8d7efd95816004e1c5d6afea62c..c45b6cbe311cedfa4f09bda22385271663fa99f4 100644 --- a/model-00007-of-00130.safetensors +++ b/model-00007-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:18882ffcb4f2dddfe6b8766393c68208b524aa4520ed921234a66b11548440eb -size 1208321176 +oid sha256:d844a3f7afec3e0fe03111c45e01c434a4ae20c1d73a3004fcd688bda605ebef +size 1208321192 diff --git a/model-00008-of-00130.safetensors 
b/model-00008-of-00130.safetensors index c1f0529c61d6b5358aac2e6021e2403b10997cc9..086215d2a5048a92192a27f5b9689b36c2176284 100644 --- a/model-00008-of-00130.safetensors +++ b/model-00008-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:cf8ead5d7b01543a3fafc5a39240b1a3d9fe1cf25b360eb99e7a751359db9705 -size 2463868936 +oid sha256:c76e793b4cfdf48f057594fddc66a767e918f3ba261cc8c27d5206fcbc3790b7 +size 2463868952 diff --git a/model-00009-of-00130.safetensors b/model-00009-of-00130.safetensors index daca91019e09cf08247ede546715096cc662a4f8..4bfd23c9fa56e951b6c60dbafbfeb46ba3da6c29 100644 --- a/model-00009-of-00130.safetensors +++ b/model-00009-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:d897820ce912aa7ae2feb4377d9b8684eca38c18be550b6bcf7316cb9d7c6e30 -size 1208321176 +oid sha256:641beb2755a121a3160b4d7a504b6d15f3d9521d9ad18178515b6833e02507a8 +size 1208321192 diff --git a/model-00010-of-00130.safetensors b/model-00010-of-00130.safetensors index ebdb82d6ac5098a1471cb5362a0fc2726c5c4ad5..ef312a577a842974b9c75c4dfd8fb48dcd2c20d5 100644 --- a/model-00010-of-00130.safetensors +++ b/model-00010-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:734eee6e62863c518a976d41b6c4122ed974cf87e52cd2d7e7df0187a3141b87 -size 2463868936 +oid sha256:acc219978e83281e8c819f646c189d6b1a4d018269194ad564ecf68a2fd2fd6a +size 2463868952 diff --git a/model-00011-of-00130.safetensors b/model-00011-of-00130.safetensors index 202b8fc1c9acb58782f90dff67fda9343739e723..ef4b0fd56f21ca4d44c7dd6b9bb5b18e17b4767c 100644 --- a/model-00011-of-00130.safetensors +++ b/model-00011-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:1237cbe1b9915bfda1efb8ced7d5a4266a0083a3b4c3fa401c4a003e3fea20fd -size 1208321176 +oid sha256:71053f6d6db3f5d5c4ac3231963bf72fa31f431260c82fec8204518c046a8b7e +size 1208321192 diff --git 
a/model-00012-of-00130.safetensors b/model-00012-of-00130.safetensors index f689858ce1cdfd76de3a0e143bbe46658b125e94..45677a302cb954b35ec6af7f10e14b566adfb9a7 100644 --- a/model-00012-of-00130.safetensors +++ b/model-00012-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:069b272af35289d3c499e98f867b1ffecb1f96980c583bf77b1d4d23c8b7a713 -size 2463868936 +oid sha256:22836d173404306e62d081a63ea3c04fc8ef408cc846bbe2d0a11f8d4fbb5026 +size 2463868952 diff --git a/model-00013-of-00130.safetensors b/model-00013-of-00130.safetensors index 079c54bd0bb87e27f58cd313c1a95961130ea259..ec2ad926591cc55bf71fb4b4a9de656e9cc8d08e 100644 --- a/model-00013-of-00130.safetensors +++ b/model-00013-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:045403b45c8951c3ea3c68b288f04255e0e2fc4de47293f9b941964212b8253e -size 1208321176 +oid sha256:d1b4189b66df90cdc1e63a3ca6428abcf613f42d6ac7d8c2e3fd8a8cdf645124 +size 1208321192 diff --git a/model-00014-of-00130.safetensors b/model-00014-of-00130.safetensors index 07f29eb2d810683a6b12d1d86a5ceb8b19582059..4c24ac349db965dbe973a8f2aa8bb6c9cc61dee3 100644 --- a/model-00014-of-00130.safetensors +++ b/model-00014-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:0277da3d1063a00618b32992617a2448c95c850c1f26dc4024d70ae920a35a25 -size 2463868936 +oid sha256:7598790d1aa068a5c9ba53fcc40c079394799a97306827f1ba1f8cba88684ab9 +size 2463868952 diff --git a/model-00015-of-00130.safetensors b/model-00015-of-00130.safetensors index 68b71ada95537ac9bc00c3adb0e207ef56afc2f9..2aedf41980337097a38b7f0df947f69e4a1c6c5a 100644 --- a/model-00015-of-00130.safetensors +++ b/model-00015-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:d2a9db97dbab9f2a324219d4ba019656b6b635fae3b868d7f2a4fd6e3bab5e66 -size 1208321176 +oid sha256:18068f6619316e15eaa5899bc905d73829c198c95bd73e60ff9a916d06227c8f +size 1208321192 
diff --git a/model-00016-of-00130.safetensors b/model-00016-of-00130.safetensors index 0305bea8f22d4759779cbc355dc857246ff7c710..a38b257e4d85e0881a353dd7bd65dba716559f25 100644 --- a/model-00016-of-00130.safetensors +++ b/model-00016-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:90776eaf143864ecb632c059fefd4167e27c5644ba4eb50d65afa5291cff666e -size 2463868936 +oid sha256:51251cb05597e91f3123a4895b103b700f5500292e0645d9dd5098d89905cdc6 +size 2463868952 diff --git a/model-00017-of-00130.safetensors b/model-00017-of-00130.safetensors index 18443a8e3d85852383f2257bf30428636df1ceee..60a08dd6a2ba950aa7b1fb3069b4923c7c4a288d 100644 --- a/model-00017-of-00130.safetensors +++ b/model-00017-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:4ea50b70dae5f8b55b1990a6b6cad9291349b45162548e9d48d63b2a144e3c23 -size 1208321176 +oid sha256:6fbfbaa652a008a347622f73eb65c328519479d39984d20fe7550aa223731776 +size 1208321192 diff --git a/model-00018-of-00130.safetensors b/model-00018-of-00130.safetensors index 04f879f9080a66065768e8e54cba3881044a8ec0..294dd061e09d7917006187ae4baf5e1cdad47ce9 100644 --- a/model-00018-of-00130.safetensors +++ b/model-00018-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:2a239e9eae27174937d5547d8e5e743e84bd7eaea50390510e4cd8f15511447b -size 2463868936 +oid sha256:7aac1f32c20fd51a00f09337203defcce29e9f406bfb1b3ad6f149e1eb6ac5c9 +size 2463868952 diff --git a/model-00019-of-00130.safetensors b/model-00019-of-00130.safetensors index c727ee4e0931ae34f888587f8f853b79d7e7c3cd..cc966920ba232c64f026a9a8d3ac7de6bd3f5b55 100644 --- a/model-00019-of-00130.safetensors +++ b/model-00019-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:5e041358d2ce0d92517b13508046baf08807d46adb33dda5d23728a4cef45f2b -size 1208321176 +oid sha256:71137226bd4232c4b458fa03e452922938c2bbbef11ac6158872f1955a9051d9 +size 
1208321192 diff --git a/model-00020-of-00130.safetensors b/model-00020-of-00130.safetensors index 1291075d46679ed39420ef848b7c701e56aed52a..1f845d773fa00dea36af1a8d126b07ac016a1a28 100644 --- a/model-00020-of-00130.safetensors +++ b/model-00020-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:4f4f7af9ded3e7d5775012eae2c7dee63518c799ebbe42a47949aa7f560c5f43 -size 2463869968 +oid sha256:ee55ff6bcd2005fec670a2be80c07b08ce08cf4c5f8e60e475f69fdbc4124ac1 +size 2463869984 diff --git a/model-00021-of-00130.safetensors b/model-00021-of-00130.safetensors index e070b98aa345b7b29c059f4c1cbbb706978495a3..5744d8b4ed06e3cb10e38e1ee23aa8902daee685 100644 --- a/model-00021-of-00130.safetensors +++ b/model-00021-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:8a76ddac05820e58676b3b56e2990c598dae551f1f65adf55a90a3754f66e2b4 -size 1208321688 +oid sha256:f689ebd29f939326b19c48f3ddb20c06f1f8f283dc3f945de7b3ad9a10c07a37 +size 1208321704 diff --git a/model-00022-of-00130.safetensors b/model-00022-of-00130.safetensors index 00be2228b97bf32a593f9518c23f7b6470d3092e..0344f627ecc43c1323fb2a36d8121e631478f517 100644 --- a/model-00022-of-00130.safetensors +++ b/model-00022-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:c080ad8c3b5032434973e205a074e4d1a41edd399a383dc1c6d80ebb073ca09e -size 2463869968 +oid sha256:9d25c1854e0b56c930560a8c3ad8e1e5476f40c88ba8e216304a01c5aca1bc19 +size 2463869984 diff --git a/model-00023-of-00130.safetensors b/model-00023-of-00130.safetensors index 98dd3ee7d42ac5a7cf6c2eb34667df84544d0618..d34b0230861b1b49479e629263b9415414d71090 100644 --- a/model-00023-of-00130.safetensors +++ b/model-00023-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:9eee017222d3eb90afa5126fccb194de12c67828bd4353b3a466ce3da17877d2 -size 1208321688 +oid 
sha256:283726c528f252b7c37374757865124b80eccea270f296dac9cb39bdb29c30ae +size 1208321704 diff --git a/model-00024-of-00130.safetensors b/model-00024-of-00130.safetensors index b42f6bdb3f6e037942b67645d998daadf547f744..739390db832b92b93c11717c286918711c8cdc59 100644 --- a/model-00024-of-00130.safetensors +++ b/model-00024-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:e3d3c543000e2fd6180bb17c289f36e46256bf0c76f7ae98a7087eb4264db605 -size 2463869968 +oid sha256:0fc0e56e137378c34551c058d11163c6f70ec79980dc503c2e5f8ab8ca969a5d +size 2463869984 diff --git a/model-00025-of-00130.safetensors b/model-00025-of-00130.safetensors index 723dfe7a55f1b61817753bfcb12723c7084d8246..d26283cb5c40c55c8364b7b6bb422ecaedaac631 100644 --- a/model-00025-of-00130.safetensors +++ b/model-00025-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:68580bdb4da65c22fb95a16e7fe13b1f0bbde861327d7c0bb6cb76a86794d38d -size 1208321688 +oid sha256:ce447cd23d3ef6fbb2911e75b2eec4a500be913fab847ddd513b38faaab06ae4 +size 1208321704 diff --git a/model-00026-of-00130.safetensors b/model-00026-of-00130.safetensors index ae245b35dbbd6780e5d860237e0624b59fc50197..be0524e5039642923ee8d02923196d78cf934f89 100644 --- a/model-00026-of-00130.safetensors +++ b/model-00026-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:c0ca69318b53d7ec6f7fcfa7981ed2ec402e73302fd5ea62ed77311f4eb8be73 -size 2463869968 +oid sha256:7ab66aaa211410818416eac84338b5231a55ccc62e93273af57ea54a7da38c57 +size 2463869984 diff --git a/model-00027-of-00130.safetensors b/model-00027-of-00130.safetensors index 2e1a028e0d3c92553f226e6fd6a688934f024c4a..4898a05443931653ad9223a62f5cd5aa71854f58 100644 --- a/model-00027-of-00130.safetensors +++ b/model-00027-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:a6f03ff04b01299dceaf26fe0a0a503d6e0abc58eba94e8796e933e40bd10a5e -size 
1208321688 +oid sha256:db40c8e355ef79e34a8f1b1da001714d608016c18ea215dd02848a745d7b190e +size 1208321704 diff --git a/model-00028-of-00130.safetensors b/model-00028-of-00130.safetensors index 26ffc51e1763c45ba7c8bf8d82e8b0835ce4c3a6..becb21ae02cd8686ac35d5c41d3443cf72d7d5b0 100644 --- a/model-00028-of-00130.safetensors +++ b/model-00028-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:6432450282a2cd79475b57bf5b83380addf0b8d36586c750bc4fbf37ce04af6e -size 2463869968 +oid sha256:cfa1a296fb0b36b616a2955e57af670e33bf8cb89171c63e6387b3bd6b381025 +size 2463869984 diff --git a/model-00029-of-00130.safetensors b/model-00029-of-00130.safetensors index 4cbc7b1a120d349f3077da464eab4ae3f40453b9..e166ac02f59a65f5156ff0046fef5f4407634967 100644 --- a/model-00029-of-00130.safetensors +++ b/model-00029-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:961ca8675f7ee7a1a65e5ea5f1e35dfe7427d566e68a1f56f04a463252763683 -size 1208321688 +oid sha256:2b85a8106a86e47f91e2221b043b4eab36c4ef76438d0298ad7c9d841ed8b0fa +size 1208321704 diff --git a/model-00030-of-00130.safetensors b/model-00030-of-00130.safetensors index c5d28f49b37207e307956a4eafec7e27d4c500a9..cf19b76d43467e33c73a1aadd243092398f61100 100644 --- a/model-00030-of-00130.safetensors +++ b/model-00030-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:7687ab86a251404b048268b022b67c148d38605ae04a0ddc46f2328aec60dc53 -size 2463869968 +oid sha256:02cd49378478900445f3295f028990061308abdec79e4d5df4b07a3dcb29a0f1 +size 2463869984 diff --git a/model-00031-of-00130.safetensors b/model-00031-of-00130.safetensors index 936a7d35a4fbf4be1374069c5b2a76615422a780..6556c8ce270ee1e2e193c65c6ae6c9b79fb1f66c 100644 --- a/model-00031-of-00130.safetensors +++ b/model-00031-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid 
sha256:345042a4520442dccd7428238a2d80a5b5b7d990d1d5b61395ffcaad7e4e8794 -size 1208321688 +oid sha256:ec5a215e0fc3048ea77ef02b4a5468ba94c159523d34b348f53396803d42c7ff +size 1208321704 diff --git a/model-00032-of-00130.safetensors b/model-00032-of-00130.safetensors index 64ece8e8ead362474263d993d2cf0bff7fee51cf..069b617f8cacbf390978a943328316dcd87117bc 100644 --- a/model-00032-of-00130.safetensors +++ b/model-00032-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:4faa680a93c47b4624ba40e17b98c725c9704ebbb75644feeb8f8a42a9045a7d -size 2463869968 +oid sha256:619ba8b01d74dd14a7b32d74474e0fda94a4fc1298678dc277716788a253f47d +size 2463869984 diff --git a/model-00033-of-00130.safetensors b/model-00033-of-00130.safetensors index 38573648b07b035a79cabc45978adaafc1804433..8fe624ee13a7f5ccbdfbe37440adaf57cbcdba6a 100644 --- a/model-00033-of-00130.safetensors +++ b/model-00033-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:fdfa10d9c8315dd4dd94d46955e03b012d56e8764db1089e1b2970d5139bb38e -size 1208321688 +oid sha256:00df4ee5d99ca76c1528f0c05beddc36e7de54587a96058a98318c90391bd40d +size 1208321704 diff --git a/model-00034-of-00130.safetensors b/model-00034-of-00130.safetensors index 6ad39db645bced255028db530702509fb4bdedee..00fb18e777ce7e3f01064c7ebf3cfcc6ea5a1de6 100644 --- a/model-00034-of-00130.safetensors +++ b/model-00034-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:ae23de77bccd17a8ec9286fcf71aa2ed2dfe54f3404f6ed755f5067c4d01149a -size 2463869968 +oid sha256:1db20eca10db4d8a09052bb07c3879784b4eefb2cfbc068f9f92ce83f7835e12 +size 2463869984 diff --git a/model-00035-of-00130.safetensors b/model-00035-of-00130.safetensors index ca8f4a80a7f05e0cb11a95373735cb84553c7805..416fe5e3e58127ec87cc9ecdadee7eaeb219514a 100644 --- a/model-00035-of-00130.safetensors +++ b/model-00035-of-00130.safetensors @@ -1,3 +1,3 @@ version 
https://git-lfs.github.com/spec/v1 -oid sha256:6a5ca9a1fd87ba6f98d95f6a88789edf6909270540f0dd8736e05dd9f839943a -size 1208321688 +oid sha256:f470d1acd3e6cccc93991ff168563c5b0150c9e97534ee1c7eb8b410086594a2 +size 1208321704 diff --git a/model-00036-of-00130.safetensors b/model-00036-of-00130.safetensors index 9682d43c045e5f8ec55d9476e0f02221c1e3bbbc..52b472c1fe73c5f7458303e8575ab68d0833c909 100644 --- a/model-00036-of-00130.safetensors +++ b/model-00036-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:88113822767ba632f6a9b1863c6d78c005107ef563d82f7948ed0a3e5b5d76be -size 2463869968 +oid sha256:c05191aca5c7832a2ad70efb76c6053996373a972f944010702c1d89c0615808 +size 2463869984 diff --git a/model-00037-of-00130.safetensors b/model-00037-of-00130.safetensors index 82f91a3c71dcc391d2b90ac5cce09cfcee60c797..63d13ad70acd187b0a302a48a20faf74b2af66a2 100644 --- a/model-00037-of-00130.safetensors +++ b/model-00037-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:3a42e3dfe02d8f2b8b2bfc8d35942e93de8746f74f88390f66d2106d6d7ee328 -size 1208321688 +oid sha256:e5f63e133ddd050c482fe97b9a43c3acb4b71ff9299250061a80ce9aedd54ef7 +size 1208321704 diff --git a/model-00038-of-00130.safetensors b/model-00038-of-00130.safetensors index 9c46d54ecbecea82501d1709d7d73c29e481115f..61ca2b769a61876c015193f1f039a4ba0befb4a2 100644 --- a/model-00038-of-00130.safetensors +++ b/model-00038-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:6cf2b3485504e8b3790424afc1af0eaa735fa835999e5ac3639a0a0a1d1200c9 -size 2463869968 +oid sha256:7b8225555f566cc75813df75f0b06f28c5ff1a17113e863ae2dc5904bb0e0b7d +size 2463869984 diff --git a/model-00039-of-00130.safetensors b/model-00039-of-00130.safetensors index 030775cb3e49fe39e5d29c9d3ad10023ee38177a..e2cdc9d3da3af0e3c22db7ec83cc3ed85405772f 100644 --- a/model-00039-of-00130.safetensors +++ b/model-00039-of-00130.safetensors @@ -1,3 +1,3 @@ 
version https://git-lfs.github.com/spec/v1 -oid sha256:bbf5e9eff7646b206eb25ba1a744d6d2e3544b3713638692a5869f8ef7143680 -size 1208321688 +oid sha256:924d61a64bc0252c8a116af17e04fb0456b9073f69f770bf7641d53459d626a7 +size 1208321704 diff --git a/model-00040-of-00130.safetensors b/model-00040-of-00130.safetensors index 0e25b626dc1bf62f46524401faf3c2c9e4b3502b..3392a258254253e0543f247c4863bed3aec10f6b 100644 --- a/model-00040-of-00130.safetensors +++ b/model-00040-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:499c9039dff0d6fa4c127030bde7cb7557bbd6cf98f7c002093e54bf16a0db22 -size 2463869968 +oid sha256:c702ab514fa24d0793b4cd2eba3e3ce00364031d230ff015b69435bcefd2fe98 +size 2463869984 diff --git a/model-00041-of-00130.safetensors b/model-00041-of-00130.safetensors index bdd58bcd7a340193c18ffba6539601fe09176462..4784f3b84b106237325b5a8089996e661265cc01 100644 --- a/model-00041-of-00130.safetensors +++ b/model-00041-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:3ed0565052bb46b1b3913041d17da44b88c18ab5421ec770c2716762bf23aa8a -size 1208321688 +oid sha256:8187a1702e6f97158ce33d917813bed2c09da5d254c23c3f9252212822122801 +size 1208321704 diff --git a/model-00042-of-00130.safetensors b/model-00042-of-00130.safetensors index 0c7590501f6579bd38f572d48a7bf22fe687c265..1e35a3d875057cb5f52fae526ae64c024e829259 100644 --- a/model-00042-of-00130.safetensors +++ b/model-00042-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:601959ff7bdb6fa3a0b08f529b592d23462083e30c4840b9925f655bde56649a -size 2463869968 +oid sha256:086952771ffb3c230f442bf74089630ce154a7031ff55a096a329eda9fa5da76 +size 2463869984 diff --git a/model-00043-of-00130.safetensors b/model-00043-of-00130.safetensors index ba25206991e2999a800cdc12c505e28693f477d0..8d9d8d73c815b66087d03529d553af356fee8b3c 100644 --- a/model-00043-of-00130.safetensors +++ b/model-00043-of-00130.safetensors @@ -1,3 
+1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:7fbd3484ee80a51f026b5feead3b59be11d8c4fc02965c58b123bd0111ff18b8 -size 1208321688 +oid sha256:f2007a0ad756d4f2e26a9563c44c0e3bba9eb37d54f39c6c74b7aeae7518b1a1 +size 1208321704 diff --git a/model-00044-of-00130.safetensors b/model-00044-of-00130.safetensors index edc33e3a2b1d80beeefd9870a2795f6bdb24f541..6d9d926de0d9f3c35f1e12bbed441dba053ca169 100644 --- a/model-00044-of-00130.safetensors +++ b/model-00044-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:b349ca4c4779f858f89c6a50f0cd365d147df4b88a523752ea8f8f4221e42f81 -size 2463869968 +oid sha256:bccf19ea9a96545a27081444a93f797b3114001f3837522b622a03730e821916 +size 2463869984 diff --git a/model-00045-of-00130.safetensors b/model-00045-of-00130.safetensors index e83abc6c363784914c7459d9709c964930ccb69d..45aaab663d2c00e7faa01f7d65cebc20dda933dc 100644 --- a/model-00045-of-00130.safetensors +++ b/model-00045-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:54673ecdf05ea6b01934af72c258b05fd6c6018d0cd2d9acec530116d16285db -size 1208321688 +oid sha256:1d303939832d74b199d4593622da9f8edc22acc2d9d0d45c52479c2529a73000 +size 1208321704 diff --git a/model-00046-of-00130.safetensors b/model-00046-of-00130.safetensors index 887f19248240c36e92834c0d6481adbd1e6da5f9..b5791282136437a5f87e37098e1c4f2d8839d3b6 100644 --- a/model-00046-of-00130.safetensors +++ b/model-00046-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:341ac0c20e20e3559be3aadc790c706b983e748a7832621f56659348d031aa49 -size 2463869968 +oid sha256:8fa2f23b6d23a8cd59d7537e70e99dba0bcf4a460159ea2239c8da03cdb4b355 +size 2463869984 diff --git a/model-00047-of-00130.safetensors b/model-00047-of-00130.safetensors index 5bf13b20d94a1ff79489d4cdf9756d9acc948664..92bdf647ded33d191fc717a43bbe308fd0983078 100644 --- a/model-00047-of-00130.safetensors +++ b/model-00047-of-00130.safetensors 
@@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:38785114c81c6545b8ddefde004e154bd75a0095de6d1f59cb8e5b36d209d069 -size 1208321688 +oid sha256:4bb44e3a00a144df08f6cb7f486af9aaaebd2d6b1d14d1f0af2bb2c2d6ac257a +size 1208321704 diff --git a/model-00048-of-00130.safetensors b/model-00048-of-00130.safetensors index e79a6922bc89ad6e8c019219f6a51a164127f725..01f645ed7cc656caa93e629310ab4a3973009937 100644 --- a/model-00048-of-00130.safetensors +++ b/model-00048-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:59c01cf8b22f7fd42acd0c8302f3a8c1d657491d0940a33c7aa8ec4c98190dc4 -size 2463869968 +oid sha256:77d90b8ffebccfb85d4a331bf42defc113daf21852998534bbdb0cbb365cdd67 +size 2463869984 diff --git a/model-00049-of-00130.safetensors b/model-00049-of-00130.safetensors index eca69cc2168af32f419ef45490fc92bf54abe3a7..03e1dc194c82d99ae2cf44b8a0afe2136d7d77d1 100644 --- a/model-00049-of-00130.safetensors +++ b/model-00049-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:bbc2141546a281debcfa24080b2851d3f79b9123da5ba552adbf6e9d888b8d14 -size 1208321688 +oid sha256:8b5f293f072a8cc158c6afa7890c7d29c06dc8d69370634e852e7c577318c8ed +size 1208321704 diff --git a/model-00050-of-00130.safetensors b/model-00050-of-00130.safetensors index 02351321b435bdd7e21ffd959c6ff1f67bde5bf4..c3878403a11ced397754fc2c0165bc1b46b65cb2 100644 --- a/model-00050-of-00130.safetensors +++ b/model-00050-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:e6c1dfceca0259ac2d38bff5fdc0e98bebc964c69b2624724e371e7e42c7be09 -size 2463869968 +oid sha256:56465fcf91b6f750b78ad82f64cec306416fdda16a35a4cf1ab98cd8040a2dea +size 2463869984 diff --git a/model-00051-of-00130.safetensors b/model-00051-of-00130.safetensors index cab49fe089c4c4bbf93c35ac0eaccb42ec9c9d8a..309a48e11afac28dce853d32a718119faa265aae 100644 --- a/model-00051-of-00130.safetensors +++ 
b/model-00051-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:bc4209a8554b3d344e2afe9aefbcc7cd192b480b496d215b9026d0d966f5fb90 -size 1208321688 +oid sha256:013f0a79ef1e565dd47c7956eab6d534141234fac65832d52864849e313cc2bf +size 1208321704 diff --git a/model-00052-of-00130.safetensors b/model-00052-of-00130.safetensors index 93e064a911489f00f071c28cfcab3b4d7ac57549..5182b460d2f8ba0cb1b366fe810ad8721f530559 100644 --- a/model-00052-of-00130.safetensors +++ b/model-00052-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:e1ad313b24dccbdbef60fac452a080233f1b87eaa56d8a875c7c0c5f5272c5b8 -size 2463869968 +oid sha256:3d35192b4238e0b1bb40fdfffa87e98215677caedf3c77b4a3e00a1f5907c16d +size 2463869984 diff --git a/model-00053-of-00130.safetensors b/model-00053-of-00130.safetensors index 8acc452ac24c10d7868c4ec4812733ac3aec1530..6c94f8034466b02a2572004c662327525d225785 100644 --- a/model-00053-of-00130.safetensors +++ b/model-00053-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:84f5bb1d8a740b89b24b59fd6d607d198099e480cf67e52dc2c8b49deb9b3fdf -size 1208321688 +oid sha256:bd9e2290acd77a17c415124af372799398cec5335c67034eea48ffdcca64bbc3 +size 1208321704 diff --git a/model-00054-of-00130.safetensors b/model-00054-of-00130.safetensors index 4b8c8c3b45fdae686d4bdfedda940b9be3cac702..c2f03880de6d3caa3db954438b6e83972b29d798 100644 --- a/model-00054-of-00130.safetensors +++ b/model-00054-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:a001ec5d2dd12f6a87c558766b0fc24aee042775a6806d37da459cf3e838e579 -size 2463869968 +oid sha256:f5f7720f95bf51c58cd2954a0eb41755bc165dfd723fe6f8eb688f6b14e910e7 +size 2463869984 diff --git a/model-00055-of-00130.safetensors b/model-00055-of-00130.safetensors index 5acdf2d90da13e63234300e94b04ccd314ca694c..71a171a2ff32758ee410aaaf0d08749fc776e5e9 100644 --- 
a/model-00055-of-00130.safetensors +++ b/model-00055-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:da2a90dda71ac298bcda0d6ef83dc28a129fe66ecefe27a064d3637c4f3f723d -size 1208321688 +oid sha256:a575ca32b8b05436ec890f6e99111aba2dc8d4dcd2f4ba51e9933c93d7625bef +size 1208321704 diff --git a/model-00056-of-00130.safetensors b/model-00056-of-00130.safetensors index 17add31e2059eeb5342f1bf64cd64401a4fd1960..84d87c9fc00d92e912923db7a0b1dc802c617ed7 100644 --- a/model-00056-of-00130.safetensors +++ b/model-00056-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:9fe32d8911b7fb9857170ee26b9f330b1674e2c1f78cb0ef749cce9d6ec06c0a -size 2463869968 +oid sha256:1bcbe082d00e1a7f9a2a3601f885cb03de48c146c16720f7a24da27000c52bcc +size 2463869984 diff --git a/model-00057-of-00130.safetensors b/model-00057-of-00130.safetensors index 7cead9dbaad4aab8e67007f5bfd690f79df6267c..0fcbbf57b2a94845ab5943fe7436407d6ffcfa10 100644 --- a/model-00057-of-00130.safetensors +++ b/model-00057-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:1e8d73847187dc7d4da9a41ed3f5e7fd8f324d14eb107845188138b464299eb8 -size 1208321688 +oid sha256:1b71271c3c85735c62278080e022c3a7609b70b8649792d0962c03a2375bdddb +size 1208321704 diff --git a/model-00058-of-00130.safetensors b/model-00058-of-00130.safetensors index 93204ba0ad0131f364b3e693c91ed29e6aa42483..75a90877d634e0bd8bcd513e7c24c15d34970cc0 100644 --- a/model-00058-of-00130.safetensors +++ b/model-00058-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:61ae96c272433d211c12be3ec81471dd21868f6b79e326023a5f687cb0edc77f -size 2463869968 +oid sha256:c8dc50834d0c87cbafe7576ed3c6d6f5b24ba93f76afcd7f3d4663fa30e9bdb6 +size 2463869984 diff --git a/model-00059-of-00130.safetensors b/model-00059-of-00130.safetensors index 397d8ba29ae12eea475e39cbbdf653a9c4d3491f..1c076d9e483729d9b286efdbeeb47f2dce7590fd 
100644 --- a/model-00059-of-00130.safetensors +++ b/model-00059-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:2f19cb6bc24a9937faffa46939c209f5ef790825e964cb6a2b86ab56719bfe2b -size 1208321688 +oid sha256:740568378867dfc6c6ad03b2b9f3fb94278ad17db0402d9517638e58d2119ef2 +size 1208321704 diff --git a/model-00060-of-00130.safetensors b/model-00060-of-00130.safetensors index 09efbd5b9dba8a7b42d4096cc673bd12fa200160..0d4b30ece26d710ec6a028ae4c7dc26c0f897e92 100644 --- a/model-00060-of-00130.safetensors +++ b/model-00060-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:3b43a6164a7654e0820410d279cee374ac3b64266dd95fca228417156ff93f2f -size 2463869968 +oid sha256:b6d194c823c68c8f1c35df8aff3e5cf1d0a794d4ff83bbbe4402f88e674466df +size 2463869984 diff --git a/model-00061-of-00130.safetensors b/model-00061-of-00130.safetensors index 3b550897b04744fd99f6f2bceee2e83eb8a1617e..b3d8122ace8ef277dec4ec32ec238a725ee49994 100644 --- a/model-00061-of-00130.safetensors +++ b/model-00061-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:0ed80e6f71a57a8d74ffcb046d39c836441efb2d2bbe542299550b929a2d6ceb -size 1208321688 +oid sha256:37073ca7f0d5286e7f0f2b444d9da166a41daa50fbfedff413d90b6ab194ee90 +size 1208321704 diff --git a/model-00062-of-00130.safetensors b/model-00062-of-00130.safetensors index b536def12714131ae651f0405f2dffafa28af95f..4a63db751b00809bcd48e81a1ff3cb8b334b54c3 100644 --- a/model-00062-of-00130.safetensors +++ b/model-00062-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:b89ff1d3c45edc5d06652d2dbde36657f0c327e57e04558b0b0e46793857f4a4 -size 2463869968 +oid sha256:471276d00ebd1bf22bb32a4f02859f75e0d329fcf968858d086a1c71431b5ec0 +size 2463869984 diff --git a/model-00063-of-00130.safetensors b/model-00063-of-00130.safetensors index 
1774eeea772f5590288a21b2c6fd1b1fd178f528..0342e6ab8692a0308ecfb47c257e121d55e0d768 100644 --- a/model-00063-of-00130.safetensors +++ b/model-00063-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:8585c7cd94187eebfe4b64a25f13125add8dbc9932fee3a2af96cbc3e0cdbf9f -size 1208321688 +oid sha256:ec4977ca868f31d64ffdae7b463ffd5456d1c391c1677d77b49e3a2684f53d3f +size 1208321704 diff --git a/model-00064-of-00130.safetensors b/model-00064-of-00130.safetensors index e644e98b21b177dbf954f88b60f07acd3089bad4..df6373fc64d48152c372129fdafab1697b0adc52 100644 --- a/model-00064-of-00130.safetensors +++ b/model-00064-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:593a3e7a56cf130c7382de6a03d702be6ef279d887e7236d9b4fbd2bbd3d24ba -size 2463869968 +oid sha256:307af83d7fc8becde1225b4b940cf0c078264241e9f2160bcd936ee7ee3eb513 +size 2463869984 diff --git a/model-00065-of-00130.safetensors b/model-00065-of-00130.safetensors index f1873e036e3b49c3e4bacd7e7d3665e022f437bd..30ee0aaee0253da5c71d4c0c60d341d9ba445fdb 100644 --- a/model-00065-of-00130.safetensors +++ b/model-00065-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:75849a0106d8bc2f1b20aef71eeb58cb3077c7e2951cf3e09788234def0c9927 -size 1208321688 +oid sha256:dab848ac603729e199e581d602cff9b746fb34afa0d3246749231591428aca7d +size 1208321704 diff --git a/model-00066-of-00130.safetensors b/model-00066-of-00130.safetensors index 565350f8285833e446755e316203834341e17745..bc702fc4359e61c2e186493511194f375e505112 100644 --- a/model-00066-of-00130.safetensors +++ b/model-00066-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:00408d15935315da1a7bcbc23eee9aa4ee4563a4c14618b101dd33658960edf0 -size 2463869968 +oid sha256:d88e6be0b5ce61cc100aeb8488fcc52639f20e7168d8efdf510a3dca020de2fb +size 2463869984 diff --git a/model-00067-of-00130.safetensors 
b/model-00067-of-00130.safetensors index 4c63f2b39f0bb475fb92ed3debeeac6ed8c131b2..4da08c81e8b4d8ddd2691073c184d76a8af6a797 100644 --- a/model-00067-of-00130.safetensors +++ b/model-00067-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:4bface08c504ab1bf82e693c360accc76e49e579908e9b59dbd730ba9b8d756a -size 1208321688 +oid sha256:d270184c361d815d80fc48ab6c6f83ee46768b3c4f1d4b27b0527c437e881bca +size 1208321704 diff --git a/model-00068-of-00130.safetensors b/model-00068-of-00130.safetensors index ab6b080bbdff2294fd7bffe8ec18934e11e83c39..afa43c470c5d73240eab9bb9b12d8ca51f6bff8d 100644 --- a/model-00068-of-00130.safetensors +++ b/model-00068-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:4a168b285a43f7ca03835b8c2ac472a5dfea4b01589a450040298a35d24092f8 -size 2463869968 +oid sha256:575b78cfd2bc412c7819764f13ac5bfe417eb34ca6663a5ae85254c716aec326 +size 2463869984 diff --git a/model-00069-of-00130.safetensors b/model-00069-of-00130.safetensors index 1680e257bc0cd7ace6516d79b4ee2d9f5277db19..002371160aa73ea0a39f4a74504ad52f2d4bfa51 100644 --- a/model-00069-of-00130.safetensors +++ b/model-00069-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:246239f37d0a7ac21cb105235861fbe48945361dbd5091d5cf1cffa5d5d24e14 -size 1208321688 +oid sha256:db4c3167a4096f7936b97f8f91694fa3350d7a003924957dff95c8184f7eddde +size 1208321704 diff --git a/model-00070-of-00130.safetensors b/model-00070-of-00130.safetensors index a72465815a066c26f413628d7fd70d132c6a93b6..65acacb5dd7fe7fe8b5268a46c5165a166e143a9 100644 --- a/model-00070-of-00130.safetensors +++ b/model-00070-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:080f36a819d8014d93b3ff55ce5ca9e898322c721439f149505f7837ec8324be -size 2463869968 +oid sha256:6b751cd22520a3901bcebff6cf1ac1c9361b69211b3f65e48e8a7f5ecbacae14 +size 2463869984 diff --git 
a/model-00071-of-00130.safetensors b/model-00071-of-00130.safetensors index 7a877f95d536c4766375d1d811541473a6b6acbf..66fadf1581999b1c524d3c39898d8e675c004cca 100644 --- a/model-00071-of-00130.safetensors +++ b/model-00071-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:9e1a3b6d59ca4dcf99af877931f96cee754eb5019648f10b0fe01803c57a53b2 -size 1208321688 +oid sha256:8a13ff186cc4005f9347ba10f367f8b095cc6925e23c7d7cd8c287c3c8494cae +size 1208321704 diff --git a/model-00072-of-00130.safetensors b/model-00072-of-00130.safetensors index 2db83ed763e8602f2be2387aa53b1e7f036e82b8..86c19e001a5a9535128dd0cd5f2c51e909a331c8 100644 --- a/model-00072-of-00130.safetensors +++ b/model-00072-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:3702d9c9f31f088bc10d0b86c458fcf37245d066b6db9cc4d8e3b256e7c4be5e -size 2463869968 +oid sha256:0217b335e2aeb9c5f3ce97a90786cb8fc4a719bee224d57760c5ee322f566b2c +size 2463869984 diff --git a/model-00073-of-00130.safetensors b/model-00073-of-00130.safetensors index b99689793d35a35e1bfcf3c7a86c2690e07c70d0..c2c7010e8a8082686de55aa94bf1bfa5fa9d59d7 100644 --- a/model-00073-of-00130.safetensors +++ b/model-00073-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:c71864e0febd666681bd413d2deaa82103227eaf4a77a42c00ca5b9f363c969d -size 1208321688 +oid sha256:be3e34df5603e5c54543f8f3f1c0577439ea5d1da56d92aac284a79dfb1d5a10 +size 1208321704 diff --git a/model-00074-of-00130.safetensors b/model-00074-of-00130.safetensors index f2d8e6d03fa3aefbbd35bc500c71db136bf1fe1e..7469b51f385aefe893dae9ab9423cbeafc306d7e 100644 --- a/model-00074-of-00130.safetensors +++ b/model-00074-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:07e6d2b9d5cf7e361328896bd44f001c924cea3a3d139d31455a095d31f71e49 -size 2463869968 +oid sha256:26b5ca15031d1ad287d6b2eea514b758f33c5967e011fa3ee91c42878f5d28a5 +size 2463869984 
diff --git a/model-00075-of-00130.safetensors b/model-00075-of-00130.safetensors index 2ccca589034f1c4070e1b2608f3bcbca2f59b68d..7d7a2cb82d761e1f94709b6bbfaf9d5e7fd599de 100644 --- a/model-00075-of-00130.safetensors +++ b/model-00075-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:674d2be3b866d45ea6d84c68fe2d7256167597fe19f016c5a5d89351c579d382 -size 1208321688 +oid sha256:3456597f9c157dca04a36e392ce7d6d90055a33f584aae355c3adc176f172fd8 +size 1208321704 diff --git a/model-00076-of-00130.safetensors b/model-00076-of-00130.safetensors index 8d9e038f72eb31cee492eed3493e01b526d886a8..615f96ba3dc6623f33b87e98586c8b61c8043f87 100644 --- a/model-00076-of-00130.safetensors +++ b/model-00076-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:160a131a07cbbe229190595ee4ac88a04c663a72ecdcdf316eb4d46e3654fcf2 -size 2463869968 +oid sha256:c60bb67369fabd9c63b32e8db14aaf23c017b6bbcaa004a950fbbc825fb91ec2 +size 2463869984 diff --git a/model-00077-of-00130.safetensors b/model-00077-of-00130.safetensors index 263abd798b6dba35eb2bf613aa4a2fe93f3df560..cff06d9712f7682e4348817e2cedf8b350947507 100644 --- a/model-00077-of-00130.safetensors +++ b/model-00077-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:2a2a1eee70e8b1fc35d179fb05f83cb1d5f11765cf9b854425f2f973c379c26a -size 1208321688 +oid sha256:26619e4beb5d05b3fe8c15f608fd4caa7ab7f2f6fc1ea53d9e8f0cc76f06db79 +size 1208321704 diff --git a/model-00078-of-00130.safetensors b/model-00078-of-00130.safetensors index 2af9b063ef98b2dac796addd080a108129ffbbb1..16b21d920589fe6f143e4f311b7fd67a289ddfdf 100644 --- a/model-00078-of-00130.safetensors +++ b/model-00078-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:799eaaf53b6fa6e4a367e56333f8496df3791e009471791ce21ab655b5f7e132 -size 2463869968 +oid sha256:f74cb7e92d96eb05f0bd712b2ad3417e62e62c1850171391ccd16ba89a194954 +size 
2463869984 diff --git a/model-00079-of-00130.safetensors b/model-00079-of-00130.safetensors index 4a7317528c9beba59cbb53ec1e9aa2048e1fd549..73c469282479454735011391a39d56230d34fd5c 100644 --- a/model-00079-of-00130.safetensors +++ b/model-00079-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:5bf243b4004996bdbf7119bb4f43b5d8159b2f70412715058cd964e88c1607e9 -size 1208321688 +oid sha256:b5467aee8bbdb0ec4ceb53c0bdfd5ae4f3cb4c1f11706c2b967eaae0ad55abae +size 1208321704 diff --git a/model-00080-of-00130.safetensors b/model-00080-of-00130.safetensors index 094b751a5e04ebc5d812f75f22d0bfd2f263afa5..eb3f860f1a71e5a917dd08730608f4d32a6e6b48 100644 --- a/model-00080-of-00130.safetensors +++ b/model-00080-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:9f97809043caa0d67ebf635c6f585cebba6264a50e5c160e5b600d4f23aacbf4 -size 2463869968 +oid sha256:9286309d1a4fcfbd073aa9f984f8268f6980c576e2b7d8e89eb56a01d1dbae85 +size 2463869984 diff --git a/model-00081-of-00130.safetensors b/model-00081-of-00130.safetensors index 756367ce993afee9f2ca25f356796a52aed6d76f..9a93f77300a3f66af98be248c18ba61c53421f46 100644 --- a/model-00081-of-00130.safetensors +++ b/model-00081-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:8129bb648b2bd7d503df489b6260b0c902f892735bbb4d656f59e3d3a93e45b2 -size 1208321688 +oid sha256:20082af5e0887d614e89610fc53bfbe904be28091a2b81888b64c760e8581a7f +size 1208321704 diff --git a/model-00082-of-00130.safetensors b/model-00082-of-00130.safetensors index f08a9efcdc5436e36a7aa3fd293aadc870bf0846..f126e06e2134edf392ba3bdaa2033e9f9bc89617 100644 --- a/model-00082-of-00130.safetensors +++ b/model-00082-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:4a581d6de6af239880bbbb4cd875954edf0c95ad14b43fdd1094871386704dd5 -size 2463869968 +oid 
sha256:88fc35e7132aae27fde38421a2f845536b7e2561e826a64cfb1fa50724b8f648 +size 2463869984 diff --git a/model-00083-of-00130.safetensors b/model-00083-of-00130.safetensors index 08a2e41a97eca2d3b8cd9ec28c1df4ca7076ba4b..169b415ec31fa238af249f11cf5fb96414ad5f0c 100644 --- a/model-00083-of-00130.safetensors +++ b/model-00083-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:246f02a0e29120dcef28ea85a0eacd8d5a5722d0f0b165f61fef821f700f9d9a -size 1208321688 +oid sha256:7e7b4d1f2e311d55f966342f75addcc725de48c0f5502902d01883bc870c7988 +size 1208321704 diff --git a/model-00084-of-00130.safetensors b/model-00084-of-00130.safetensors index 151b1efeaa43ec88db479a561319d3be297b5df2..6f39dfa1667f94a66ccb05928271d07dc1214ba3 100644 --- a/model-00084-of-00130.safetensors +++ b/model-00084-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:4083ad0a522bc60a977253d091f496865f75f0be4d6ece2b975113a30007127a -size 2463869968 +oid sha256:21f50eaeadaffa7c8ba11803f913a33a9326735f005048a83dfcb5bae8664991 +size 2463869984 diff --git a/model-00085-of-00130.safetensors b/model-00085-of-00130.safetensors index 5a36e63bf7b49ed65ebf03c93cdec2af0e747bea..cc6ee902982843ee2bcd137ff9b4f052636e2d1f 100644 --- a/model-00085-of-00130.safetensors +++ b/model-00085-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:35ea447ad683c811138d91696d8fda8008293a785518b7b86b1aa6c9ddc209b9 -size 1208321688 +oid sha256:6bc79d060033c27e1eef0ead16980e2ca552dd7ae32c3c4aeb2da11599aee4c4 +size 1208321704 diff --git a/model-00086-of-00130.safetensors b/model-00086-of-00130.safetensors index 1485d5420b3d2ad51357b78dd6810569f86b125d..b6d0db16e3dd3ed6d32e0c2f0d9382a08fb7909c 100644 --- a/model-00086-of-00130.safetensors +++ b/model-00086-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:257545b54e89ceed10803953ccc19db9f723916eae82f62293b244af9ff18773 -size 
2463869968 +oid sha256:f906955440093248ccfad2994d3d4609d8925c27eae4aea2c9ab8fda6b21a2c0 +size 2463869984 diff --git a/model-00087-of-00130.safetensors b/model-00087-of-00130.safetensors index 5f64b2e957c60995a5c342ad0ae5f674439290d4..ef3954a16c86ad14ea3c773c212d7da88bcd1889 100644 --- a/model-00087-of-00130.safetensors +++ b/model-00087-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:0f9a088db1323c4b7f2278201665b8d829cce886267b069659b88fbe3b38b0db -size 1208321688 +oid sha256:4ba86c862516a7978c441305b48db43baebc5bea6e3af7d7779b617b0bc05088 +size 1208321704 diff --git a/model-00088-of-00130.safetensors b/model-00088-of-00130.safetensors index 07952818ab6fbddbbab4516fbbd27a53e70b7834..78c3c4ecc482ed8116427955d9e0bfc3fde38757 100644 --- a/model-00088-of-00130.safetensors +++ b/model-00088-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:55ec4e69a22dd99aaaf394a95d830a7deca496acba7870509d6e70b084bce6e8 -size 2463869968 +oid sha256:ca347c28b286dda5a691745bfba88f995441983e8c9791903baae7e467a8405d +size 2463869984 diff --git a/model-00089-of-00130.safetensors b/model-00089-of-00130.safetensors index 07d8a3a8e133029a086b3210ccfdf6de8091aaaf..806120dcd0e3b1a29c77d9f39981a7fa0170e78b 100644 --- a/model-00089-of-00130.safetensors +++ b/model-00089-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:e86e9b192f490993592b1c331726b32d3f9bdf80f2d6abe893d20cb70e51760a -size 1208321688 +oid sha256:a0735a130b3ad68cc48c297a29b86dddbe828d9eb94c7530b3387b8c783444d7 +size 1208321704 diff --git a/model-00090-of-00130.safetensors b/model-00090-of-00130.safetensors index a997f1abfe2c5600671cebb6bd0a79fee729c902..fd2e3dc9fa6de5326488d53ed08a2b44672a25ae 100644 --- a/model-00090-of-00130.safetensors +++ b/model-00090-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid 
sha256:b57125eec75a1b0cb31d3a8401d6a231359419e549e20072bcc39709423b129f -size 2463869968 +oid sha256:925e556dbdf2afad4acb90fb199c609dab09cbc318ac058757592750cafbfaf8 +size 2463869984 diff --git a/model-00091-of-00130.safetensors b/model-00091-of-00130.safetensors index 72978b3ed4923e95a2275ff4e86345c0a2893519..5ebeb21b86f4ebd9b44f01342624910dd40adba8 100644 --- a/model-00091-of-00130.safetensors +++ b/model-00091-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:0c613cdacd627e2fc3de08194efe1607aa06bdd386e1ccac1c7c133f4b5a2e8f -size 1208321688 +oid sha256:bd424c50b1d72779a1726bb60232bb6cae97c26d878706c829e1484d65c85c7c +size 1208321704 diff --git a/model-00092-of-00130.safetensors b/model-00092-of-00130.safetensors index 70865c5f5d0e494c9374212f0e4b3c928e0e4813..31fd28f3b8d265cb26c93c73587c5653ed6e8a95 100644 --- a/model-00092-of-00130.safetensors +++ b/model-00092-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:a406fcc45a8a785e366d68ef9b222940d480c788a176ff26c74d7287051554e2 -size 2463869968 +oid sha256:56c2b84dc66d9b6e2b1ec46c579fd4bb6697606587926f75770fd50ab11b9f94 +size 2463869984 diff --git a/model-00093-of-00130.safetensors b/model-00093-of-00130.safetensors index 67d49217407988dde62de78ac81510ab902d9bc3..44b420240a2528b8c17e7b7ede5b17126b5c2983 100644 --- a/model-00093-of-00130.safetensors +++ b/model-00093-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:27f4f5084a432f77340599da368f6fbd7be38f07380a8ea87b39807a67198365 -size 1208321688 +oid sha256:e1c0568aa013b2712520a408290ccf1a54bef1bd4f4af8ee02d6029fb974efc2 +size 1208321704 diff --git a/model-00094-of-00130.safetensors b/model-00094-of-00130.safetensors index ba2e7f032b89d7119e9480929ce09f8cb4fa39bf..9126679073a6c2b4380fa86465f3037222b9d123 100644 --- a/model-00094-of-00130.safetensors +++ b/model-00094-of-00130.safetensors @@ -1,3 +1,3 @@ version 
https://git-lfs.github.com/spec/v1 -oid sha256:d6db2523f161c686a3ae2dbd7b09aac6a6f0b0d5304805876385ab7c4bc0b5c7 -size 2463869968 +oid sha256:5e57a6198c8ae05d3f6d2d701085f6c3c7053195fca9f0be3d4395e45f75e4b2 +size 2463869984 diff --git a/model-00095-of-00130.safetensors b/model-00095-of-00130.safetensors index b7ae61c3ae8c22617653b214b6fab00b18bb778a..3194ff05c47aba4daa261d38f0733ec2e0473607 100644 --- a/model-00095-of-00130.safetensors +++ b/model-00095-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:a8480d9cc9216c650a30cd7168244b84aa6762c7835a92600ce198da2d15fbb1 -size 1208321688 +oid sha256:c35c3e96eaab7420f2c1f78f7784c6b077b6c4f158f2279f6f62e28b26c396eb +size 1208321704 diff --git a/model-00096-of-00130.safetensors b/model-00096-of-00130.safetensors index 0cf06537026d731d449cefa5155ad307d1b57647..3401e39e35415c79467b3b6d49fd95a7ab716907 100644 --- a/model-00096-of-00130.safetensors +++ b/model-00096-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:0123cfef652f44b2c6dbfcc47ede03762d4a572236367eee32a677d43d9a4dca -size 2463869968 +oid sha256:d5986ad365b5c92b39c53cae7d8091250f86d800eb9f0b85f5c92e46b0023299 +size 2463869984 diff --git a/model-00097-of-00130.safetensors b/model-00097-of-00130.safetensors index 22b3dc84b20f4deee20c8e326d5be9437b9b6484..4f6ec33a96ffd39764ce97c9765c077a3ec29ec8 100644 --- a/model-00097-of-00130.safetensors +++ b/model-00097-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:181466337b86afbc94dfae30196ca15a27ff01b35c5cf3939682032c5c0469c3 -size 1208321688 +oid sha256:b21f35ffe104867870a986950155892fcac9affb7b0bc42680807c375f84dcb8 +size 1208321704 diff --git a/model-00098-of-00130.safetensors b/model-00098-of-00130.safetensors index cf3a85dfa6b0b245395f80785f4d626becd1cfd6..f0198b1725bd71a8625a591612b4e40e2acfb88b 100644 --- a/model-00098-of-00130.safetensors +++ b/model-00098-of-00130.safetensors @@ -1,3 +1,3 @@ 
version https://git-lfs.github.com/spec/v1 -oid sha256:cb371f55564ec7a0ceb55bdf314c56b61385acfd7d59422e6b3a7efc75dd125a -size 2463869968 +oid sha256:8bb54c5eaf81beba692c4f618331d9c710c64fc6e0d3aa76f7495b37d555890e +size 2463869984 diff --git a/model-00099-of-00130.safetensors b/model-00099-of-00130.safetensors index 7613dc406dcd2cc80f63f103794ec120eff2f898..950d991a362e8f6a5ae56fc8f45a533f7527216a 100644 --- a/model-00099-of-00130.safetensors +++ b/model-00099-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:9f0f0bd9e07f7097693bfb58da9c73e35bf1e39eff80f0fba8f46ecde511cf63 -size 1208321688 +oid sha256:88c8ed89e176df2c57a2b541dd546f4860195ec55d89d5d242559a5e05b3923a +size 1208321704 diff --git a/model-00100-of-00130.safetensors b/model-00100-of-00130.safetensors index b4da88bffed506d100c3a6234632846751765676..80f24a84be2fdd0fc0231093d002b8d1d690d581 100644 --- a/model-00100-of-00130.safetensors +++ b/model-00100-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:45fd433c26aab73e4a6b4d4566f5511c4376549df1ed9c4257493b1c72710fa9 -size 2463869968 +oid sha256:c462e7176c100ee61e19b9d983fd2fcd623765627082c889793e9d4f549f1ebd +size 2463869984 diff --git a/model-00101-of-00130.safetensors b/model-00101-of-00130.safetensors index 3923c3bed5990ee26deacd1824e399fbdcc42c4a..5462ae21fbc9c180aa464d03e065802c7b950433 100644 --- a/model-00101-of-00130.safetensors +++ b/model-00101-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:82ed509a2950aacc0a217e61fd8ca43bf06cbd5c6fa734c33bb7e6baec4a85cb -size 1208321688 +oid sha256:072e923c77c6d78e6c7e8e88f14a2942ce5b9923a1b4defcd2ad0eafeeed18fe +size 1208321704 diff --git a/model-00102-of-00130.safetensors b/model-00102-of-00130.safetensors index 07cad3d80a73117eee3fa7b81c7719ee58fa4e53..d4481dfcfb93fadbfa2463e85a33e337ca59b94d 100644 --- a/model-00102-of-00130.safetensors +++ b/model-00102-of-00130.safetensors @@ -1,3 
+1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:af2c3743f4034f012b1855bca20bdfe2b081dd864a2bdc7064e9c1ea9a09f94c -size 2463869968 +oid sha256:5aab62a54bb84e1471070b07068a5b6e0a98827e2d42486aa5d11904a49adff5 +size 2463869984 diff --git a/model-00103-of-00130.safetensors b/model-00103-of-00130.safetensors index df5245f3a3f003a6d492a419c9a1e4a6ac62bac1..eb98a041a0dfac7ded2cee2f79471b3862ad48f8 100644 --- a/model-00103-of-00130.safetensors +++ b/model-00103-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:4ac162b3348bc1ed712146b4d2a3bf443250c2268bceaca15c8cdce38a7fca7c -size 1208321688 +oid sha256:de918f153ee3f6930a6377b9be4570e17cf1b5e15e9649fe153271de2a77f2fa +size 1208321704 diff --git a/model-00104-of-00130.safetensors b/model-00104-of-00130.safetensors index ab77252be678a414d0ba73857b9e0ca5e3f8ad89..47685b3d063cfedff7a3a819993efad4f6590dca 100644 --- a/model-00104-of-00130.safetensors +++ b/model-00104-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:2f21dabd4f4214b13c4803104783e5a3ad5af9838bcc849d1606c0e1f096a946 -size 2463869968 +oid sha256:4d07ec351c1cc965dd4b1f0809f35cd3e75c6a12a2aba2302f1e186e038c6e42 +size 2463869984 diff --git a/model-00105-of-00130.safetensors b/model-00105-of-00130.safetensors index 45210b1ff615fcdf4c5f7e85253d4c2605f645a3..023ab000e09c4422ce493746f6eb6dbdded7a53c 100644 --- a/model-00105-of-00130.safetensors +++ b/model-00105-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:97dd9fc182eb0583291bd29226ef3cf41319fab78295a910470fae7ea49339ae -size 1208321688 +oid sha256:d2eed881777af73df8d435f7ee40853ca0e96a5c49fe522ec8f1697043943421 +size 1208321704 diff --git a/model-00106-of-00130.safetensors b/model-00106-of-00130.safetensors index 86d8c835bbc5b26c7df2a4d497f4a1ad11c69fe5..2d093f8e31bef671a2ee8fae6f735be87517de7e 100644 --- a/model-00106-of-00130.safetensors +++ b/model-00106-of-00130.safetensors 
@@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:55f519426d248d7c57a147a1b82d819900788e43a62b6972c2148586f10f05f0 -size 2463869968 +oid sha256:205cb5c7e241b11d58c2994212570ded889f94ac2d0589799650afc1ebd66197 +size 2463869984 diff --git a/model-00107-of-00130.safetensors b/model-00107-of-00130.safetensors index a8acf560e26d6401d28a80b706344a56bc715d6b..71598507ba7253c0418afcf5f27a1696807f75f4 100644 --- a/model-00107-of-00130.safetensors +++ b/model-00107-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:b50498a9bf402bdb82bebf103685634c37334609f9efcecf54babe7f9b5baf65 -size 1208321688 +oid sha256:f81ccc9301fb0b0019f3bc8fc7c11b2ca947e6b184ed6f10394002159089b59b +size 1208321704 diff --git a/model-00108-of-00130.safetensors b/model-00108-of-00130.safetensors index cf261e862f2fffe47dbf9891f769a265a593ab9b..309fff956c02994e3f4cc6ed66465137b2d750a9 100644 --- a/model-00108-of-00130.safetensors +++ b/model-00108-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:502bdd08025b8d357717bdd305200df326f5f8c0e7ec6f7ce2c82d115cdf7e75 -size 2463869968 +oid sha256:d0ad55be87bca5e9b773db48f76bfd66ede0a53057d1d787d7323142a9690f35 +size 2463869984 diff --git a/model-00109-of-00130.safetensors b/model-00109-of-00130.safetensors index dcd65a71881b9186d4e3c5b0f80482ebc36c5793..dd750502ddecb6b81084f2ab8b993e206203abf7 100644 --- a/model-00109-of-00130.safetensors +++ b/model-00109-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:5a528b71b52c2211e6f91deb829d11cb22b655dd57dc251d84ba4fe521e47ba2 -size 1208321688 +oid sha256:dbce9cf061e37baa89ae57c1ade0ea4d605b4d19cf7a5d048a176248196102ee +size 1208321704 diff --git a/model-00110-of-00130.safetensors b/model-00110-of-00130.safetensors index 4ab388f0f8dcd7c3e3c3752565806610513ac6dd..15c7fcdb46fc3da93292bbb1c427e402435b3872 100644 --- a/model-00110-of-00130.safetensors +++ 
b/model-00110-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:20d1fa5b16599eee4fa39118f73508b579190a374f70f6c1bf83018c60a9d7be -size 2463869968 +oid sha256:412b58ef7c3ac38758ad75e14a9c4976ab032556a2a0b4924494d0eed2116653 +size 2463869984 diff --git a/model-00111-of-00130.safetensors b/model-00111-of-00130.safetensors index 8b8d0aed910716e757e208d0538aa6c499c8f579..b179d68003cd4c002530a1a26b185051565e4a0a 100644 --- a/model-00111-of-00130.safetensors +++ b/model-00111-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:f63b6c84659c71d9d253bf5c22237c562d3a3fb44c70fd54cca9d7993c35ea04 -size 1208321688 +oid sha256:9b7a752507c7ec34b57abf1db86fb039291a73f5d3ec137b7cbd84793089fb85 +size 1208321704 diff --git a/model-00112-of-00130.safetensors b/model-00112-of-00130.safetensors index 346df719ba428ce5a981cd8e5aaae2f28eb616c7..3a28a2ec90fde21795b161093e8958eefc45bd3c 100644 --- a/model-00112-of-00130.safetensors +++ b/model-00112-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:c4e0b5428019c75f894907107d85da010697f4ecc333b244c6cfb4aea0e3c440 -size 2463869968 +oid sha256:bf588bc965737e3ad9f27812955675d24df46f5ebd899840f481886884a3bfac +size 2463869984 diff --git a/model-00113-of-00130.safetensors b/model-00113-of-00130.safetensors index feb96566abdb29f11cba1e5df257366f670536c8..49190a539189de643662dfcdb8d2bf7ad22e0827 100644 --- a/model-00113-of-00130.safetensors +++ b/model-00113-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:e48bfe3f2a384aebf1038c14c651c69c64a8fac061e5b9547fb7d67da9ee5029 -size 1208321688 +oid sha256:06880550004cf06ee29a23adc8fe896368df9442dd7e44779a9b773423ffa396 +size 1208321704 diff --git a/model-00114-of-00130.safetensors b/model-00114-of-00130.safetensors index be0a3b12b05b32d5a51512e9079af002eb95064c..ce863c137373fbaa76544489ec2a661d695417bd 100644 --- 
a/model-00114-of-00130.safetensors +++ b/model-00114-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:08fb5a9fd03254204848af6413c7bf68876bee74f6bb37247d05dd2fc7480a84 -size 2463869968 +oid sha256:042131124955b2fd10e42f88d248cac356a1ae54f4f338ddf67b332fee82f1a4 +size 2463869984 diff --git a/model-00115-of-00130.safetensors b/model-00115-of-00130.safetensors index 4854afadea7dfa58e5b57295d5c31a8e1bd0bbcd..33de24e38c9638bd6d43ca76e7d411a733879372 100644 --- a/model-00115-of-00130.safetensors +++ b/model-00115-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:adf4ab941b453ba215787230e4a4f001623a5f06180deb3c5bed050160f463d7 -size 1208321688 +oid sha256:fd78f64fbc5d8fc9943ad03ac888d590351ac67367a4605f541567075c2a90a9 +size 1208321704 diff --git a/model-00116-of-00130.safetensors b/model-00116-of-00130.safetensors index d5e956684a2110d4a775c7363b7acd63fc1608f0..a842a7a9f272737db0892fd51e66a5870f2a8fa7 100644 --- a/model-00116-of-00130.safetensors +++ b/model-00116-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:45ba476d7a50d28db1380179ec3f2d3c274d35a362e2a6b680a6ab653aba88d1 -size 2463869968 +oid sha256:4857fbd1bb9738fbd98b8fe9700c055c8bf9c099874931a9e59a4f796f95a9c1 +size 2463869984 diff --git a/model-00117-of-00130.safetensors b/model-00117-of-00130.safetensors index cda5818d88196aca09ceb373fc78bd171fac297e..ecd61eb2f1990b5e8013168cd4191dcccbae3c78 100644 --- a/model-00117-of-00130.safetensors +++ b/model-00117-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:7f2a7438c4f6c66ac95eaaaba65c1935bfcb917884e021c30e588c74ac189fc5 -size 1208321688 +oid sha256:becc0b4f32f7d0d4de8d124ec62bb95b5f57936e63f6bfe8874d59bfc1d7edc1 +size 1208321704 diff --git a/model-00118-of-00130.safetensors b/model-00118-of-00130.safetensors index 3ec47c04235ef03580f4c9a02ec263085dcd3577..9f708b9c323544f7e661be4a2d2c48f26e79afd7 
100644 --- a/model-00118-of-00130.safetensors +++ b/model-00118-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:a2eac3b06b70ff4f38c8166038b87e4e010e80fdb0c7fc32ff04b669b79bb390 -size 2463869968 +oid sha256:6b27be267068a02a5a33ec666d64ffb3548d05c1091f9e85b2aa227c643cf3a0 +size 2463869984 diff --git a/model-00119-of-00130.safetensors b/model-00119-of-00130.safetensors index 9eef34827a70af39dc3bc0b28f2eff32c1e22854..77774f46d3c3e4fed387010fcea7d8c8957b874e 100644 --- a/model-00119-of-00130.safetensors +++ b/model-00119-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:5cf0b16b764dd9303984a467fe3ad8a04b2b3908e230fc902425ec8746df804e -size 1208321688 +oid sha256:666681aaed291bc9298bb4a688b2c801dc3bb2fc796d51f07f5d5a72797b8658 +size 1208321704 diff --git a/model-00120-of-00130.safetensors b/model-00120-of-00130.safetensors index 4779b42c8d5104e8c41e536601c5e326063b3bad..d6756dfbe1a91e0d349ae09085564c19b5f636c8 100644 --- a/model-00120-of-00130.safetensors +++ b/model-00120-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:16a1f3697a6913aecb34e5d880c42a38d067a5172d52eb44f4fb1de914fa879b -size 2463869968 +oid sha256:9b0d00dbb435dea822d9afa5feca103b96f4ed36bca8eb4f1820b8702421e816 +size 2463869984 diff --git a/model-00121-of-00130.safetensors b/model-00121-of-00130.safetensors index 61b53e232155d41c21581e9c10998f54c2f4ac00..fdb87c910b52d9d095bdaf91fcab1b3280d43a00 100644 --- a/model-00121-of-00130.safetensors +++ b/model-00121-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:b2eabbce05904ab80f1919c0e74052810493c344eaa120dfc2b1bf46e195b230 -size 1208321688 +oid sha256:dbddd32ac1c6d80b443380a880ef4c435f10708eaae864cf745cbf76981cbf5b +size 1208321704 diff --git a/model-00122-of-00130.safetensors b/model-00122-of-00130.safetensors index 
3a24937431600cd750f4e73ebc37e6279faf63eb..08bdc134314965dcec948e8f4aa57028ff7a080d 100644 --- a/model-00122-of-00130.safetensors +++ b/model-00122-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:2d4060a5e532922a3d5dae24262c08c21acd1a029e06650f806f9f3a111bcbfb -size 2463869968 +oid sha256:ac4d80cc6e5c9a20c7ab4a0010f14f1a313f195b031aefb84d83ac6c607cb102 +size 2463869984 diff --git a/model-00123-of-00130.safetensors b/model-00123-of-00130.safetensors index 2c3a3861da43e43da4924538a6ee77d1db0b38ed..ff32f4ccb6fd81b6c4194c3eb5fcb40e9b68c3d8 100644 --- a/model-00123-of-00130.safetensors +++ b/model-00123-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:2779fd92da6eb6c42edaf3b1e9cdcc5b7a501b5c9a25cfb3c210baf0f42d837a -size 1208321688 +oid sha256:3776600e4fea8e0d7b3c4c2667b0cdae07d4f5a7e7b30ce913b0c30c3a8ea0d8 +size 1208321704 diff --git a/model-00124-of-00130.safetensors b/model-00124-of-00130.safetensors index cf1adeeb58c0df852b1212f26d17cf76e616e11f..2c3f206da4a4c1a9138243d4108caabaaab187b5 100644 --- a/model-00124-of-00130.safetensors +++ b/model-00124-of-00130.safetensors @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:3439acf43dfe9db0ea78c681acccd0ee9b80d7c63b5865755921a1f1244a1a9c -size 1229199552 +oid sha256:3543ef495910c94d69b4707153646bb2d55588fef092d3450ac03e3179db11d9 +size 1229199568
diff --git a/modeling_list_ultra.py b/modeling_list_ultra.py
new file mode 100644
index 0000000000000000000000000000000000000000..8846d38acc932d1dcb0302bb719296313f5225a8
--- /dev/null
+++ b/modeling_list_ultra.py
@@ -0,0 +1,706 @@
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py.
+# Do NOT edit this file manually as any edits will be overwritten by the generation of
+# the file from the modular. If any change should be done, please apply the change to the
+# modular_minimax_m2.py file directly. One of our CI enforces this.
+# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# coding=utf-8
+# Copyright 2025 the HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from collections.abc import Callable
+from typing import Optional, Union, Unpack
+
+import torch
+from torch import nn
+
+from transformers.activations import ACT2FN
+from transformers.cache_utils import Cache, DynamicCache
+from transformers.generation import GenerationMixin
+from transformers.integrations import use_kernel_forward_from_hub
+from transformers.masking_utils import create_causal_mask, create_sliding_window_causal_mask
+from transformers.modeling_flash_attention_utils import FlashAttentionKwargs
+from transformers.modeling_layers import (
+    GenericForQuestionAnswering,
+    GenericForSequenceClassification,
+    GenericForTokenClassification,
+    GradientCheckpointingLayer,
+)
+from transformers.modeling_outputs import MoeCausalLMOutputWithPast, MoeModelOutputWithPast
+from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
+from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
+from transformers.utils import TransformersKwargs, auto_docstring, can_return_tuple
+from transformers.utils.deprecation import deprecate_kwarg
+from transformers.utils.generic import OutputRecorder, check_model_inputs
+from .configuration_minimax_m2 import MiniMaxM2Config
+
+
+class MiniMaxM2MLP(nn.Module):
+    def __init__(self, config: MiniMaxM2Config):
+        super().__init__()
+        self.ffn_dim = config.intermediate_size
+        self.hidden_dim = config.hidden_size
+
+        self.w1 = nn.Linear(self.hidden_dim, self.ffn_dim, bias=False)
+        self.w2 = nn.Linear(self.ffn_dim, self.hidden_dim, bias=False)
+        self.w3 = nn.Linear(self.hidden_dim, self.ffn_dim, bias=False)
+
+        self.act_fn = ACT2FN[config.hidden_act]
+
+    def forward(self, hidden_states):
+        current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
+        current_hidden_states = self.w2(current_hidden_states)
+        return current_hidden_states
+
+
+class MiniMaxM2Experts(nn.ModuleList):
+    """
+    ModuleList of experts.
+    """
+
+    def __init__(self, config: MiniMaxM2Config):
+        super().__init__()
+        self.top_k = config.num_experts_per_tok
+        self.num_experts = config.num_local_experts
+        for _ in range(self.num_experts):
+            self.append(MiniMaxM2MLP(config))
+
+    def forward(
+        self, hidden_states: torch.Tensor, top_k_index: torch.Tensor, top_k_weights: torch.Tensor
+    ) -> torch.Tensor:
+        """
+        Args:
+            hidden_states: (batch_size * sequence_length, hidden_dim)
+            selected_experts: (batch_size * sequence_length, top_k)
+            routing_weights: (batch_size * sequence_length, top_k)
+        Returns:
+            (batch_size * sequence_length, hidden_dim)
+        """
+        final_hidden_states = torch.zeros_like(hidden_states)
+        expert_mask = torch.nn.functional.one_hot(top_k_index, num_classes=self.num_experts).permute(2, 1, 0)
+
+        expert_hit = torch.greater(expert_mask.sum(dim=(-1, -2)), 0).nonzero()
+        for expert_idx in expert_hit:
+            idx, top_x = torch.where(expert_mask[expert_idx].squeeze(0))
+            current_state = hidden_states[None, top_x].reshape(-1, hidden_states.shape[-1])
+            current_hidden_states = self[expert_idx](current_state) * top_k_weights[top_x, idx, None]
+            final_hidden_states.index_add_(0, top_x, current_hidden_states.to(hidden_states.dtype))
+        return final_hidden_states
+
+
+class MiniMaxM2SparseMoeBlock(nn.Module):
+    def __init__(self, config):
+        super().__init__()
+        self.top_k = config.num_experts_per_tok
+        self.jitter_noise = config.router_jitter_noise
+        self.gate = nn.Linear(config.hidden_size, config.num_local_experts, bias=False)
+        self.experts = MiniMaxM2Experts(config)
+        self.register_buffer("e_score_correction_bias", torch.zeros(config.num_local_experts))
+
+    def route_tokens_to_experts(self, router_logits):
+        routing_weights = torch.nn.functional.sigmoid(router_logits.float())
+        scores_for_choice = routing_weights + self.e_score_correction_bias
+        _, top_k_index = torch.topk(scores_for_choice, self.top_k, dim=-1, sorted=False)
+        top_k_weights = routing_weights.gather(1, top_k_index)
+        top_k_weights /= top_k_weights.sum(dim=-1, keepdim=True)
+        return top_k_index, top_k_weights.to(router_logits.dtype)
+
+    def forward(self, hidden_states: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
+        batch_size, sequence_length, hidden_dim = hidden_states.shape
+        if self.training and self.jitter_noise > 0:
+            hidden_states *= torch.empty_like(hidden_states).uniform_(1.0 - self.jitter_noise, 1.0 + self.jitter_noise)
+        hidden_states = hidden_states.view(-1, hidden_states.shape[-1])
+        router_logits = self.gate(hidden_states)
+        top_k_index, top_k_weights = self.route_tokens_to_experts(router_logits)
+        hidden_states = self.experts(hidden_states, top_k_index, top_k_weights.to(hidden_states.dtype))
+        hidden_states = hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+        return hidden_states, router_logits
+
+
+@use_kernel_forward_from_hub("RMSNorm")
+class MiniMaxM2RMSNorm(nn.Module):
+    def __init__(self, hidden_size, eps=1e-6):
+        """
+        MiniMaxM2RMSNorm is equivalent to T5LayerNorm
+        """
+        super().__init__()
+        self.weight = nn.Parameter(torch.ones(hidden_size))
+        self.variance_epsilon = eps
+
+    def forward(self, hidden_states):
+        input_dtype = hidden_states.dtype
+        hidden_states = hidden_states.to(torch.float32)
+ variance = hidden_states.pow(2).mean(-1, keepdim=True) + hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon) + return self.weight * hidden_states.to(input_dtype) + + def extra_repr(self): + return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}" + + +def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor: + """ + This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch, + num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim) + """ + batch, num_key_value_heads, slen, head_dim = hidden_states.shape + if n_rep == 1: + return hidden_states + hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim) + return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim) + + +def eager_attention_forward( + module: nn.Module, + query: torch.Tensor, + key: torch.Tensor, + value: torch.Tensor, + attention_mask: Optional[torch.Tensor], + scaling: float, + dropout: float = 0.0, + **kwargs: Unpack[TransformersKwargs], +): + key_states = repeat_kv(key, module.num_key_value_groups) + value_states = repeat_kv(value, module.num_key_value_groups) + + attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling + if attention_mask is not None: + causal_mask = attention_mask[:, :, :, : key_states.shape[-2]] + attn_weights = attn_weights + causal_mask + + attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype) + attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training) + attn_output = torch.matmul(attn_weights, value_states) + attn_output = attn_output.transpose(1, 2).contiguous() + + return attn_output, attn_weights + + +def rotate_half(x): + """Rotates half the hidden dims of the input.""" + x1 = x[..., : x.shape[-1] // 2] + x2 = x[..., x.shape[-1] // 2 :] + return torch.cat((-x2, x1), dim=-1) + + +def 
apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1): + """Applies Rotary Position Embedding to the query and key tensors. + + Args: + q (`torch.Tensor`): The query tensor. + k (`torch.Tensor`): The key tensor. + cos (`torch.Tensor`): The cosine part of the rotary embedding. + sin (`torch.Tensor`): The sine part of the rotary embedding. + position_ids (`torch.Tensor`, *optional*): + Deprecated and unused. + unsqueeze_dim (`int`, *optional*, defaults to 1): + The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and + sin[position_ids] so that they can be properly broadcast to the dimensions of q and k. For example, note + that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and + k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes + cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have + the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2. + Returns: + `tuple(torch.Tensor)` comprising the query and key tensors rotated using the Rotary Position Embedding.
+ """ + cos = cos.unsqueeze(unsqueeze_dim) + sin = sin.unsqueeze(unsqueeze_dim) + + # Keep half or full tensor for later concatenation + rotary_dim = cos.shape[-1] + q_rot, q_pass = q[..., :rotary_dim], q[..., rotary_dim:] + k_rot, k_pass = k[..., :rotary_dim], k[..., rotary_dim:] + + # Apply rotary embeddings on the first half or full tensor + q_embed = (q_rot * cos) + (rotate_half(q_rot) * sin) + k_embed = (k_rot * cos) + (rotate_half(k_rot) * sin) + + # Concatenate back to full shape + q_embed = torch.cat([q_embed, q_pass], dim=-1) + k_embed = torch.cat([k_embed, k_pass], dim=-1) + return q_embed, k_embed + + +class MiniMaxM2Attention(nn.Module): + """Multi-headed attention from 'Attention Is All You Need' paper""" + + def __init__(self, config: MiniMaxM2Config, layer_idx: int): + super().__init__() + self.config = config + self.layer_idx = layer_idx + self.head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads + self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads + self.scaling = self.head_dim**-0.5 + self.attention_dropout = config.attention_dropout + self.is_causal = True + self.q_proj = nn.Linear(config.hidden_size, config.num_attention_heads * self.head_dim, bias=False) + self.k_proj = nn.Linear(config.hidden_size, config.num_key_value_heads * self.head_dim, bias=False) + self.v_proj = nn.Linear(config.hidden_size, config.num_key_value_heads * self.head_dim, bias=False) + self.o_proj = nn.Linear(config.num_attention_heads * self.head_dim, config.hidden_size, bias=False) + + self.use_qk_norm = config.use_qk_norm + if self.use_qk_norm: + self.q_norm = MiniMaxM2RMSNorm(self.head_dim * config.num_attention_heads, eps=config.rms_norm_eps) + self.k_norm = MiniMaxM2RMSNorm(self.head_dim * config.num_key_value_heads, eps=config.rms_norm_eps) + + @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58") + def forward( + self, + hidden_states: torch.Tensor, + 
position_embeddings: tuple[torch.Tensor, torch.Tensor], + attention_mask: Optional[torch.Tensor], + past_key_values: Optional[Cache] = None, + cache_position: Optional[torch.LongTensor] = None, + **kwargs: Unpack[FlashAttentionKwargs], + ) -> tuple[torch.Tensor, Optional[torch.Tensor]]: + input_shape = hidden_states.shape[:-1] + hidden_shape = (*input_shape, -1, self.head_dim) + + query_states = self.q_proj(hidden_states) + key_states = self.k_proj(hidden_states) + value_states = self.v_proj(hidden_states) + + if self.use_qk_norm: # main diff from Llama + query_states = self.q_norm(query_states) + key_states = self.k_norm(key_states) + + key_states = key_states.view(hidden_shape) + query_states = query_states.view(hidden_shape) + value_states = value_states.view(hidden_shape) + + query_states = query_states.transpose(1, 2) + key_states = key_states.transpose(1, 2) + value_states = value_states.transpose(1, 2) + + cos, sin = position_embeddings + query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin) + + if past_key_values is not None: + # sin and cos are specific to RoPE models; position_ids needed for the static cache + cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position} + key_states, value_states = past_key_values.update(key_states, value_states, self.layer_idx, cache_kwargs) + + attention_interface: Callable = eager_attention_forward + if self.config._attn_implementation != "eager": + attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation] + + attn_output, attn_weights = attention_interface( + self, + query_states, + key_states, + value_states, + attention_mask, + dropout=0.0 if not self.training else self.attention_dropout, + scaling=self.scaling, + **kwargs, + ) + + attn_output = attn_output.reshape(*input_shape, -1).contiguous() + attn_output = self.o_proj(attn_output) + return attn_output, attn_weights + + +class MiniMaxM2DecoderLayer(GradientCheckpointingLayer): + def __init__(self, 
config: MiniMaxM2Config, layer_idx: int): + super().__init__() + self.hidden_size = config.hidden_size + + self.self_attn = MiniMaxM2Attention(config, layer_idx) + + self.block_sparse_moe = MiniMaxM2SparseMoeBlock(config) + self.input_layernorm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps) + self.post_attention_layernorm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps) + + @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58") + def forward( + self, + hidden_states: torch.Tensor, + position_embeddings: tuple[torch.Tensor, torch.Tensor], + attention_mask: Optional[torch.Tensor] = None, + position_ids: Optional[torch.LongTensor] = None, + past_key_values: Optional[Cache] = None, + cache_position: Optional[torch.LongTensor] = None, + **kwargs: Unpack[TransformersKwargs], + ) -> torch.FloatTensor: + residual = hidden_states + + hidden_states = self.input_layernorm(hidden_states) + + # Self Attention + hidden_states, _ = self.self_attn( + hidden_states=hidden_states, + position_embeddings=position_embeddings, + attention_mask=attention_mask, + position_ids=position_ids, + past_key_values=past_key_values, + cache_position=cache_position, + **kwargs, + ) + hidden_states = residual + hidden_states + + # Fully Connected + residual = hidden_states + hidden_states = self.post_attention_layernorm(hidden_states) + hidden_states, _ = self.block_sparse_moe(hidden_states) + hidden_states = residual + hidden_states + + return hidden_states + + +class MiniMaxM2RotaryEmbedding(nn.Module): + inv_freq: torch.Tensor # fix linting for `register_buffer` + + def __init__(self, config: MiniMaxM2Config, device=None): + super().__init__() + # BC: "rope_type" was originally "type" + if hasattr(config, "rope_scaling") and isinstance(config.rope_scaling, dict): + self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type")) + else: + self.rope_type = "default" + self.max_seq_len_cached = 
config.max_position_embeddings + self.original_max_seq_len = config.max_position_embeddings + + self.config = config + self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type] + + inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device) + self.register_buffer("inv_freq", inv_freq, persistent=False) + self.original_inv_freq = self.inv_freq + + @torch.no_grad() + @dynamic_rope_update # power user: used with advanced RoPE types (e.g. dynamic rope) + def forward(self, x, position_ids): + inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1).to(x.device) + position_ids_expanded = position_ids[:, None, :].float() + + device_type = x.device.type if isinstance(x.device.type, str) and x.device.type != "mps" else "cpu" + with torch.autocast(device_type=device_type, enabled=False): # Force float32 + freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2) + emb = torch.cat((freqs, freqs), dim=-1) + cos = emb.cos() * self.attention_scaling + sin = emb.sin() * self.attention_scaling + + return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype) + + +@auto_docstring +class MiniMaxM2PreTrainedModel(PreTrainedModel): + config: MiniMaxM2Config + base_model_prefix = "model" + supports_gradient_checkpointing = True + _no_split_modules = ["MiniMaxM2DecoderLayer"] + _skip_keys_device_placement = ["past_key_values"] + _supports_flash_attn = True + _supports_sdpa = True + _supports_flex_attn = True + _can_compile_fullgraph = False # MoE models don't work with torch.compile (`torch.where(condition)` not supported) + _supports_attention_backend = True + _can_record_outputs = { + "router_logits": OutputRecorder(MiniMaxM2SparseMoeBlock, index=1), + "hidden_states": MiniMaxM2DecoderLayer, + "attentions": MiniMaxM2Attention, + } + + +@auto_docstring +class MiniMaxM2Model(MiniMaxM2PreTrainedModel): + def __init__(self, config: MiniMaxM2Config): + super().__init__(config) + self.padding_idx = config.pad_token_id + 
self.vocab_size = config.vocab_size + + self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx) + self.layers = nn.ModuleList( + [MiniMaxM2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)] + ) + self.norm = MiniMaxM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps) + self.rotary_emb = MiniMaxM2RotaryEmbedding(config=config) + self.gradient_checkpointing = False + + # Initialize weights and apply final processing + self.post_init() + + @check_model_inputs + @auto_docstring + def forward( + self, + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.Tensor] = None, + position_ids: Optional[torch.LongTensor] = None, + past_key_values: Optional[Cache] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + use_cache: Optional[bool] = None, + cache_position: Optional[torch.LongTensor] = None, + **kwargs: Unpack[TransformersKwargs], + ) -> MoeModelOutputWithPast: + if (input_ids is None) ^ (inputs_embeds is not None): + raise ValueError("You must specify exactly one of input_ids or inputs_embeds") + + if use_cache and past_key_values is None: + past_key_values = DynamicCache(config=self.config) + + if inputs_embeds is None: + inputs_embeds = self.embed_tokens(input_ids) + + if cache_position is None: + past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0 + cache_position = torch.arange( + past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device + ) + if position_ids is None: + position_ids = cache_position.unsqueeze(0) + + mask_function = create_causal_mask if self.config.sliding_window is None else create_sliding_window_causal_mask + causal_mask = mask_function( + config=self.config, + input_embeds=inputs_embeds, + attention_mask=attention_mask, + cache_position=cache_position, + past_key_values=past_key_values, + position_ids=position_ids, + ) + + hidden_states = inputs_embeds + + # create position 
embeddings to be shared across the decoder layers + position_embeddings = self.rotary_emb(hidden_states, position_ids) + + for decoder_layer in self.layers[: self.config.num_hidden_layers]: + hidden_states = decoder_layer( + hidden_states, + position_embeddings=position_embeddings, + attention_mask=causal_mask, + position_ids=position_ids, + past_key_values=past_key_values, + use_cache=use_cache, + cache_position=cache_position, + **kwargs, + ) + + hidden_states = self.norm(hidden_states) + + return MoeModelOutputWithPast( # only diff with Mistral is the output type, we need MoE + last_hidden_state=hidden_states, + past_key_values=past_key_values, + ) + + +def load_balancing_loss_func( + gate_logits: Union[torch.Tensor, tuple[torch.Tensor], None], + num_experts: Optional[int] = None, + top_k=2, + attention_mask: Optional[torch.Tensor] = None, +) -> Union[torch.Tensor, int]: + r""" + Computes auxiliary load balancing loss as in Switch Transformer - implemented in Pytorch. + + See Switch Transformer (https://huggingface.co/papers/2101.03961) for more details. This function implements the loss + function presented in equations (4) - (6) of the paper. It aims at penalizing cases where the routing between + experts is too unbalanced. + + Args: + gate_logits: + Logits from the `gate`, should be a tuple of model.config.num_hidden_layers tensors of + shape [batch_size X sequence_length, num_experts]. + num_experts: + Number of experts + top_k: + The number of experts to route per-token, can be also interpreted as the `top-k` routing + parameter. + attention_mask (`torch.Tensor`, *optional*): + The attention_mask used in forward function + shape [batch_size X sequence_length] if not None. + + Returns: + The auxiliary loss. 
+ """ + if gate_logits is None or not isinstance(gate_logits, tuple): + return 0 + + if isinstance(gate_logits, tuple): + compute_device = gate_logits[0].device + concatenated_gate_logits = torch.cat([layer_gate.to(compute_device) for layer_gate in gate_logits], dim=0) + + routing_weights = torch.nn.functional.softmax(concatenated_gate_logits, dim=-1) + + _, selected_experts = torch.topk(routing_weights, top_k, dim=-1) + + expert_mask = torch.nn.functional.one_hot(selected_experts, num_experts) + + if attention_mask is None: + # Compute the fraction of tokens routed to each expert + tokens_per_expert = torch.mean(expert_mask.float(), dim=0) + + # Compute the average probability of routing to these experts + router_prob_per_expert = torch.mean(routing_weights, dim=0) + else: + batch_size, sequence_length = attention_mask.shape + num_hidden_layers = concatenated_gate_logits.shape[0] // (batch_size * sequence_length) + + # Compute the mask that masks all padding tokens as 0 with the same shape of expert_mask + expert_attention_mask = ( + attention_mask[None, :, :, None, None] + .expand((num_hidden_layers, batch_size, sequence_length, top_k, num_experts)) + .reshape(-1, top_k, num_experts) + .to(compute_device) + ) + + # Compute the fraction of tokens routed to each expert + tokens_per_expert = torch.sum(expert_mask.float() * expert_attention_mask, dim=0) / torch.sum( + expert_attention_mask, dim=0 + ) + + # Compute the mask that masks all padding tokens as 0 with the same shape of tokens_per_expert + router_per_expert_attention_mask = ( + attention_mask[None, :, :, None] + .expand((num_hidden_layers, batch_size, sequence_length, num_experts)) + .reshape(-1, num_experts) + .to(compute_device) + ) + + # Compute the average probability of routing to these experts + router_prob_per_expert = torch.sum(routing_weights * router_per_expert_attention_mask, dim=0) / torch.sum( + router_per_expert_attention_mask, dim=0 + ) + + overall_loss = torch.sum(tokens_per_expert *
router_prob_per_expert.unsqueeze(0)) + return overall_loss * num_experts + + +@auto_docstring +class MiniMaxM2ForCausalLM(MiniMaxM2PreTrainedModel, GenerationMixin): + _tied_weights_keys = ["lm_head.weight"] + _tp_plan = {"lm_head": "colwise_rep"} + _pp_plan = {"lm_head": (["hidden_states"], ["logits"])} + + def __init__(self, config): + super().__init__(config) + self.model = MiniMaxM2Model(config) + self.vocab_size = config.vocab_size + self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False) + self.router_aux_loss_coef = config.router_aux_loss_coef + self.num_experts = config.num_local_experts + self.num_experts_per_tok = config.num_experts_per_tok + + # Initialize weights and apply final processing + self.post_init() + + @can_return_tuple + @auto_docstring + def forward( + self, + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.Tensor] = None, + position_ids: Optional[torch.LongTensor] = None, + past_key_values: Optional[Cache] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + labels: Optional[torch.LongTensor] = None, + use_cache: Optional[bool] = None, + output_router_logits: Optional[bool] = None, + cache_position: Optional[torch.LongTensor] = None, + logits_to_keep: Union[int, torch.Tensor] = 0, + **kwargs: Unpack[TransformersKwargs], + ) -> MoeCausalLMOutputWithPast: + r""" + labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*): + Labels for computing the masked language modeling loss. Indices should either be in `[0, ..., + config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored + (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`. 
+ + Example: + + ```python + >>> from transformers import AutoTokenizer, MiniMaxM2ForCausalLM + + >>> model = MiniMaxM2ForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2") + >>> tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2") + + >>> prompt = "Hey, are you conscious? Can you talk to me?" + >>> inputs = tokenizer(prompt, return_tensors="pt") + + >>> # Generate + >>> generate_ids = model.generate(inputs.input_ids, max_length=30) + >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0] + "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you." + ```""" + + output_router_logits = ( + output_router_logits if output_router_logits is not None else self.config.output_router_logits + ) + + # decoder outputs consist of (dec_features, layer_state, dec_hidden, dec_attn) + outputs: MoeModelOutputWithPast = self.model( + input_ids=input_ids, + attention_mask=attention_mask, + position_ids=position_ids, + past_key_values=past_key_values, + inputs_embeds=inputs_embeds, + use_cache=use_cache, + output_router_logits=output_router_logits, + cache_position=cache_position, + **kwargs, + ) + + hidden_states = outputs.last_hidden_state + # Only compute necessary logits, and do not upcast them to float if we are not computing the loss + slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep + logits = self.lm_head(hidden_states[:, slice_indices, :]) + + loss = None + if labels is not None: + loss = self.loss_function(logits, labels, self.vocab_size, **kwargs) + + aux_loss = None + if output_router_logits: + aux_loss = load_balancing_loss_func( + outputs.router_logits, + self.num_experts, + self.num_experts_per_tok, + attention_mask, + ) + if labels is not None: + loss += self.router_aux_loss_coef * aux_loss.to(loss.device) # make sure to reside in the same device + + return MoeCausalLMOutputWithPast( + loss=loss, +
aux_loss=aux_loss, + logits=logits, + past_key_values=outputs.past_key_values, + hidden_states=outputs.hidden_states, + attentions=outputs.attentions, + router_logits=outputs.router_logits, + ) + + +class MiniMaxM2ForSequenceClassification(GenericForSequenceClassification, MiniMaxM2PreTrainedModel): + pass + + +class MiniMaxM2ForTokenClassification(GenericForTokenClassification, MiniMaxM2PreTrainedModel): + pass + + +class MiniMaxM2ForQuestionAnswering(GenericForQuestionAnswering, MiniMaxM2PreTrainedModel): + pass + + +__all__ = [ + "MiniMaxM2ForCausalLM", + "MiniMaxM2ForQuestionAnswering", + "MiniMaxM2Model", + "MiniMaxM2PreTrainedModel", + "MiniMaxM2ForSequenceClassification", + "MiniMaxM2ForTokenClassification", +] diff --git a/subir_huggingface.py b/subir_huggingface.py new file mode 100644 index 0000000000000000000000000000000000000000..d2acdf9f55e8ad8dee4908da7579e19241c312da --- /dev/null +++ b/subir_huggingface.py @@ -0,0 +1,19 @@ +from huggingface_hub import HfApi + +api = HfApi() + +# Name of your repository on the HF Hub +repo_id = "List-cloud/List-3.0-Ultra-Coder-Brain" + +print("Starting upload to Hugging Face... 
This may take a long time depending on your connection.") + +# Upload the entire folder, replacing the old files on HF +api.upload_folder( + folder_path=r"K:\List-3.0-Ultra-Coder\List-3.0-Ultra-Coder-Brain", + repo_id=repo_id, + repo_type="model", + # Skip the automation scripts you do not want to upload + ignore_patterns=["*.pyc", "update_model_hashes.py", "boost_downloads.py", "upload_model.py"] +) + +print("Upload completed successfully!") diff --git a/tokenizer_config.json b/tokenizer_config.json index ff8e2ebcbdb03324603c0a734e459ec9968096ae..4801e04325e0078db80978918a3f3a0ad8fc09f6 100644 --- a/tokenizer_config.json +++ b/tokenizer_config.json @@ -1,495 +1,496 @@ -{ - "added_tokens_decoder": { - "200000": { - "content": "]!p~[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200001": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200002": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200003": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200004": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200005": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200006": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200007": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200008": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200009": { - 
"content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200010": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200011": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200012": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200013": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200014": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200015": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200016": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200017": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200018": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200019": { - "content": "]~b]", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200020": { - "content": "[e~[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200021": { - "content": "]!d~[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200022": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200023": { - "content": "", - "lstrip": false, - 
"normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200024": { - "content": "]<]speech[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200025": { - "content": "]<]image[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200026": { - "content": "]<]video[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200027": { - "content": "]<]start of speech[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200028": { - "content": "]<]end of speech[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200029": { - "content": "]<]start of image[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200030": { - "content": "]<]end of image[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200031": { - "content": "]<]start of video[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200032": { - "content": "]<]end of video[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200033": { - "content": "]<]vision pad[>[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200034": { - "content": "]~!b[", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200035": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200036": { - "content": "", - "single_word": false, - "lstrip": false, 
- "rstrip": false, - "normalized": false, - "special": true - }, - "200037": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200038": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200039": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200040": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200041": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200042": { - "content": "", - "lstrip": false, - "normalized": false, - "rstrip": false, - "single_word": false, - "special": true - }, - "200043": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": true - }, - "200044": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": true - }, - "200045": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": true - }, - "200046": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": true - }, - "200047": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": true - }, - "200048": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": true - }, - "200049": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": true - }, - "200050": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - 
"special": false - }, - "200051": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": false - }, - "200052": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": false - }, - "200053": { - "content": "", - "single_word": false, - "lstrip": false, - "rstrip": false, - "normalized": false, - "special": false - } - }, - "additional_special_tokens": [ - "", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "]<]speech[>[", - "]<]image[>[", - "]<]video[>[", - "]<]start of speech[>[", - "]<]end of speech[>[", - "]<]start of image[>[", - "]<]end of image[>[", - "]<]start of video[>[", - "]<]end of video[>[", - "]<]vision pad[>[", - "]~!b[", - "", - "", - "", - "", - "", - "", - "", - "", - "[e~[", - "]!d~[", - "]!p~[", - "]~b]", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "", - "" - ], - "add_prefix_space": false, - "bos_token": "]~!b[", - "clean_up_tokenization_spaces": false, - "eos_token": "[e~[", - "model_max_length": 40960000, - "tokenizer_class": "GPT2Tokenizer", - "unk_token": "]!d~[" -} +{ + "added_tokens_decoder": { + "200000": { + "content": "]!p~[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200001": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200002": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200003": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200004": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200005": { + "content": "", + "lstrip": false, + "normalized": false, + 
"rstrip": false, + "single_word": false, + "special": true + }, + "200006": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200007": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200008": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200009": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200010": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200011": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200012": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200013": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200014": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200015": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200016": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200017": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200018": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200019": { + "content": "]~b]", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + 
"special": true + }, + "200020": { + "content": "[e~[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200021": { + "content": "]!d~[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200022": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200023": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200024": { + "content": "]<]speech[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200025": { + "content": "]<]image[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200026": { + "content": "]<]video[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200027": { + "content": "]<]start of speech[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200028": { + "content": "]<]end of speech[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200029": { + "content": "]<]start of image[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200030": { + "content": "]<]end of image[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200031": { + "content": "]<]start of video[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200032": { + "content": "]<]end of video[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200033": 
{ + "content": "]<]vision pad[>[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200034": { + "content": "]~!b[", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200035": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200036": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": true + }, + "200037": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200038": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200039": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200040": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200041": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200042": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "200043": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": true + }, + "200044": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": true + }, + "200045": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": true + }, + "200046": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": true + }, + "200047": { + "content": "", + 
"single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": true + }, + "200048": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": true + }, + "200049": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": true + }, + "200050": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": false + }, + "200051": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": false + }, + "200052": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": false + }, + "200053": { + "content": "", + "single_word": false, + "lstrip": false, + "rstrip": false, + "normalized": false, + "special": false + } + }, + "additional_special_tokens": [ + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "]<]speech[>[", + "]<]image[>[", + "]<]video[>[", + "]<]start of speech[>[", + "]<]end of speech[>[", + "]<]start of image[>[", + "]<]end of image[>[", + "]<]start of video[>[", + "]<]end of video[>[", + "]<]vision pad[>[", + "]~!b[", + "", + "", + "", + "", + "", + "", + "", + "", + "[e~[", + "]!d~[", + "]!p~[", + "]~b]", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "", + "" + ], + "add_prefix_space": false, + "bos_token": "]~!b[", + "clean_up_tokenization_spaces": false, + "eos_token": "[e~[", + "model_max_length": 40960000, + "tokenizer_class": "GPT2Tokenizer", + "unk_token": "]!d~[", + "model_creator": "List Cloud" +} \ No newline at end of file