---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
---

# Mistral-Large-218B-Instruct

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)

Mistral-Large-218B-Instruct is a dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities. It is a self-merge of the original Mistral Large 2; see the `mergekit` config below.

## Key features

- Massive scale: with 218 billion parameters, this model pushes the boundaries of language model capabilities.
- Multi-lingual by design: supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- Proficient in coding: trained on 80+ programming languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specialized languages like Swift and Fortran.
- Agent-centric: best-in-class agentic capabilities with native function calling and JSON output.
- Advanced reasoning: state-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: allows usage and modification for research and non-commercial purposes.
- Large context: a 128k-token context window for handling extensive input.

## Metrics

Note: the following metrics are from the original Mistral Large 2 model and may differ for this 218B self-merge. Updated benchmarks will be provided when available.
**Base Pretrained Benchmarks**

| Benchmark | Score |
| --- | --- |
| MMLU | 84.0% |

**Base Pretrained Multilingual Benchmarks (MMLU)**

| Language | Score |
| --- | --- |
| French | 82.8% |
| German | 81.6% |
| Spanish | 82.7% |
| Italian | 82.7% |
| Dutch | 80.7% |
| Portuguese | 81.6% |
| Russian | 79.0% |
| Korean | 60.1% |
| Japanese | 78.8% |
| Chinese | 74.8% |

**Instruction Benchmarks**

| Benchmark | Score |
| --- | --- |
| MT Bench | 8.63 |
| Wild Bench | 56.3 |
| Arena Hard | 73.2 |

**Code & Reasoning Benchmarks**

| Benchmark | Score |
| --- | --- |
| HumanEval | 92% |
| HumanEval Plus | 87% |
| MBPP Base | 80% |
| MBPP Plus | 69% |

**Math Benchmarks**

| Benchmark | Score |
| --- | --- |
| GSM8K | 93% |
| Math Instruct (0-shot, no CoT) | 70% |
| Math Instruct (0-shot, CoT) | 71.5% |

## Usage

This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.

## Hardware Requirements

Given its size (218B parameters), this model requires substantial computational resources for inference:

- Recommended: 8x H100 (640 GB total)
- Alternative: a distributed inference setup across multiple machines

## Limitations

- This model has no built-in moderation mechanisms; users should implement appropriate safeguards before deploying it in production environments.
- Due to its size, inference is computationally expensive and requires significant hardware resources.
- As with all large language models, it may exhibit biases present in its training data.
- The model's outputs should be critically evaluated, especially in sensitive applications.

## Notes

This was just a fun testing model, merged with the `merge.py` script in the base of the repo.
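As a back-of-the-envelope check of the hardware figures above, the raw weight footprint can be estimated directly from the parameter count. This is a minimal sketch assuming bfloat16 storage (2 bytes per parameter); real deployments also need memory for activations, KV cache, and framework overhead.

```python
# Rough estimate of raw weight memory, assuming bfloat16 (2 bytes/param).
params = 218e9          # 218 billion parameters
bytes_per_param = 2     # bfloat16
weight_gb = params * bytes_per_param / 1e9

print(f"~{weight_gb:.0f} GB of weights")  # ~436 GB
print(weight_gb < 640)                    # fits in 8x H100 (640 GB) -> True
```

Weights alone at ~436 GB leave roughly 200 GB of the 640 GB budget for KV cache and activations, which is why a single 8x H100 node is the recommended minimum.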
Find GGUFs at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/).

Compatible `mergekit` config:

```yaml
slices:
- sources:
  - layer_range: [0, 20]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [10, 30]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [20, 40]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [30, 50]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [40, 60]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [50, 70]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [60, 80]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [70, 87]
    model: mistralai/Mistral-Large-Instruct-2407
merge_method: passthrough
dtype: bfloat16
```
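The passthrough merge above stacks eight overlapping slices of the base model, each offset by 10 layers. A small sketch to count the resulting depth (layer ranges copied from the config; `mergekit` layer ranges are end-exclusive):

```python
# Layer ranges from the mergekit config above (end-exclusive).
slices = [(0, 20), (10, 30), (20, 40), (30, 50),
          (40, 60), (50, 70), (60, 80), (70, 87)]

# Each slice contributes (end - start) decoder layers to the merged model.
total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 157

# Consecutive slices overlap by 10 layers, so most base layers appear twice.
overlaps = [slices[i][1] - slices[i + 1][0] for i in range(len(slices) - 1)]
print(overlaps)  # [10, 10, 10, 10, 10, 10, 10]
```

If the base Mistral Large 2 has 88 decoder layers, this yields a 157-layer model, roughly 157/88 ≈ 1.78x the base's 123B parameters, consistent with the ~218B figure in the model name (embeddings and the LM head are not duplicated, so the scaling is not exactly linear).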