---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
---

# Mistral-Large-218B-Instruct

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)

Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.

It is a self-merge of the original Mistral Large 2 (Mistral-Large-Instruct-2407); see the mergekit config below.

## Key features

- Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
- Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Russian, Dutch, and Polish.
- Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specialized languages like Swift and Fortran.
- Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON output.
- Advanced reasoning: State-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: Allows usage and modification for research and non-commercial purposes.
- Large context: A 128k context window for handling extensive input.

## Metrics

Note: The following metrics are based on the original model and may differ for this 218B-parameter version. Updated benchmarks will be provided when available.

**Base Pretrained Benchmarks**

| Benchmark | Score |
| --- | --- |
| MMLU | 84.0% |

**Base Pretrained Multilingual Benchmarks (MMLU)**

| Language | Score |
| --- | --- |
| French | 82.8% |
| German | 81.6% |
| Spanish | 82.7% |
| Italian | 82.7% |
| Dutch | 80.7% |
| Portuguese | 81.6% |
| Russian | 79.0% |
| Korean | 60.1% |
| Japanese | 78.8% |
| Chinese | 74.8% |

**Instruction Benchmarks**

| Benchmark | Score |
| --- | --- |
| MT-Bench | 8.63 |
| Wild Bench | 56.3 |
| Arena Hard | 73.2 |

**Code & Reasoning Benchmarks**

| Benchmark | Score |
| --- | --- |
| HumanEval | 92% |
| HumanEval Plus | 87% |
| MBPP Base | 80% |
| MBPP Plus | 69% |

**Math Benchmarks**

| Benchmark | Score |
| --- | --- |
| GSM8K | 93% |
| Math Instruct (0-shot, no CoT) | 70% |
| Math Instruct (0-shot, CoT) | 71.5% |

## Usage

This model can be used with standard LLM frameworks and libraries that support the Mistral architecture; detailed, model-specific instructions have not been published yet.
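
In the meantime, the sketch below shows one way to load and prompt the model with Hugging Face `transformers`. The repository id is a placeholder (point it at wherever the merged weights are hosted), and `device_map="auto"` assumes enough total GPU memory for the bf16 weights (see the hardware section below).

```python
# Minimal text-generation sketch with Hugging Face transformers.
# Assumption: the repo id below is hypothetical -- substitute the actual
# location of the merged safetensors weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "leafspark/Mistral-Large-218B-Instruct"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across all visible GPUs
)

messages = [{"role": "user", "content": "Write a haiku about oversized language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
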
## Hardware Requirements

Given the size of this model (218B parameters), it requires substantial computational resources for inference:

- Recommended: 8x H100 (640 GB total GPU memory)
- Alternatively: a distributed inference setup across multiple machines
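
As a rough sanity check on these numbers, the sketch below estimates the memory needed for the weights alone at a few common precisions; KV cache and activation memory come on top and grow with batch size and context length.

```python
# Back-of-the-envelope weight-memory estimate for a 218B-parameter dense model.
# Weights only: KV cache and activation memory are not included.
PARAMS = 218e9

bytes_per_param = {"bf16": 2.0, "fp8 / int8": 1.0, "int4": 0.5}
for precision, nbytes in bytes_per_param.items():
    print(f"{precision:>10}: ~{PARAMS * nbytes / 1e9:,.0f} GB")

# bf16      : ~436 GB  -> fits on 8x H100 (640 GB) with headroom for the KV cache
# fp8 / int8: ~218 GB
# int4      : ~109 GB
```
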
## Limitations

- This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards before deploying it in production environments.
- Due to its size, inference may be computationally expensive and require significant hardware resources.
- As with all large language models, it may exhibit biases present in its training data.
- The model's outputs should be critically evaluated, especially for sensitive applications.

## Notes

This is just a fun experimental model, merged with the `merge.py` script in the root of this repo. GGUF quantizations are available at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/).

Compatible `mergekit` config:

```yaml
slices:
- sources:
  - layer_range: [0, 20]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [10, 30]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [20, 40]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [30, 50]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [40, 60]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [50, 70]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [60, 80]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [70, 87]
    model: mistralai/Mistral-Large-Instruct-2407
merge_method: passthrough
dtype: bfloat16
```
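
To reproduce the merge, the config above can be run either with the `mergekit-yaml` command-line tool or from Python. The sketch below assumes the YAML has been saved as `mergekit-config.yml` and uses mergekit's Python entry point (`MergeConfiguration` / `run_merge`); the output path is arbitrary.

```python
# Sketch of reproducing the self-merge with mergekit's Python API.
# The CLI equivalent is: mergekit-yaml mergekit-config.yml ./Mistral-Large-218B-Instruct
# Assumes the YAML above is saved as mergekit-config.yml next to this script.
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("mergekit-config.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Mistral-Large-218B-Instruct",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=True,  # load tensors lazily to reduce RAM usage
        low_cpu_memory=False,
    ),
)
```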