---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
---
# Mistral-Large-218B-Instruct
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)
Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.
It is a self-merge of the original Mistral Large 2 (Mistral-Large-Instruct-2407); see the mergekit config below.
## Key features
- Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
- Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specific languages like Swift and Fortran.
- Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON output (a usage sketch follows this list).
- Advanced Reasoning: State-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: Allows usage and modification for research and non-commercial purposes.
- Large Context: Features a large 128k context window for handling extensive input.
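For illustration, here is a hedged sketch of exercising the native function-calling interface through the Hugging Face chat template. The repo id and tool schema are assumptions made for the example, and it presumes the tokenizer ships the same tool-aware chat template as the source Mistral Large 2 model:
```python
# Sketch only: assumes this repo's tokenizer carries a tool-aware chat template
# (as Mistral-Large-Instruct-2407 does); repo id and tool schema are illustrative.
from transformers import AutoTokenizer

repo_id = "leafspark/Mistral-Large-218B-Instruct"  # assumed repo id

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

tokenizer = AutoTokenizer.from_pretrained(repo_id)
messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# The template serializes the tool schema into the prompt; the model is then
# expected to answer with a JSON tool call that the caller parses and executes.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
print(prompt)
```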
## Metrics
Note: The following metrics are based on the original model and may differ for this 218B parameter version. Updated benchmarks will be provided when available.
**Base Pretrained Benchmarks**
| Benchmark | Score |
| --- | --- |
| MMLU | 84.0% |
**Base Pretrained Multilingual Benchmarks (MMLU)**
| Benchmark | Score |
| --- | --- |
| French | 82.8% |
| German | 81.6% |
| Spanish | 82.7% |
| Italian | 82.7% |
| Dutch | 80.7% |
| Portuguese | 81.6% |
| Russian | 79.0% |
| Korean | 60.1% |
| Japanese | 78.8% |
| Chinese | 74.8% |
**Instruction Benchmarks**
| Benchmark | Score |
| --- | --- |
| MT Bench | 8.63 |
| Wild Bench | 56.3 |
| Arena Hard | 73.2 |
**Code & Reasoning Benchmarks**
| Benchmark | Score |
| --- | --- |
| Human Eval | 92% |
| Human Eval Plus | 87% |
| MBPP Base | 80% |
| MBPP Plus | 69% |
**Math Benchmarks**
| Benchmark | Score |
| --- | --- |
| GSM8K | 93% |
| Math Instruct (0-shot, no CoT) | 70% |
| Math Instruct (0-shot, CoT) | 71.5% |
## Usage
This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.
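In the meantime, the following is a minimal loading sketch with Hugging Face Transformers. It assumes the repo keeps the standard Mistral/Transformers layout of the source model; the repo id, dtype, and device mapping are illustrative, not an official recipe:
```python
# Minimal inference sketch (assumptions: standard transformers layout, repo id below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "leafspark/Mistral-Large-218B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in bfloat16 (see mergekit config)
    device_map="auto",           # shard layers across all visible GPUs
)

messages = [{"role": "user", "content": "Write a haiku about merging models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```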
## Hardware Requirements
Given the size of this model (218B parameters), it requires substantial computational resources for inference (see the rough memory estimate after the list):
- Recommended: 8xH100 (640GB)
- Alternatively: Distributed inference setup across multiple machines.
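As a back-of-envelope illustration of why a full 8x H100 node is the suggested minimum for bf16 inference (weights only; KV cache, activations, and framework overhead are ignored):
```python
# Back-of-envelope weight memory for a 218B-parameter model in bfloat16.
n_params = 218e9
bytes_per_param = 2                           # bfloat16
weights_gib = n_params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB of weights")   # ~406 GiB, vs. 640 GB across 8x H100 80GB
```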
## Limitations
- This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
- Due to its size, inference may be computationally expensive and require significant hardware resources.
- As with all large language models, it may exhibit biases present in its training data.
- The model's outputs should be critically evaluated, especially for sensitive applications.
## Notes
This was just a fun experimental model, merged with the `merge.py` script in the root of the repo. GGUF quantizations are available at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/)
Compatible `mergekit` config:
```yaml
slices:
  - sources:
      - layer_range: [0, 20]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [10, 30]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [20, 40]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [30, 50]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [40, 60]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [50, 70]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [60, 80]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [70, 87]
        model: mistralai/Mistral-Large-Instruct-2407
merge_method: passthrough
dtype: bfloat16
```
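As a hedged sanity check on the resulting size, assuming mergekit's `layer_range` is end-exclusive and using the publicly reported shape of Mistral Large 2 (~123B parameters over 88 transformer layers; both figures are assumptions, not taken from this card):
```python
# Rough estimate of the merged model size from the slice definitions above.
slices = [(0, 20), (10, 30), (20, 40), (30, 50),
          (40, 60), (50, 70), (60, 80), (70, 87)]
merged_layers = sum(end - start for start, end in slices)  # 157 layers
params_per_layer = 123e9 / 88                              # very rough per-layer average
print(merged_layers, f"~{merged_layers * params_per_layer / 1e9:.0f}B params")
# -> 157 layers, roughly 219B parameters, consistent with the 218B in the model name
```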