---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
---

# Mistral-Large-218B-Instruct

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)

Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.

It is a self-merge of the original Mistral Large 2 (Mistral-Large-Instruct-2407); see the mergekit config below.

## Key features

- Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
- Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Russian, Dutch, and Polish.
- Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specialized languages like Swift and Fortran.
- Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON output.
- Advanced reasoning: State-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: Allows usage and modification for research and non-commercial purposes.
- Large context: A 128k context window for handling extensive input.

## Metrics

Note: The following metrics are based on the original model and may differ for this 218B-parameter version. Updated benchmarks will be provided when available.

**Base Pretrained Benchmarks**

| Benchmark | Score |
| --- | --- |
| MMLU | 84.0% |

**Base Pretrained Multilingual Benchmarks (MMLU)**

| Language | Score |
| --- | --- |
| French | 82.8% |
| German | 81.6% |
| Spanish | 82.7% |
| Italian | 82.7% |
| Dutch | 80.7% |
| Portuguese | 81.6% |
| Russian | 79.0% |
| Korean | 60.1% |
| Japanese | 78.8% |
| Chinese | 74.8% |

**Instruction Benchmarks**

| Benchmark | Score |
| --- | --- |
| MT-Bench | 8.63 |
| Wild Bench | 56.3 |
| Arena Hard | 73.2 |

**Code & Reasoning Benchmarks**

| Benchmark | Score |
| --- | --- |
| HumanEval | 92% |
| HumanEval Plus | 87% |
| MBPP Base | 80% |
| MBPP Plus | 69% |

**Math Benchmarks**

| Benchmark | Score |
| --- | --- |
| GSM8K | 93% |
| Math Instruct (0-shot, no CoT) | 70% |
| Math Instruct (0-shot, CoT) | 71.5% |

## Usage

This model can be used with standard LLM frameworks and libraries that support the Mistral architecture; detailed, model-specific instructions have not been published yet.
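
In the meantime, the sketch below shows one way to load and prompt the model with Hugging Face `transformers`. The repository id is a placeholder (point it at wherever the merged weights are hosted), and `device_map="auto"` assumes enough total GPU memory for the bf16 weights (see the hardware section below).

```python
# Minimal text-generation sketch with Hugging Face transformers.
# Assumption: the repo id below is hypothetical -- substitute the actual
# location of the merged safetensors weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "leafspark/Mistral-Large-218B-Instruct"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across all visible GPUs
)

messages = [{"role": "user", "content": "Write a haiku about oversized language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
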
## Hardware Requirements

Given the size of this model (218B parameters), it requires substantial computational resources for inference:

- Recommended: 8x H100 (640 GB total GPU memory)
- Alternatively: a distributed inference setup across multiple machines
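
As a rough sanity check on these numbers, the sketch below estimates the memory needed for the weights alone at a few common precisions; KV cache and activation memory come on top and grow with batch size and context length.

```python
# Back-of-the-envelope weight-memory estimate for a 218B-parameter dense model.
# Weights only: KV cache and activation memory are not included.
PARAMS = 218e9

bytes_per_param = {"bf16": 2.0, "fp8 / int8": 1.0, "int4": 0.5}
for precision, nbytes in bytes_per_param.items():
    print(f"{precision:>10}: ~{PARAMS * nbytes / 1e9:,.0f} GB")

# bf16      : ~436 GB  -> fits on 8x H100 (640 GB) with headroom for the KV cache
# fp8 / int8: ~218 GB
# int4      : ~109 GB
```
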
## Limitations

- This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards before deploying it in production environments.
- Due to its size, inference may be computationally expensive and require significant hardware resources.
- As with all large language models, it may exhibit biases present in its training data.
- The model's outputs should be critically evaluated, especially for sensitive applications.

## Notes

This is just a fun experimental model, merged with the `merge.py` script in the root of this repo. GGUF quantizations are available at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/).

Compatible `mergekit` config:

```yaml
slices:
- sources:
  - layer_range: [0, 20]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [10, 30]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [20, 40]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [30, 50]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [40, 60]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [50, 70]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [60, 80]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [70, 87]
    model: mistralai/Mistral-Large-Instruct-2407
merge_method: passthrough
dtype: bfloat16
```
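
To reproduce the merge, the config above can be run either with the `mergekit-yaml` command-line tool or from Python. The sketch below assumes the YAML has been saved as `mergekit-config.yml` and uses mergekit's Python entry point (`MergeConfiguration` / `run_merge`); the output path is arbitrary.

```python
# Sketch of reproducing the self-merge with mergekit's Python API.
# The CLI equivalent is: mergekit-yaml mergekit-config.yml ./Mistral-Large-218B-Instruct
# Assumes the YAML above is saved as mergekit-config.yml next to this script.
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("mergekit-config.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Mistral-Large-218B-Instruct",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=True,  # load tensors lazily to reduce RAM usage
        low_cpu_memory=False,
    ),
)
```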