|
--- |
|
license: apache-2.0 |
|
license_link: https://github.com/mistralai/mistral-common/blob/main/LICENCE |
|
library_name: llama.cpp |
|
library_link: https://github.com/ggerganov/llama.cpp |
|
base_model: |
|
- mistralai/Mixtral-8x7B-v0.1 |
|
language: |
|
- fr |
|
- it |
|
- de |
|
- es |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- nlp |
|
- code |
|
- gguf |
|
- sparse |
|
- mixture-of-experts |
|
- code-generation |
|
--- |
|
|
|
## Mixtral 8x7B Instruct v0.1 |
|
|
|
### Quantized Model Files |
|
|
|
The Mixtral 8x7B Sparse Mixture of Experts (SMoE) model is available in two quantized GGUF formats: |
|
|
|
- **ggml-model-q4_0.gguf**: 4-bit quantization for reduced memory and compute overhead. |
|
- **ggml-model-q8_0.gguf**: 8-bit quantization, trading a larger memory footprint for near-full precision. |
|
|
|
These quantized formats provide flexibility for deployment across hardware configurations, from lightweight devices to large-scale inference servers. |
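
As a minimal loading sketch (not an official quickstart), the snippet below fetches one of the quantized files with `huggingface_hub` and runs it through `llama-cpp-python`, the Python bindings for `llama.cpp`. The repository ID is a placeholder assumption; substitute the repository that actually hosts these GGUF files.

```python
# Illustrative sketch: download a quantized file and run a prompt with
# llama-cpp-python (the Python bindings for llama.cpp).
# NOTE: repo_id is a placeholder -- point it at the repository that
# actually hosts these GGUF files.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="your-org/Mixtral-8x7B-Instruct-v0.1-GGUF",  # placeholder
    filename="ggml-model-q4_0.gguf",  # or "ggml-model-q8_0.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=32768,      # Mixtral supports a 32k-token context window
    n_gpu_layers=-1,  # offload all layers to GPU when one is available
)

out = llm("[INST] Write a haiku about sparse experts. [/INST]", max_tokens=64)
print(out["choices"][0]["text"])
```

The q8_0 file is loaded the same way; only the `filename` changes.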
|
|
|
### Model Information |
|
|
|
Mixtral 8x7B is a generative Sparse Mixture of Experts (SMoE) model designed to deliver high-quality outputs with significant computational efficiency. Leveraging a router network, it dynamically activates a small subset of experts for each token, reducing per-token compute while retaining the capacity of a much larger model. |
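
To make the routing idea concrete, here is a minimal top-2 gating sketch in plain NumPy. It is illustrative only: the layer sizes, random weights, and single-token treatment are assumptions for demonstration, not Mixtral's actual implementation (which applies this routing inside every feed-forward block).

```python
# Minimal top-2 mixture-of-experts routing sketch (illustrative only;
# sizes and weights are toy values, not Mixtral's real configuration).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2                # Mixtral: 8 experts, top-2

W_gate = rng.normal(size=(d_model, n_experts))      # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its two highest-scoring experts."""
    logits = x @ W_gate                             # one score per expert
    top = np.argsort(logits)[-top_k:]               # indices of the best two
    weights = np.exp(logits[top])
    weights /= weights.sum()                        # softmax over the selected pair
    # Only the chosen experts execute; the other six cost nothing here.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.normal(size=d_model)).shape)    # (16,)
```

Because only two of the eight experts run per token, the layer's compute scales with the active parameters (~12.9B) rather than the total (46.7B).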
|
|
|
**Key Features:** |
|
|
|
- **Architecture:** Decoder-only SMoE with 46.7B total parameters, of which only 12.9B are active per token. |
|
- **Context Window:** Supports up to 32k tokens, making it suitable for long-context applications. |
|
- **Multilingual Capabilities:** Trained on French, Italian, German, Spanish, and English, making it robust for diverse linguistic tasks. |
|
- **Performance:** Matches or exceeds Llama 2 70B and GPT-3.5 across several industry-standard benchmarks. |
|
- **Fine-Tuning Potential:** Optimized for instruction-following use cases, with fine-tuning yielding strong improvements in dialogue and safety alignment (the expected prompt format is sketched after this list). |
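
For instruction-following use, Mixtral Instruct expects the `[INST] ... [/INST]` prompt format. Below is a minimal sketch of building such a prompt, reusing the `llm` instance from the loading example above; the helper name is ours, and exact BOS/EOS handling is delegated to the tokenizer inside `llama.cpp`.

```python
# Illustrative helper (our naming) for Mixtral's [INST] chat format;
# multi-turn history interleaves [INST] blocks with model answers.
def format_instruct(user_msg, history=()):
    prompt = ""
    for user, assistant in history:              # prior (user, assistant) turns
        prompt += f"[INST] {user} [/INST] {assistant}</s>"
    prompt += f"[INST] {user_msg} [/INST]"
    return prompt

# Reuses the `llm` instance from the loading sketch above.
prompt = format_instruct("Explain mixture-of-experts in one sentence.")
print(llm(prompt, max_tokens=128)["choices"][0]["text"])
```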
|
|
|
**Developer**: Mistral AI |
|
**Training Data**: Open web data, curated for quality and diverse representation. |
|
**Application Areas**: Code generation, multilingual dialogue, and long-context processing. |
|
|
|
### Core Library |
|
|
|
Mixtral 8x7B Instruct is supported by multiple libraries to ensure flexibility for deployment and development. The primary frameworks include: |
|
|
|
- **Primary Framework**: `llama.cpp` |
|
- **Alternate Frameworks**: |
|
- `transformers` for integration with the Hugging Face ecosystem. |
|
- `vLLM` for highly optimized, low-latency inference serving (see the sketch after this list). |
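
As a sketch of the vLLM path: it serves the full-precision Hugging Face checkpoint rather than the GGUF files above, and the `tensor_parallel_size` value is an assumption that depends on your available GPU memory.

```python
# Illustrative vLLM sketch: serves the full-precision HF checkpoint,
# not the GGUF files above. tensor_parallel_size is an assumption
# (set it to your GPU count; Mixtral needs substantial VRAM).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["[INST] Summarize what an SMoE model is. [/INST]"], params)
print(outputs[0].outputs[0].text)
```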
|
|
|
You can access the model components and libraries here: |
|
|
|
- **Model Base**: [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) |
|
- **Common Utilities**: [mistralai/mistral-common](https://github.com/mistralai/mistral-common) |
|
- **Inference Optimization**: [mistralai/mistral-inference](https://github.com/mistralai/mistral-inference) |
|
- **Quantization Support**: [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) |
|
|
|
These resources provide a complete ecosystem for deploying, fine-tuning, and scaling sparse mixture-of-experts models. |
|
|
|
### Safety and Responsible Use |
|
|
|
Mixtral 8x7B has been trained with an emphasis on ethical use and safety. It includes: |
|
|
|
1. **Guardrails for Sensitive Content**: Optional system prompts to guide outputs. |
|
2. **Self-Reflection Prompting**: A prompting scheme in which the model assesses its own output, classifying a response as suitable or unsuitable before it is surfaced. |
|
|
|
Developers should always consider additional tuning or filtering depending on their application and context. |
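
As a sketch of the optional guardrail prompt, the snippet below prepends a safety system message using `llama-cpp-python`'s chat API, reusing the `llm` instance from the loading example. The system-prompt wording follows Mistral's published guardrailing recommendation; how the system turn is folded into the final prompt depends on the chat template shipped in the GGUF file.

```python
# Illustrative guardrail sketch: prepend a safety system message.
# Reuses the `llm` instance from the loading sketch above.
GUARDRAIL = (
    "Always assist with care, respect, and truth. Respond with utmost "
    "utility yet securely. Avoid harmful, unethical, prejudiced, or "
    "negative content. Ensure replies promote fairness and positivity."
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": GUARDRAIL},
        {"role": "user", "content": "How do I secure a home Wi-Fi network?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```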
|
|