aberrio's picture
Update README.md
adc125a verified
|
raw
history blame
3.48 kB
metadata
license: apache-2.0
license_link: https://github.com/mistralai/mistral-common/blob/main/LICENCE
library: llama.cpp
library_link: https://github.com/ggerganov/llama.cpp
base_model:
  - mistralai/Mixtral-8x7B-v0.1
language:
  - fr
  - it
  - de
  - es
  - en
pipeline_tag: text-generation
tags:
  - nlp
  - code
  - gguf
  - sparse
  - mixture-of-experts
  - code-generation

Mixtral 8x7B Instruct v0.1

Quantized Model Files

The Mixtral 8x7B Sparse Mixture of Experts (SMoE) model is available in two formats:

  • ggml-model-q4_0.gguf: 4-bit quantization for reduced memory and compute overhead.
  • ggml-model-q8_0.gguf: 8-bit quantization, offering balanced performance and precision.

These quantized formats ensure flexibility for deployment on various hardware configurations, from lightweight devices to large-scale inference servers.

Model Information

Mixtral 8x7B is a generative Sparse Mixture of Experts (SMoE) model designed to deliver high-quality outputs with significant computational efficiency. Leveraging a routing mechanism, it dynamically activates a subset of experts per input, reducing computational costs while maintaining the performance of a much larger model.

Key Features:

  • Architecture: Decoder-only SMoE with 46.7B total parameters but only 12.9B parameters active per token.
  • Context Window: Supports up to 32k tokens, making it suitable for long-context applications.
  • Multilingual Capabilities: Trained on French, Italian, German, Spanish, and English, making it robust for diverse linguistic tasks.
  • Performance: Matches or exceeds Llama 2 70B and GPT-3.5 across several industry-standard benchmarks.
  • Fine-Tuning Potential: Optimized for instruction-following use cases, with finetuning yielding strong improvements in dialogue and safety alignment.

Developer: Mistral AI
Training Data: Open web data, curated for quality and diverse representation.
Application Areas: Code generation, multilingual dialogue, and long-context processing.

Core Library

Mixtral 8x7B Instruct is supported by multiple libraries to ensure flexibility for deployment and development. The primary frameworks include:

  • Primary Framework: llama.cpp
  • Alternate Frameworks:
    • transformers for initial integration into Hugging Face's ecosystem.
    • vLLM for highly optimized inference with low-latency serving.

You can access the model components and libraries here:

These resources provide a complete ecosystem for deployment, fine-tuning, and scaling sparse mixture models.

Safety and Responsible Use

Mixtral 8x7B has been trained with an emphasis on ethical use and safety. It includes:

  1. Guardrails for Sensitive Content: Optional system prompts to guide outputs.
  2. Self-Reflection Prompting: Mechanism for internal assessment of generated outputs, allowing the model to classify its responses as suitable or unsuitable for deployment.

Developers should always consider additional tuning or filtering depending on their application and context.