|
--- |
|
license: apache-2.0 |
|
license_link: https://github.com/mistralai/mistral-common/blob/main/LICENCE |
|
library_name: llama.cpp |
|
library_link: https://github.com/ggerganov/llama.cpp |
|
base_model: |
|
- mistralai/Mixtral-8x7B-v0.1 |
|
language: |
|
- fr |
|
- it |
|
- de |
|
- es |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- nlp |
|
- code |
|
- gguf |
|
- sparse |
|
- mixture-of-experts |
|
- code-generation |
|
--- |
|
|
|
## Mixtral 8x7B Instruct v0.1 |
|
|
|
### Quantized Model Files |
|
|
|
The Mixtral 8x7B Sparse Mixture of Experts (SMoE) model is available in two quantized GGUF formats: |
|
|
|
- **ggml-model-q4_0.gguf**: 4-bit quantization for reduced memory and compute overhead. |
|
- **ggml-model-q8_0.gguf**: 8-bit quantization, trading a larger memory footprint for near-full precision. |
|
|
|
These quantized formats provide flexibility for deployment across hardware configurations, from lightweight devices to large-scale inference servers. |
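
As a minimal loading sketch (not an official quickstart), the snippet below fetches one of the quantized files with `huggingface_hub` and runs it through `llama-cpp-python`, the Python bindings for `llama.cpp`. The repository ID is a placeholder assumption; substitute the repository that actually hosts these GGUF files.

```python
# Illustrative sketch: download a quantized file and run a prompt with
# llama-cpp-python (the Python bindings for llama.cpp).
# NOTE: repo_id is a placeholder -- point it at the repository that
# actually hosts these GGUF files.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="your-org/Mixtral-8x7B-Instruct-v0.1-GGUF",  # placeholder
    filename="ggml-model-q4_0.gguf",  # or "ggml-model-q8_0.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=32768,      # Mixtral supports a 32k-token context window
    n_gpu_layers=-1,  # offload all layers to GPU when one is available
)

out = llm("[INST] Write a haiku about sparse experts. [/INST]", max_tokens=64)
print(out["choices"][0]["text"])
```

The q8_0 file is loaded the same way; only the `filename` changes.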
|
|
|
### Model Information |
|
|
|
Mixtral 8x7B is a generative Sparse Mixture of Experts (SMoE) model designed to deliver high-quality outputs with significant computational efficiency. Leveraging a router network, it dynamically activates a small subset of experts for each token, reducing per-token compute while retaining the capacity of a much larger model. |
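
To make the routing idea concrete, here is a minimal top-2 gating sketch in plain NumPy. It is illustrative only: the layer sizes, random weights, and single-token treatment are assumptions for demonstration, not Mixtral's actual implementation (which applies this routing inside every feed-forward block).

```python
# Minimal top-2 mixture-of-experts routing sketch (illustrative only;
# sizes and weights are toy values, not Mixtral's real configuration).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2                # Mixtral: 8 experts, top-2

W_gate = rng.normal(size=(d_model, n_experts))      # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its two highest-scoring experts."""
    logits = x @ W_gate                             # one score per expert
    top = np.argsort(logits)[-top_k:]               # indices of the best two
    weights = np.exp(logits[top])
    weights /= weights.sum()                        # softmax over the selected pair
    # Only the chosen experts execute; the other six cost nothing here.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.normal(size=d_model)).shape)    # (16,)
```

Because only two of the eight experts run per token, the layer's compute scales with the active parameters (~12.9B) rather than the total (46.7B).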
|
|
|
**Key Features:** |
|
|
|
- **Architecture:** Decoder-only SMoE with 46.7B total parameters, of which only 12.9B are active per token. |
|
- **Context Window:** Supports up to 32k tokens, making it suitable for long-context applications. |
|
- **Multilingual Capabilities:** Trained on French, Italian, German, Spanish, and English, making it robust for diverse linguistic tasks. |
|
- **Performance:** Matches or exceeds Llama 2 70B and GPT-3.5 across several industry-standard benchmarks. |
|
- **Fine-Tuning Potential:** Optimized for instruction-following use cases, with fine-tuning yielding strong improvements in dialogue and safety alignment (the expected prompt format is sketched after this list). |
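
For instruction-following use, Mixtral Instruct expects the `[INST] ... [/INST]` prompt format. Below is a minimal sketch of building such a prompt, reusing the `llm` instance from the loading example above; the helper name is ours, and exact BOS/EOS handling is delegated to the tokenizer inside `llama.cpp`.

```python
# Illustrative helper (our naming) for Mixtral's [INST] chat format;
# multi-turn history interleaves [INST] blocks with model answers.
def format_instruct(user_msg, history=()):
    prompt = ""
    for user, assistant in history:              # prior (user, assistant) turns
        prompt += f"[INST] {user} [/INST] {assistant}</s>"
    prompt += f"[INST] {user_msg} [/INST]"
    return prompt

# Reuses the `llm` instance from the loading sketch above.
prompt = format_instruct("Explain mixture-of-experts in one sentence.")
print(llm(prompt, max_tokens=128)["choices"][0]["text"])
```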
|
|
|
**Developer**: Mistral AI |
|
**Training Data**: Open web data, curated for quality and diverse representation. |
|
**Application Areas**: Code generation, multilingual dialogue, and long-context processing. |
|
|
|
### Core Library |
|
|
|
Mixtral 8x7B Instruct is supported by multiple libraries to ensure flexibility for deployment and development. The primary frameworks include: |
|
|
|
- **Primary Framework**: `llama.cpp` |
|
- **Alternate Frameworks**: |
|
- `transformers` for integration with the Hugging Face ecosystem. |
|
- `vLLM` for highly optimized, low-latency inference serving (see the sketch after this list). |
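
As a sketch of the vLLM path: it serves the full-precision Hugging Face checkpoint rather than the GGUF files above, and the `tensor_parallel_size` value is an assumption that depends on your available GPU memory.

```python
# Illustrative vLLM sketch: serves the full-precision HF checkpoint,
# not the GGUF files above. tensor_parallel_size is an assumption
# (set it to your GPU count; Mixtral needs substantial VRAM).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["[INST] Summarize what an SMoE model is. [/INST]"], params)
print(outputs[0].outputs[0].text)
```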
|
|
|
You can access the model components and libraries here: |
|
|
|
- **Model Base**: [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) |
|
- **Common Utilities**: [mistralai/mistral-common](https://github.com/mistralai/mistral-common) |
|
- **Inference Optimization**: [mistralai/mistral-inference](https://github.com/mistralai/mistral-inference) |
|
- **Quantization Support**: [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) |
|
|
|
These resources provide a complete ecosystem for deploying, fine-tuning, and scaling sparse mixture-of-experts models. |
|
|
|
### Safety and Responsible Use |
|
|
|
Mixtral 8x7B has been trained with an emphasis on ethical use and safety. It includes: |
|
|
|
1. **Guardrails for Sensitive Content**: Optional system prompts to guide outputs. |
|
2. **Self-Reflection Prompting**: A prompting scheme in which the model assesses its own output, classifying a response as suitable or unsuitable before it is surfaced. |
|
|
|
Developers should always consider additional tuning or filtering depending on their application and context. |
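
As a sketch of the optional guardrail prompt, the snippet below prepends a safety system message using `llama-cpp-python`'s chat API, reusing the `llm` instance from the loading example. The system-prompt wording follows Mistral's published guardrailing recommendation; how the system turn is folded into the final prompt depends on the chat template shipped in the GGUF file.

```python
# Illustrative guardrail sketch: prepend a safety system message.
# Reuses the `llm` instance from the loading sketch above.
GUARDRAIL = (
    "Always assist with care, respect, and truth. Respond with utmost "
    "utility yet securely. Avoid harmful, unethical, prejudiced, or "
    "negative content. Ensure replies promote fairness and positivity."
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": GUARDRAIL},
        {"role": "user", "content": "How do I secure a home Wi-Fi network?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```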
|
|