---
base_model: TencentARC/Mistral_Pro_8B_v0.1
language:
- en
pipeline_tag: text-generation
license: apache-2.0
model_type: mistral
library_name: transformers
inference: false
datasets:
- HuggingFaceTB/cosmopedia
---
|
## Mistral Pro 8B v0.1

- **Model creator:** [TencentARC](https://huggingface.co/TencentARC)

- **Original model:** [Mistral_Pro_8B_v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1)
|
<!-- description start -->
|
## Description
|
This repo contains GGUF format model files for [TencentARC's Mistral Pro 8B v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1).
|
|
|
## Original model

- **Developed by:** [TencentARC](https://huggingface.co/TencentARC)
|
|
|
### Model Description

Mistral-Pro is a progressive version of the original [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) model, enhanced by the addition of Transformer blocks. It specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.
|
|
|
### Development and Training

Developed by Tencent's ARC Lab, Mistral-Pro is an 8 billion parameter model. It is an expansion of Mistral-7B, further trained on code and math corpora.
|
|
|
### Intended Use

This model is designed for a wide range of NLP tasks, with a focus on programming, mathematics, and general language tasks. It suits scenarios requiring the integration of natural and programming languages.
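
Because this is a plain text-generation model, the original full-precision checkpoint can be loaded directly with `transformers`. The snippet below is a minimal sketch (it targets the original repo rather than the GGUF files here; the prompt and generation settings are only illustrations):

```python
# Minimal sketch: greedy generation from the original full-precision
# checkpoint via transformers. Adjust dtype/device for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TencentARC/Mistral_Pro_8B_v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative code-flavored prompt, matching the model's focus areas.
prompt = "Write a Python function that returns the n-th Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```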
|
|
|
### Performance

Mistral_Pro_8B_v0.1 performs strongly across a range of benchmarks: it improves on Mistral's code and math scores while matching the recently released [Gemma](https://huggingface.co/google/gemma-7b).
|
#### Overall performance on language, math, and code tasks

| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | HumanEval |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Gemma-7B | 61.9 | 82.2 | 64.6 | 44.8 | 79.0 | 50.9 | 32.3 |
| Mistral-7B | 60.8 | 83.3 | 62.7 | 42.6 | 78.0 | 39.2 | 28.7 |
| Mistral_Pro_8B_v0.1 | 63.2 | 82.6 | 60.6 | 48.3 | 78.9 | 50.6 | 32.9 |
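
For context, scores like these are typically produced with EleutherAI's lm-evaluation-harness. The sketch below shows how one might rerun the language and math tasks; the task names and settings are assumptions based on the common Open LLM Leaderboard setup, not details taken from this card (HumanEval is usually run separately with a code-execution harness):

```python
# Hedged sketch: re-running leaderboard-style evals with
# lm-evaluation-harness (v0.4+ Python API). Task names below are
# assumptions; verify them against your installed harness version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=TencentARC/Mistral_Pro_8B_v0.1,dtype=float16",
    tasks=[
        "arc_challenge", "hellaswag", "mmlu",
        "truthfulqa_mc2", "winogrande", "gsm8k",
    ],
    batch_size=8,
)
print(results["results"])
```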
|
|
|
|
|
### Limitations

While Mistral-Pro addresses some limitations of previous models in the series, it may still encounter challenges specific to highly specialized domains or tasks.
|
|
|
### Ethical Considerations

Users should be aware of potential biases in the model and use it responsibly, considering its impact on various applications.
|
|
|
## Quantization types

| Quantization method | Bits | Size | Description | Recommended |
|---------------------|------|---------|----------------------------------------|-------------|
| Q2_K | 2 | 3.36 GB | very small, very high quality loss | ❌ |
| Q3_K_S | 3 | 3.91 GB | very small, high quality loss | ❌ |
| Q3_K_M | 3 | 4.35 GB | small, substantial quality loss | ❌ |
| Q3_K_L | 3 | 4.74 GB | small, substantial quality loss | ❌ |
| Q4_0 | 4 | 5.09 GB | legacy; small, very high quality loss | ❌ |
| Q4_K_S | 4 | 5.13 GB | medium, balanced quality | ✅ |
| Q4_K_M | 4 | 5.42 GB | medium, balanced quality | ✅ |
| Q5_0 | 5 | 6.20 GB | legacy; medium, balanced quality | ❌ |
| Q5_K_S | 5 | 6.20 GB | large, low quality loss | ✅ |
| Q5_K_M | 5 | 6.36 GB | large, very low quality loss | ✅ |
| Q6_K | 6 | 7.37 GB | very large, extremely low quality loss | ❌ |
| Q8_0 | 8 | 9.55 GB | very large, extremely low quality loss | ❌ |
| FP16 | 16 | 18 GB | enormous, negligible quality loss | ❌ |
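
To fetch a single quantized file rather than the whole repo, `huggingface_hub` works well. Both identifiers below are placeholders (this card does not list the exact GGUF filenames), so substitute this repository's id and the file for the quant you picked:

```python
# Sketch: download one GGUF file. repo_id and filename are
# placeholders -- replace them with this repo's id and an actual
# filename from its file listing.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="<this-repo-id>",                    # e.g. <user>/Mistral_Pro_8B_v0.1-GGUF
    filename="mistral_pro_8b_v0.1.Q4_K_M.gguf",  # hypothetical filename
)
print(path)
```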
|
|
|
## Usage

You can use this model with the latest builds of **LM Studio** and **llama.cpp**.

If you're new to the world of _large language models_, I recommend starting with **LM Studio**.
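
If you'd rather script it than use LM Studio's UI, here is a minimal sketch with `llama-cpp-python` (the Python bindings for llama.cpp); the filename is the hypothetical one from the download example above, and the context and offload settings are just starting points:

```python
# Minimal llama-cpp-python sketch. Tune n_ctx and n_gpu_layers for
# your hardware; temperature=0.0 gives deterministic output.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral_pro_8b_v0.1.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU; use 0 for CPU-only
)

out = llm(
    "Question: What is 12 * 13?\nAnswer:",
    max_tokens=32,
    temperature=0.0,
)
print(out["choices"][0]["text"])
```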
|
<!-- description end -->