---
base_model: TencentARC/Mistral_Pro_8B_v0.1
language:
- en
pipeline_tag: text-generation
license: apache-2.0
model_type: mistral
library_name: transformers
inference: false
datasets:
- HuggingFaceTB/cosmopedia
---
## Mistral Pro 8B v0.1
- **Model creator:** [TencentARC](https://huggingface.co/TencentARC)
- **Original model:** [Mistral_Pro_8B_v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1)
<!-- description start -->
## Description
This repo contains GGUF format model files for [TencentARC's Mistral Pro 8B v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1).
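To confirm that a downloaded file really is GGUF before loading it, you can read its fixed-size header. The following is a minimal sketch that assumes the header layout of GGUF version 2 and later (little-endian: the 4-byte magic `GGUF`, a `uint32` version, then `uint64` tensor and metadata key/value counts). The names `read_gguf_header` and `GGUFHeader` are hypothetical helpers, not part of any library.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Parse the fixed-size GGUF header (assumes the GGUF v2+ layout, little-endian)."""
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    (version,) = struct.unpack("<I", f.read(4))
    if version < 2:
        # GGUF v1 stored the counts as uint32; this sketch does not handle it.
        raise ValueError(f"unsupported GGUF version {version}")
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return GGUFHeader(version, tensor_count, kv_count)
```

Open the file in binary mode (`open(path, "rb")`) and pass the handle in. A `ValueError` means the download is not a GGUF file or is too old for this sketch.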
## Original model
- **Developed by:** [TencentARC](https://huggingface.co/TencentARC)
### Description
#### Model Description
Mistral-Pro is a progressive version of the original [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) model, enhanced by the addition of Transformer blocks. It specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.
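The card does not say how the added Transformer blocks are initialized. The sketch below assumes the identity-initialized block expansion from TencentARC's related LLaMA Pro work: each new block keeps a nonzero input projection but a zero output projection, so the residual connection passes its input through unchanged and the expanded model initially computes the same function as the original. The toy `residual_block` is a hypothetical stand-in for a real Transformer block, with a single MLP-style sublayer and no attention or normalization.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]
Block = Callable[[Vector], Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def residual_block(w_in: Matrix, w_out: Matrix) -> Block:
    """Toy residual block: x + W_out @ relu(W_in @ x)."""
    def block(x: Vector) -> Vector:
        h = [max(0.0, v) for v in matvec(w_in, x)]
        return [xi + di for xi, di in zip(x, matvec(w_out, h))]
    return block


def toy_matrix(seed: int, rows: int, cols: int) -> Matrix:
    """Deterministic small weights so the example is reproducible."""
    return [[((seed + 3 * r + c) % 5 - 2) * 0.1 for c in range(cols)] for r in range(rows)]


def expand(blocks: List[Block], every: int, dim: int, hidden: int) -> List[Block]:
    """Insert a new block after every `every` original blocks.

    The new block's output projection is all zeros, so at initialization
    it returns its input unchanged (identity via the residual connection).
    """
    expanded: List[Block] = []
    for i, block in enumerate(blocks, start=1):
        expanded.append(block)
        if i % every == 0:
            w_in = toy_matrix(100 + i, hidden, dim)        # nonzero, trainable later
            w_out = [[0.0] * hidden for _ in range(dim)]   # zero-initialized output
            expanded.append(residual_block(w_in, w_out))
    return expanded


def forward(blocks: List[Block], x: Vector) -> Vector:
    """Apply the blocks in order."""
    for block in blocks:
        x = block(x)
    return x


DIM, HIDDEN = 4, 8
base = [residual_block(toy_matrix(s, HIDDEN, DIM), toy_matrix(s + 1, DIM, HIDDEN)) for s in range(4)]
grown = expand(base, every=2, dim=DIM, hidden=HIDDEN)
x0 = [1.0, -0.5, 0.25, 2.0]
```

In this setup, `grown` has six blocks instead of four, yet `forward(grown, x0)` equals `forward(base, x0)`. Further training on new data (code and math, in Mistral-Pro's case) can then move the new blocks away from the identity.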
#### Development and Training
Developed by Tencent's ARC Lab, Mistral-Pro is an 8-billion-parameter model. It expands Mistral-7B with additional Transformer blocks and is further trained on code and math corpora.
#### Intended Use
This model is designed for a wide range of NLP tasks, with a focus on programming, mathematics, and general language tasks. It suits scenarios requiring integration of natural and programming languages.
#### Performance
Mistral_Pro_8B_v0.1 performs strongly across a range of benchmarks. Compared with Mistral-7B, its largest gains are in math and code (GSM8K 39.2 → 50.6, HumanEval 28.7 → 32.9), and its overall results are comparable to those of [Gemma](https://huggingface.co/google/gemma-7b).
##### Overall performance on language, math, and code tasks
| Model | ARC | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | HumanEval |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Gemma-7B | 61.9 | 82.2 | 64.6 | 44.8 | 79.0 | 50.9 | 32.3 |
| Mistral-7B | 60.8 | 83.3 | 62.7 | 42.6 | 78.0 | 39.2 | 28.7 |
| Mistral_Pro_8B_v0.1 | 63.2 | 82.6 | 60.6 | 48.3 | 78.9 | 50.6 | 32.9 |
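To make the comparison in the table explicit, this short script computes per-benchmark differences and a simple unweighted mean for each model. The unweighted mean is only an illustrative aggregate computed here. It is not a score reported by TencentARC, and the benchmarks use different scales of difficulty.

```python
from statistics import mean

BENCHMARKS = ["ARC", "Hellaswag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K", "HumanEval"]

# Scores copied from the table above, in BENCHMARKS order.
SCORES = {
    "Gemma-7B": [61.9, 82.2, 64.6, 44.8, 79.0, 50.9, 32.3],
    "Mistral-7B": [60.8, 83.3, 62.7, 42.6, 78.0, 39.2, 28.7],
    "Mistral_Pro_8B_v0.1": [63.2, 82.6, 60.6, 48.3, 78.9, 50.6, 32.9],
}


def deltas(model: str, baseline: str) -> dict:
    """Per-benchmark difference (model minus baseline), rounded to one decimal."""
    return {b: round(m - s, 1) for b, m, s in zip(BENCHMARKS, SCORES[model], SCORES[baseline])}


def unweighted_mean(model: str) -> float:
    """Plain average over the seven benchmarks, rounded to two decimals."""
    return round(mean(SCORES[model]), 2)


for name in SCORES:
    print(f"{name:>20}: mean {unweighted_mean(name)}")
print("vs Mistral-7B:", deltas("Mistral_Pro_8B_v0.1", "Mistral-7B"))
print("vs Gemma-7B:  ", deltas("Mistral_Pro_8B_v0.1", "Gemma-7B"))
```

The differences show where the gains come from: GSM8K (+11.4) and TruthfulQA (+5.7) improve most over Mistral-7B, while MMLU drops (-2.1). Against Gemma-7B the differences are small except for MMLU (-4.0).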
#### Limitations
While Mistral-Pro addresses some limitations of previous models in the series, it may still encounter challenges specific to highly specialized domains or tasks.
#### Ethical Considerations
Users should be aware of potential biases in the model and use it responsibly, considering its impact on various applications.
## Quantization types
| quantization method | bits | size | description | recommended |
|---------------------|------|----------|-----------------------------------------------------|-------------|
| Q2_K                | 2    | 3.36 GB  | very small, very high quality loss                  | ❌          |
| Q3_K_S | 3 | 3.91 GB | very small, high quality loss | ❌ |
| Q3_K_M | 3 | 4.35 GB | small, substantial quality loss | ❌ |
| Q3_K_L | 3 | 4.74 GB | small, substantial quality loss | ❌ |
| Q4_0 | 4 | 5.09 GB | legacy; small, very high quality loss | ❌ |
| Q4_K_S | 4 | 5.13 GB | medium, balanced quality | ✅ |
| Q4_K_M | 4 | 5.42 GB | medium, balanced quality | ✅ |
| Q5_0 | 5 | 6.20 GB | legacy; medium, balanced quality | ❌ |
| Q5_K_S | 5 | 6.20 GB | large, low quality loss | ✅ |
| Q5_K_M | 5 | 6.36 GB | large, very low quality loss | ✅ |
| Q6_K | 6 | 7.37 GB | very large, extremely low quality loss | ❌ |
| Q8_0 | 8 | 9.55 GB | very large, extremely low quality loss | ❌ |
| FP16 | 16 | 18 GB | enormous, negligible quality loss | ❌ |
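When choosing among the recommended (✅) files, a rough rule is to take the largest one that fits in available RAM/VRAM with some room to spare. The helper below is a hypothetical sketch of that rule. The `headroom_gb` default of 1.5 GB is an assumed allowance for the context (KV) cache and runtime overhead, not a measured value, and it grows with context length.

```python
from typing import Optional

# File sizes in GB for the rows marked ✅ in the table above.
RECOMMENDED_SIZES_GB = {
    "Q4_K_S": 5.13,
    "Q4_K_M": 5.42,
    "Q5_K_S": 6.20,
    "Q5_K_M": 6.36,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.5) -> Optional[str]:
    """Return the largest recommended quantization whose file plus headroom fits the budget.

    headroom_gb is an assumed allowance, not a measurement; tune it for your setup.
    Returns None if no recommended file fits.
    """
    fitting = [
        (size, name)
        for name, size in RECOMMENDED_SIZES_GB.items()
        if size + headroom_gb <= budget_gb
    ]
    return max(fitting)[1] if fitting else None
```

For example, with an 8 GB budget this picks `Q5_K_M`, and with 7 GB it falls back to `Q4_K_M`.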
## Usage
You can use this model with the latest builds of **LM Studio** and **llama.cpp**.
If you're new to the world of _large language models_, I recommend starting with **LM Studio**.
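For llama.cpp, the following is a minimal sketch that builds and runs a command-line invocation from Python. It assumes a recent build, where the CLI binary is named `llama-cli` (older builds called it `main`), and uses only the `-m` (model file), `-p` (prompt), and `-n` (number of tokens to predict) options. Flags can differ between builds, so check `llama-cli --help`. The helper names and the model filename in the usage note are hypothetical.

```python
import shutil
import subprocess
from typing import List


def llama_cli_command(model_path: str, prompt: str, n_predict: int = 256,
                      binary: str = "llama-cli") -> List[str]:
    """Build a llama.cpp CLI argv: -m model file, -p prompt, -n tokens to generate."""
    return [binary, "-m", model_path, "-p", prompt, "-n", str(n_predict)]


def run_llama_cli(cmd: List[str]) -> str:
    """Run the command if its binary is on PATH and return captured stdout."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Usage, with a hypothetical filename: `run_llama_cli(llama_cli_command("mistral_pro_8b_v0.1.Q4_K_M.gguf", "def fibonacci(n):", 128))`.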
<!-- description end -->