Create README.md
README.md
ADDED
@@ -0,0 +1,70 @@
---
base_model: TencentARC/Mistral_Pro_8B_v0.1
language:
- en
pipeline_tag: text-generation
license: apache-2.0
model_type: mistral
library_name: transformers
inference: false
datasets:
- HuggingFaceTB/cosmopedia
---
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/65aa2d4b356bf23b4a4da247/NQAvp6NRHlNILyWWFlrA7.webp)
## Mistral Pro 8B v0.1
- **Model creator:** [TencentARC](https://huggingface.co/TencentARC)
- **Original model:** [Mistral_Pro_8B_v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1)
<!-- description start -->
## Description
This repo contains GGUF-format model files for [TencentARC's Mistral Pro 8B v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1).

## Original model
- **Developed by:** [TencentARC](https://huggingface.co/TencentARC)

### Description
#### Model Description
Mistral-Pro is a progressive version of the original [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) model, enhanced by the addition of Transformer blocks. It specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.

#### Development and Training
Developed by Tencent's ARC Lab, Mistral-Pro is an 8 billion parameter model. It's an expansion of Mistral-7B, further trained on code and math corpora.

#### Intended Use
This model is designed for a wide range of NLP tasks, with a focus on programming, mathematics, and general language tasks. It suits scenarios requiring integration of natural and programming languages.

#### Performance
Mistral_Pro_8B_v0.1 improves on the code and math performance of Mistral-7B: in the table below, GSM8K rises from 39.2 to 50.6 and HumanEval from 28.7 to 32.9, while MMLU is somewhat lower (60.6 vs. 62.7). On most of these benchmarks its results are comparable to those of [Gemma](https://huggingface.co/google/gemma-7b).
##### Overall performance on language, math, and code tasks
| Model | ARC | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | HumanEval |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Gemma-7B | 61.9 | 82.2 | 64.6 | 44.8 | 79.0 | 50.9 | 32.3 |
| Mistral-7B | 60.8 | 83.3 | 62.7 | 42.6 | 78.0 | 39.2 | 28.7 |
| Mistral_Pro_8B_v0.1 | 63.2 | 82.6 | 60.6 | 48.3 | 78.9 | 50.6 | 32.9 |
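
To make the comparison in the table easier to read, the per-benchmark differences can be computed directly. The following is a minimal sketch: the scores are copied from the table above, while the helper names `deltas` and `average` are illustrative and not part of any library. The unweighted average mixes heterogeneous benchmarks, so treat it as a rough summary only.

```python
# Scores copied from the benchmark table; column order follows the table header.
BENCHMARKS = ["ARC", "Hellaswag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K", "HumanEval"]
SCORES = {
    "Gemma-7B": [61.9, 82.2, 64.6, 44.8, 79.0, 50.9, 32.3],
    "Mistral-7B": [60.8, 83.3, 62.7, 42.6, 78.0, 39.2, 28.7],
    "Mistral_Pro_8B_v0.1": [63.2, 82.6, 60.6, 48.3, 78.9, 50.6, 32.9],
}


def deltas(model, baseline):
    """Per-benchmark score difference (model minus baseline), rounded to 0.1."""
    return {
        name: round(a - b, 1)
        for name, a, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


def average(model):
    """Unweighted mean over all seven benchmarks (a rough summary only)."""
    values = SCORES[model]
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, diff in deltas("Mistral_Pro_8B_v0.1", "Mistral-7B").items():
        print(f"{name:>11}: {diff:+.1f}")
```

Relative to Mistral-7B this gives +11.4 on GSM8K, +4.2 on HumanEval, and -2.1 on MMLU.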

#### Limitations
While Mistral-Pro addresses some limitations of previous models in the series, it may still encounter challenges specific to highly specialized domains or tasks.

#### Ethical Considerations
Users should be aware of potential biases in the model and use it responsibly, considering its impact on various applications.

## Quantization types
| quantization method | bits | size    | description                            | recommended |
|---------------------|------|---------|----------------------------------------|-------------|
| Q2_K                | 2    | 3.36 GB | very small, very high quality loss     | ❌          |
| Q3_K_S              | 3    | 3.91 GB | very small, high quality loss          | ❌          |
| Q3_K_M              | 3    | 4.35 GB | small, substantial quality loss        | ❌          |
| Q3_K_L              | 3    | 4.74 GB | small, substantial quality loss        | ❌          |
| Q4_0                | 4    | 5.09 GB | legacy; small, very high quality loss  | ❌          |
| Q4_K_S              | 4    | 5.13 GB | medium, balanced quality               | ✅          |
| Q4_K_M              | 4    | 5.42 GB | medium, balanced quality               | ✅          |
| Q5_0                | 5    | 6.20 GB | legacy; medium, balanced quality       | ❌          |
| Q5_K_S              | 5    | 6.20 GB | large, low quality loss                | ✅          |
| Q5_K_M              | 5    | 6.36 GB | large, very low quality loss           | ✅          |
| Q6_K                | 6    | 7.37 GB | very large, extremely low quality loss | ❌          |
| Q8_0                | 8    | 9.55 GB | very large, extremely low quality loss | ❌          |
| FP16 | 16 | 18 GB | enormous, negligible quality loss | ❌ |
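
One way to use the table is to pick the largest recommended file that fits a given memory budget. The sketch below encodes the table's sizes and recommendation flags. It assumes the Q2_K size is in GB like the other rows, and that some memory must be left over for the context and runtime. The `headroom_gb` default of 1.0 is a placeholder, not a measured value, and `pick_quant` is a hypothetical helper.

```python
# (method, bits, size in GB, recommended) as listed in the quantization table.
QUANTS = [
    ("Q2_K", 2, 3.36, False),
    ("Q3_K_S", 3, 3.91, False),
    ("Q3_K_M", 3, 4.35, False),
    ("Q3_K_L", 3, 4.74, False),
    ("Q4_0", 4, 5.09, False),
    ("Q4_K_S", 4, 5.13, True),
    ("Q4_K_M", 4, 5.42, True),
    ("Q5_0", 5, 6.20, False),
    ("Q5_K_S", 5, 6.20, True),
    ("Q5_K_M", 5, 6.36, True),
    ("Q6_K", 6, 7.37, False),
    ("Q8_0", 8, 9.55, False),
    ("FP16", 16, 18.0, False),
]


def pick_quant(budget_gb, headroom_gb=1.0, recommended_only=True):
    """Return the largest quantization whose file fits in budget minus headroom.

    headroom_gb is an assumed allowance for context and runtime overhead.
    Returns None when nothing fits.
    """
    usable = budget_gb - headroom_gb
    candidates = [
        q for q in QUANTS
        if q[2] <= usable and (q[3] or not recommended_only)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda q: q[2])[0]


if __name__ == "__main__":
    for budget in (4, 6.5, 8, 16):
        print(budget, "GB ->", pick_quant(budget))
```

Under these assumptions an 8 GB budget selects Q5_K_M and a 6.5 GB budget selects Q4_K_M. Actual memory use depends on context length and backend, so the headroom should be adjusted to the setup at hand.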

## Usage
You can use this model with the latest builds of **LM Studio** and **llama.cpp**.
If you're new to the world of _large language models_, I recommend starting with **LM Studio**.
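
For llama.cpp, a run from the command line might look like the sketch below, which only assembles the argument list so it can be inspected before running. Several things here are assumptions rather than facts from this repo: the binary name `llama-cli` (older llama.cpp builds ship it as `main`), the GGUF file name, and the default values. The flags `-m`, `-p`, `-n`, and `-c` are llama.cpp's options for model path, prompt, tokens to generate, and context size.

```python
import shlex


def llama_cli_command(model_path, prompt, n_predict=128, ctx_size=2048,
                      binary="llama-cli"):
    """Build an argument list for the llama.cpp command-line tool.

    binary is an assumption: recent builds name it llama-cli, older ones main.
    """
    return [
        binary,
        "-m", model_path,      # path to the downloaded GGUF file
        "-p", prompt,          # prompt text to complete
        "-n", str(n_predict),  # number of tokens to generate
        "-c", str(ctx_size),   # context window size
    ]


if __name__ == "__main__":
    # Hypothetical file name; use the name of the file you actually downloaded.
    argv = llama_cli_command("mistral_pro_8b_v0.1.Q4_K_M.gguf", "def fibonacci(n):")
    print(shlex.join(argv))
    # To execute it: subprocess.run(argv, check=True)
```

Because Mistral-Pro is described above as further trained on code and math corpora, a code-completion style prompt is used here for illustration.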
<!-- description end -->