sayhan committed on
Commit
6fff31b
1 Parent(s): f6a97e8

Create README.md

  1. README.md +70 -0
README.md ADDED
@@ -0,0 +1,70 @@
---
base_model: TencentARC/Mistral_Pro_8B_v0.1
language:
- en
pipeline_tag: text-generation
license: apache-2.0
model_type: mistral
library_name: transformers
inference: false
datasets:
- HuggingFaceTB/cosmopedia
---
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/65aa2d4b356bf23b4a4da247/NQAvp6NRHlNILyWWFlrA7.webp)
## Mistral Pro 8B v0.1
- **Model creator:** [TencentARC](https://huggingface.co/TencentARC)
- **Original model:** [Mistral_Pro_8B_v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1)
<!-- description start -->
## Description
This repo contains GGUF format model files for [TencentARC's Mistral Pro 8B v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1).

## Original model
- **Developed by:** [TencentARC](https://huggingface.co/TencentARC)

### Description
#### Model Description
Mistral-Pro is a progressive version of the original [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) model, enhanced by the addition of Transformer blocks. It specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.

#### Development and Training
Developed by Tencent's ARC Lab, Mistral-Pro is an 8 billion parameter model. It's an expansion of Mistral-7B, further trained on code and math corpora.

#### Intended Use
This model is designed for a wide range of NLP tasks, with a focus on programming, mathematics, and general language tasks. It suits scenarios requiring integration of natural and programming languages.

#### Performance
Mistral_Pro_8B_v0.1 improves on Mistral-7B in code and math: GSM8K rises from 39.2 to 50.6 and HumanEval from 28.7 to 32.9. Its scores are also close to those of the recent [Gemma](https://huggingface.co/google/gemma-7b) model on most of the benchmarks below, although it trails both models on MMLU.
##### Overall performance on language, math, and code tasks
| Model | ARC | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | HumanEval |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Gemma-7B | 61.9 | 82.2 | 64.6 | 44.8 | 79.0 | 50.9 | 32.3 |
| Mistral-7B | 60.8 | 83.3 | 62.7 | 42.6 | 78.0 | 39.2 | 28.7 |
| Mistral_Pro_8B_v0.1 | 63.2 | 82.6 | 60.6 | 48.3 | 78.9 | 50.6 | 32.9 |

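
To make the comparison in the table easier to check, here is a minimal Python sketch that computes per-benchmark differences and a simple average. The scores are copied from the table; the unweighted average is only an illustrative summary chosen for this sketch, not a metric reported by the original authors.

```python
# Benchmark scores copied from the table above.
BENCHMARKS = ["ARC", "Hellaswag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K", "HumanEval"]
SCORES = {
    "Gemma-7B": [61.9, 82.2, 64.6, 44.8, 79.0, 50.9, 32.3],
    "Mistral-7B": [60.8, 83.3, 62.7, 42.6, 78.0, 39.2, 28.7],
    "Mistral_Pro_8B_v0.1": [63.2, 82.6, 60.6, 48.3, 78.9, 50.6, 32.9],
}


def average(model):
    """Unweighted mean over the seven benchmarks (an illustrative summary only)."""
    scores = SCORES[model]
    return sum(scores) / len(scores)


def deltas(model, baseline):
    """Per-benchmark difference model - baseline, rounded to one decimal place."""
    return {
        bench: round(m - b, 1)
        for bench, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


for name in SCORES:
    print(f"{name}: average {average(name):.1f}")
print(deltas("Mistral_Pro_8B_v0.1", "Mistral-7B"))
```

Against Mistral-7B, the largest gains are on GSM8K (+11.4) and TruthfulQA (+5.7), while MMLU drops by 2.1 points.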

#### Limitations
While Mistral-Pro addresses some limitations of previous models in the series, it may still encounter challenges specific to highly specialized domains or tasks.

#### Ethical Considerations
Users should be aware of potential biases in the model and use it responsibly, considering its impact on various applications.

## Quantization types
| quantization method | bits | size     | description                                         | recommended |
|---------------------|------|----------|-----------------------------------------------------|-------------|
| Q2_K                | 2    | 3.36 GB  | very small, very high quality loss                  | ❌           |
| Q3_K_S              | 3    | 3.91 GB  | very small, high quality loss                       | ❌           |
| Q3_K_M              | 3    | 4.35 GB  | small, substantial quality loss                     | ❌           |
| Q3_K_L              | 3    | 4.74 GB  | small, substantial quality loss                     | ❌           |
| Q4_0                | 4    | 5.09 GB  | legacy; small, very high quality loss               | ❌           |
| Q4_K_S              | 4    | 5.13 GB  | medium, balanced quality                            | ✅           |
| Q4_K_M              | 4    | 5.42 GB  | medium, balanced quality                            | ✅           |
| Q5_0                | 5    | 6.20 GB  | legacy; medium, balanced quality                    | ❌           |
| Q5_K_S              | 5    | 6.20 GB  | large, low quality loss                             | ✅           |
| Q5_K_M              | 5    | 6.36 GB  | large, very low quality loss                        | ✅           |
| Q6_K                | 6    | 7.37 GB  | very large, extremely low quality loss              | ❌           |
| Q8_0                | 8    | 9.55 GB  | very large, extremely low quality loss              | ❌           |
| FP16                | 16   | 18 GB    | enormous, negligible quality loss                   | ❌           |

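
As a rough aid for choosing a file from the table, the sketch below picks the largest quantization that fits a given memory budget. The sizes and recommendation flags are copied from the table; the `pick_quant` helper and its default 1 GB overhead allowance are assumptions made for illustration, since real memory use also depends on context length and the runtime.

```python
# (method, file size in GB, recommended) copied from the table above.
QUANTS = [
    ("Q2_K", 3.36, False),
    ("Q3_K_S", 3.91, False),
    ("Q3_K_M", 4.35, False),
    ("Q3_K_L", 4.74, False),
    ("Q4_0", 5.09, False),
    ("Q4_K_S", 5.13, True),
    ("Q4_K_M", 5.42, True),
    ("Q5_0", 6.20, False),
    ("Q5_K_S", 6.20, True),
    ("Q5_K_M", 6.36, True),
    ("Q6_K", 7.37, False),
    ("Q8_0", 9.55, False),
    ("FP16", 18.0, False),
]


def pick_quant(budget_gb, overhead_gb=1.0, recommended_only=True):
    """Return the largest method whose size plus overhead fits the budget, or None.

    overhead_gb is an assumed allowance for the KV cache and runtime buffers.
    """
    candidates = [
        (size, name)
        for name, size, recommended in QUANTS
        if size + overhead_gb <= budget_gb and (recommended or not recommended_only)
    ]
    return max(candidates)[1] if candidates else None


print(pick_quant(8.0))                          # recommended methods only
print(pick_quant(6.0, recommended_only=False))  # any method in the table
```

With the default allowance, an 8 GB budget selects `Q5_K_M`, and a 6 GB budget has no recommended option, so the helper returns `None` unless `recommended_only=False` is passed.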
## Usage
You can use this model with the latest builds of **LM Studio** and **llama.cpp**.
If you're new to the world of _large language models_, I recommend starting with **LM Studio**.
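
For llama.cpp, a minimal sketch of assembling a command line from Python might look like the following. The GGUF file name is hypothetical (use the name of the file you actually downloaded), and the binary path is an assumption: recent llama.cpp builds ship the CLI as `llama-cli`, while older builds call it `main`. The `-m`, `-p`, `-n`, and `-c` flags select the model file, the prompt, the number of tokens to generate, and the context size.

```python
import shlex


def build_llama_cpp_command(model_path, prompt, n_predict=128, ctx_size=2048,
                            binary="./llama-cli"):
    """Build an argument list for the llama.cpp CLI without running it.

    binary is an assumed path; older llama.cpp builds name the CLI `main`.
    """
    return [
        binary,
        "-m", model_path,       # path to the GGUF model file
        "-p", prompt,           # prompt text
        "-n", str(n_predict),   # number of tokens to generate
        "-c", str(ctx_size),    # context window size
    ]


# Hypothetical file name; replace it with the quantization you downloaded.
cmd = build_llama_cpp_command("mistral-pro-8b-v0.1.Q4_K_M.gguf", "def fibonacci(n):")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the binary and model file exist locally.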
<!-- description end -->