---
base_model: TencentARC/Mistral_Pro_8B_v0.1
language:
- en
pipeline_tag: text-generation
license: apache-2.0
model_type: mistral
library_name: transformers
inference: false
datasets:
- HuggingFaceTB/cosmopedia
---
|
## Mistral Pro 8B v0.1

- **Model creator:** [TencentARC](https://huggingface.co/TencentARC)

- **Original model:** [Mistral_Pro_8B_v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1)
|
<!-- description start -->
|
## Description
|
This repo contains GGUF format model files for [TencentARC's Mistral Pro 8B v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1).
|
|
|
## Original model

- **Developed by:** [TencentARC](https://huggingface.co/TencentARC)
|
|
|
### Model Description

Mistral-Pro is a progressive version of the original [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) model, enhanced by the addition of Transformer blocks. It specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.
|
|
|
### Development and Training

Developed by Tencent's ARC Lab, Mistral-Pro is an 8 billion parameter model. It is an expansion of Mistral-7B, further trained on code and math corpora.
|
|
|
### Intended Use

This model is designed for a wide range of NLP tasks, with a focus on programming, mathematics, and general language tasks. It suits scenarios requiring the integration of natural and programming languages.
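
Because this is a plain text-generation model, the original full-precision checkpoint can be loaded directly with `transformers`. The snippet below is a minimal sketch (it targets the original repo rather than the GGUF files here; the prompt and generation settings are only illustrations):

```python
# Minimal sketch: greedy generation from the original full-precision
# checkpoint via transformers. Adjust dtype/device for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TencentARC/Mistral_Pro_8B_v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative code-flavored prompt, matching the model's focus areas.
prompt = "Write a Python function that returns the n-th Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```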
|
|
|
### Performance

Mistral_Pro_8B_v0.1 performs strongly across a range of benchmarks: it improves on Mistral's code and math scores while matching the recently released [Gemma](https://huggingface.co/google/gemma-7b).
|
#### Overall performance on language, math, and code tasks

| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | HumanEval |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Gemma-7B | 61.9 | 82.2 | 64.6 | 44.8 | 79.0 | 50.9 | 32.3 |
| Mistral-7B | 60.8 | 83.3 | 62.7 | 42.6 | 78.0 | 39.2 | 28.7 |
| Mistral_Pro_8B_v0.1 | 63.2 | 82.6 | 60.6 | 48.3 | 78.9 | 50.6 | 32.9 |
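
For context, scores like these are typically produced with EleutherAI's lm-evaluation-harness. The sketch below shows how one might rerun the language and math tasks; the task names and settings are assumptions based on the common Open LLM Leaderboard setup, not details taken from this card (HumanEval is usually run separately with a code-execution harness):

```python
# Hedged sketch: re-running leaderboard-style evals with
# lm-evaluation-harness (v0.4+ Python API). Task names below are
# assumptions; verify them against your installed harness version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=TencentARC/Mistral_Pro_8B_v0.1,dtype=float16",
    tasks=[
        "arc_challenge", "hellaswag", "mmlu",
        "truthfulqa_mc2", "winogrande", "gsm8k",
    ],
    batch_size=8,
)
print(results["results"])
```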
|
|
|
|
|
### Limitations

While Mistral-Pro addresses some limitations of previous models in the series, it may still encounter challenges specific to highly specialized domains or tasks.
|
|
|
### Ethical Considerations

Users should be aware of potential biases in the model and use it responsibly, considering its impact on various applications.
|
|
|
## Quantization types

| Quantization method | Bits | Size | Description | Recommended |
|---------------------|------|---------|----------------------------------------|-------------|
| Q2_K | 2 | 3.36 GB | very small, very high quality loss | ❌ |
| Q3_K_S | 3 | 3.91 GB | very small, high quality loss | ❌ |
| Q3_K_M | 3 | 4.35 GB | small, substantial quality loss | ❌ |
| Q3_K_L | 3 | 4.74 GB | small, substantial quality loss | ❌ |
| Q4_0 | 4 | 5.09 GB | legacy; small, very high quality loss | ❌ |
| Q4_K_S | 4 | 5.13 GB | medium, balanced quality | ✅ |
| Q4_K_M | 4 | 5.42 GB | medium, balanced quality | ✅ |
| Q5_0 | 5 | 6.20 GB | legacy; medium, balanced quality | ❌ |
| Q5_K_S | 5 | 6.20 GB | large, low quality loss | ✅ |
| Q5_K_M | 5 | 6.36 GB | large, very low quality loss | ✅ |
| Q6_K | 6 | 7.37 GB | very large, extremely low quality loss | ❌ |
| Q8_0 | 8 | 9.55 GB | very large, extremely low quality loss | ❌ |
| FP16 | 16 | 18 GB | enormous, negligible quality loss | ❌ |
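
To fetch a single quantized file rather than the whole repo, `huggingface_hub` works well. Both identifiers below are placeholders (this card does not list the exact GGUF filenames), so substitute this repository's id and the file for the quant you picked:

```python
# Sketch: download one GGUF file. repo_id and filename are
# placeholders -- replace them with this repo's id and an actual
# filename from its file listing.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="<this-repo-id>",                    # e.g. <user>/Mistral_Pro_8B_v0.1-GGUF
    filename="mistral_pro_8b_v0.1.Q4_K_M.gguf",  # hypothetical filename
)
print(path)
```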
|
|
|
## Usage

You can use this model with the latest builds of **LM Studio** and **llama.cpp**.

If you're new to the world of _large language models_, I recommend starting with **LM Studio**.
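
If you'd rather script it than use LM Studio's UI, here is a minimal sketch with `llama-cpp-python` (the Python bindings for llama.cpp); the filename is the hypothetical one from the download example above, and the context and offload settings are just starting points:

```python
# Minimal llama-cpp-python sketch. Tune n_ctx and n_gpu_layers for
# your hardware; temperature=0.0 gives deterministic output.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral_pro_8b_v0.1.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU; use 0 for CPU-only
)

out = llm(
    "Question: What is 12 * 13?\nAnswer:",
    max_tokens=32,
    temperature=0.0,
)
print(out["choices"][0]["text"])
```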
|
<!-- description end -->