---
pipeline_tag: text-generation
tags:
- qwen
- qwen-2
- quantized
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- 16-bit
- GGUF
inference: false
model_creator: MaziyarPanahi
model_name: Qwen2-72B-Instruct-v0.1-GGUF
quantized_by: MaziyarPanahi
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
---

# MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF

The GGUF and quantized models in this repository are based on the [MaziyarPanahi/Qwen2-72B-Instruct-v0.1](https://huggingface.co/MaziyarPanahi/Qwen2-72B-Instruct-v0.1) model.

## How to download

You can download only the quants you need instead of cloning the entire repository:

```sh
huggingface-cli download MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF --local-dir . --include '*Q2_K*gguf'
```

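The same selective download can be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and relies on its `snapshot_download` function with the `allow_patterns` argument. The helper names `quant_pattern`, `select_files`, and `download_quant` are hypothetical and exist only for illustration, and `select_files` uses the standard-library `fnmatch` as an approximation of how the pattern is matched.

```python
from fnmatch import fnmatch


def quant_pattern(quant: str) -> str:
    """Build the same glob as the CLI example, e.g. '*Q2_K*gguf'."""
    return f"*{quant}*gguf"


def select_files(filenames: list[str], quant: str) -> list[str]:
    """Preview which file names a quant pattern would match (local check only)."""
    pattern = quant_pattern(quant)
    return [name for name in filenames if fnmatch(name, pattern)]


def download_quant(quant: str, local_dir: str = ".") -> str:
    """Download only the files for one quantization level.

    Assumes `huggingface_hub` is installed. It is imported lazily so the
    helpers above can be used without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF",
        allow_patterns=[quant_pattern(quant)],
        local_dir=local_dir,
    )
```

Calling `download_quant("Q2_K")` would then fetch only the matching files into the current directory. This is a large download for a 72B model, so it is worth previewing the pattern with `select_files` first.
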
## Load GGUF models

You `MUST` follow the `ChatML` prompt template used by this model:

```sh
./llama.cpp/main -m Qwen2-72B-Instruct-v0.1.Q2_K.gguf -p "<|im_start|>user\nJust say 1, 2, 3 hi and NOTHING else\n<|im_end|>\n<|im_start|>assistant\n" -n 1024
```

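If you prefer to drive llama.cpp from a script, the sketch below assembles the same command in Python. It assumes the binary lives at `./llama.cpp/main` as in the command above (newer llama.cpp builds may name the binary differently), and the function names `build_llama_cpp_args` and `run_llama_cpp` are hypothetical. Passing the prompt as a list element with real newline characters avoids depending on how the shell or the binary treats `\n` escapes.

```python
import subprocess


def build_llama_cpp_args(
    model_path: str,
    user_message: str,
    n_predict: int = 1024,
    binary: str = "./llama.cpp/main",  # assumed location, adjust to your build
) -> list[str]:
    """Assemble the argv for a single-turn ChatML prompt."""
    # Real newline characters are embedded here, so no escape processing is needed.
    prompt = (
        f"<|im_start|>user\n{user_message}\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return [binary, "-m", model_path, "-p", prompt, "-n", str(n_predict)]


def run_llama_cpp(model_path: str, user_message: str) -> str:
    """Run llama.cpp and return its standard output (requires a built binary)."""
    args = build_llama_cpp_args(model_path, user_message)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_llama_cpp("path/to/downloaded.gguf", "Just say 1, 2, 3 hi and NOTHING else")` would run one generation, where the model path is a placeholder for whichever quant you downloaded.
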
## Original README

---

# MaziyarPanahi/Qwen2-72B-Instruct-v0.1

This is a fine-tuned version of the `Qwen/Qwen2-72B-Instruct` model. It aims to improve the base model across all benchmarks.

# ⚡ Quantized GGUF

All GGUF models are available here: [MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF](https://huggingface.co/MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF)

# 🏆 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|------:|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |0.6761|±  |0.0148|

|  Tasks   |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------|------:|------|-----:|------|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |0.8248|±  |0.0107|

|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|------:|------|-----:|--------|-----:|---|-----:|
|arc_challenge|      1|none  |    25|acc     |0.6852|±  |0.0136|
|             |       |none  |    25|acc_norm|0.7184|±  |0.0131|

|Tasks|Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|-----:|---|-----:|
|gsm8k|      3|strict-match    |     5|exact_match|0.8582|±  |0.0096|
|     |       |flexible-extract|     5|exact_match|0.8893|±  |0.0086|

# Prompt Template

This model uses the `ChatML` prompt template:

```
<|im_start|>system
{System}
<|im_end|>
<|im_start|>user
{User}
<|im_end|>
<|im_start|>assistant
{Assistant}
```

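As a minimal sketch of how a conversation maps onto this template, the hypothetical helper `render_chatml` below turns a list of role/content messages into a single prompt string. It follows the layout exactly as printed above. The chat template bundled with the tokenizer may differ in whitespace, so `tokenizer.apply_chat_template` is the safer choice when working with `transformers`.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render role/content messages in the ChatML layout shown above."""
    parts = []
    for message in messages:
        # Each turn: start marker with the role, the content, then the end marker.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}\n<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

A call such as `render_chatml([{"role": "system", "content": "You are helpful."}, {"role": "user", "content": "Who are you?"}])` yields a string that can be passed as a raw prompt to a GGUF runtime.
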
# How to use

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
```