---
pipeline_tag: text-generation
tags:
- qwen
- qwen-2
- quantized
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- 16-bit
- GGUF
inference: false
model_creator: MaziyarPanahi
model_name: Qwen2-72B-Instruct-v0.1-GGUF
quantized_by: MaziyarPanahi
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
---


# MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF

The GGUF and quantized models here are based on [MaziyarPanahi/Qwen2-72B-Instruct-v0.1](https://huggingface.co/MaziyarPanahi/Qwen2-72B-Instruct-v0.1) model

## How to download
You can download only the quants you need instead of cloning the entire repository as follows:

```
huggingface-cli download MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF --local-dir . --include '*Q2_K*gguf'
```

## Load GGUF models

You `MUST` follow the prompt template provided by Llama-3:


```sh
./llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q2_K.gguf -p "<|im_start|>user\nJust say 1, 2, 3 hi and NOTHING else\n<|im_end|>\n<|im_start|>assistant\n" -n 1024
```


## Original README

---

# MaziyarPanahi/Qwen2-72B-Instruct-v0.1

This is a fine-tuned version of the `Qwen/Qwen2-72B-Instruct` model. It aims to improve the base model across all benchmarks.

# ⚡ Quantized GGUF

All GGUF models are available here: [MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF](https://huggingface.co/MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF)

# 🏆 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)


|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|------:|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |0.6761|±  |0.0148|

|  Tasks   |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------|------:|------|-----:|------|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |0.8248|±  |0.0107|

|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|------:|------|-----:|--------|-----:|---|-----:|
|arc_challenge|      1|none  |    25|acc     |0.6852|±  |0.0136|
|             |       |none  |    25|acc_norm|0.7184|±  |0.0131|

|Tasks|Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|-----:|---|-----:|
|gsm8k|      3|strict-match    |     5|exact_match|0.8582|±  |0.0096|
|     |       |flexible-extract|     5|exact_match|0.8893|±  |0.0086|

# Prompt Template

This model uses `ChatML` prompt template:

```
<|im_start|>system
{System}
<|im_end|>
<|im_start|>user
{User}
<|im_end|>
<|im_start|>assistant
{Assistant}
````

# How to use


```python

# Use a pipeline as a high-level helper

from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
pipe(messages)


# Load model directly

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
```