---
pipeline_tag: text-generation
tags:
- qwen
- qwen-2
- quantized
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- 16-bit
- GGUF
inference: false
model_creator: MaziyarPanahi
model_name: Qwen2-72B-Instruct-v0.1-GGUF
quantized_by: MaziyarPanahi
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
---

# MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF

The GGUF and quantized models here are based on the [MaziyarPanahi/Qwen2-72B-Instruct-v0.1](https://huggingface.co/MaziyarPanahi/Qwen2-72B-Instruct-v0.1) model.

## How to download

You can download only the quants you need instead of cloning the entire repository:

```
huggingface-cli download MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF --local-dir . --include '*Q2_K*gguf'
```

## Load GGUF models

You `MUST` follow the ChatML prompt template used by this model. Adjust the filename passed to `-m` to match the quant you downloaded:

```sh
./llama.cpp/main -m Qwen2-72B-Instruct-v0.1.Q2_K.gguf -p "<|im_start|>user\nJust say 1, 2, 3 hi and NOTHING else\n<|im_end|>\n<|im_start|>assistant\n" -n 1024
```

## Original README

---

# MaziyarPanahi/Qwen2-72B-Instruct-v0.1

This is a fine-tuned version of the `Qwen/Qwen2-72B-Instruct` model. It aims to improve the base model across all benchmarks.

# ⚡ Quantized GGUF

All GGUF models are available here: [MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF](https://huggingface.co/MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF)

# 🏆 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|------:|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |0.6761|±  |0.0148|

|  Tasks   |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------|------:|------|-----:|------|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |0.8248|±  |0.0107|

|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|------:|------|-----:|--------|-----:|---|-----:|
|arc_challenge|      1|none  |    25|acc     |0.6852|±  |0.0136|
|             |       |none  |    25|acc_norm|0.7184|±  |0.0131|

|Tasks|Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|-----:|---|-----:|
|gsm8k|      3|strict-match    |     5|exact_match|0.8582|±  |0.0096|
|     |       |flexible-extract|     5|exact_match|0.8893|±  |0.0086|

# Prompt Template

This model uses the `ChatML` prompt template:

```
<|im_start|>system
{System}
<|im_end|>
<|im_start|>user
{User}
<|im_end|>
<|im_start|>assistant
{Assistant}
```

# How to use

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# Chat-style input: a list of role/content messages
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
```
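
When a raw prompt string is needed (for example, for the `-p` argument of llama.cpp), the ChatML layout from the Prompt Template section can be assembled by hand. The helper below is a minimal sketch, not part of this repository: `build_chatml_prompt` and its `add_generation_prompt` parameter are hypothetical names, and the whitespace simply mirrors the template as written in this card. The tokenizer's own chat template may differ in small details.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render role/content messages using the ChatML layout shown in this card.

    Hypothetical helper for illustration; it is not an API of transformers
    or llama.cpp.
    """
    parts = []
    for message in messages:
        # Each turn: opening tag with the role, the content, then the closing tag
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}\n<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

A system message can be included by placing a `{"role": "system", ...}` entry first in the list; it is rendered with the same tags.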