Qwen2-72B-Instruct-GGUF
Original Model
Run with LlamaEdge
LlamaEdge version: v0.11.2
Prompt template
Prompt type: chatml
Prompt string:
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
Context size: 131072
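To make the template layout concrete, here is a minimal shell sketch that fills in the two placeholders from the prompt string (`{system_message}` and `{prompt}`) for a single-turn conversation. The function name `render_chatml` is hypothetical and not part of LlamaEdge. LlamaEdge applies the template itself when `--prompt-template chatml` is set, so this is only useful for inspecting or debugging the rendered text.

```shell
# Render a single-turn ChatML prompt.
#   $1 = system message, $2 = user prompt
# Hypothetical helper for illustration; LlamaEdge applies the template itself.
render_chatml() {
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' "$1" "$2"
}
```

For example, `render_chatml 'You are a helpful assistant.' 'What is GGUF?'` prints the five-line prompt with the trailing `<|im_start|>assistant` line left open for the model to complete.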
Run as LlamaEdge service
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen2-72B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --model-name Qwen2-72B-Instruct \
  --prompt-template chatml \
  --ctx-size 131072
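Once the service is running, it can be queried over HTTP. The sketch below assumes the server listens on `http://localhost:8080` and exposes an OpenAI-compatible `/v1/chat/completions` endpoint. Neither detail is stated on this page, so adjust both to match your deployment. The helper names `build_payload` and `chat_request` and the `LLAMAEDGE_URL` variable are hypothetical.

```shell
# Assumed base URL; change it if the server was started on another address.
LLAMAEDGE_URL="${LLAMAEDGE_URL:-http://localhost:8080}"

# Build an OpenAI-style chat payload.
#   $1 = model name, $2 = system message, $3 = user message
# Arguments are inserted verbatim, so they must not contain
# double quotes or backslashes.
build_payload() {
  printf '{"model":"%s","messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}]}' "$1" "$2" "$3"
}

# POST the payload to the (assumed) chat completions endpoint.
chat_request() {
  curl -sS -X POST "$LLAMAEDGE_URL/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$@")"
}
```

A call might look like `chat_request Qwen2-72B-Instruct 'You are a helpful assistant.' 'Hello'`, where the model name matches the `--model-name` value passed to the server.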
Run as LlamaEdge command app
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen2-72B-Instruct-Q5_K_M.gguf \
  llama-chat.wasm \
  --prompt-template chatml \
  --ctx-size 131072
Quantized GGUF Models
Name | Quant method | Bits | Size | Use case |
---|---|---|---|---|
Qwen2-72B-Instruct-Q2_K.gguf | Q2_K | 2 | 29.8 GB | smallest, significant quality loss - not recommended for most purposes |
Qwen2-72B-Instruct-Q3_K_L.gguf | Q3_K_L | 3 | 39.5 GB | small, substantial quality loss |
Qwen2-72B-Instruct-Q3_K_M.gguf | Q3_K_M | 3 | 37.7 GB | very small, high quality loss |
Qwen2-72B-Instruct-Q3_K_S.gguf | Q3_K_S | 3 | 34.5 GB | very small, high quality loss |
Qwen2-72B-Instruct-Q4_0.gguf | Q4_0 | 4 | 41.2 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
Qwen2-72B-Instruct-Q4_K_M.gguf | Q4_K_M | 4 | 47.4 GB | medium, balanced quality - recommended |
Qwen2-72B-Instruct-Q4_K_S.gguf | Q4_K_S | 4 | 43.9 GB | small, greater quality loss |
Qwen2-72B-Instruct-Q5_0-00001-of-00002.gguf | Q5_0 | 5 | 32.2 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
Qwen2-72B-Instruct-Q5_0-00002-of-00002.gguf | Q5_0 | 5 | 18 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
Qwen2-72B-Instruct-Q5_K_M-00001-of-00002.gguf | Q5_K_M | 5 | 32.2 GB | large, very low quality loss - recommended |
Qwen2-72B-Instruct-Q5_K_M-00002-of-00002.gguf | Q5_K_M | 5 | 22.3 GB | large, very low quality loss - recommended |
Qwen2-72B-Instruct-Q5_K_S-00001-of-00002.gguf | Q5_K_S | 5 | 32.1 GB | large, low quality loss - recommended |
Qwen2-72B-Instruct-Q5_K_S-00002-of-00002.gguf | Q5_K_S | 5 | 32.1 GB | large, low quality loss - recommended |
Qwen2-72B-Instruct-Q6_K-00001-of-00002.gguf | Q6_K | 6 | 32.2 GB | very large, extremely low quality loss |
Qwen2-72B-Instruct-Q6_K-00002-of-00002.gguf | Q6_K | 6 | 32.2 GB | very large, extremely low quality loss |
Qwen2-72B-Instruct-Q8_0-00001-of-00003.gguf | Q8_0 | 8 | 32.1 GB | very large, extremely low quality loss - not recommended |
Qwen2-72B-Instruct-Q8_0-00002-of-00003.gguf | Q8_0 | 8 | 32.1 GB | very large, extremely low quality loss - not recommended |
Qwen2-72B-Instruct-Q8_0-00003-of-00003.gguf | Q8_0 | 8 | 32.1 GB | very large, extremely low quality loss - not recommended |
Qwen2-72B-Instruct-f16-00001-of-00005.gguf | f16 | 16 | 31.9 GB | |
Qwen2-72B-Instruct-f16-00002-of-00005.gguf | f16 | 16 | 32.1 GB | |
Qwen2-72B-Instruct-f16-00003-of-00005.gguf | f16 | 16 | 32.1 GB | |
Qwen2-72B-Instruct-f16-00004-of-00005.gguf | f16 | 16 | 32.1 GB | |
Qwen2-72B-Instruct-f16-00005-of-00005.gguf | f16 | 16 | 17.3 GB | |
Quantized with llama.cpp b3705
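Several quantizations in the table are split into numbered shards following the pattern `<name>-0000N-of-0000M.gguf`. As a minimal sketch, the hypothetical helper below checks that every shard of a split model is present in a directory before you try to load it. It relies only on the naming pattern visible in the table and makes no assumption about how the runtime loads the shards.

```shell
# Verify that all shards of a split GGUF model exist.
#   $1 = directory, $2 = file prefix (without the shard suffix), $3 = shard count
# Returns 0 if every shard is present; otherwise reports the first missing
# shard on stderr and returns 1.
check_shards() {
  dir=$1; prefix=$2; count=$3
  i=1
  while [ "$i" -le "$count" ]; do
    f=$(printf '%s/%s-%05d-of-%05d.gguf' "$dir" "$prefix" "$i" "$count")
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    i=$((i + 1))
  done
}
```

For the Q5_K_M files listed above, the call would be `check_shards . Qwen2-72B-Instruct-Q5_K_M 2`.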