Mistral-7B-Instruct-v0.2 — GGUF Quantizations


Quantized GGUF versions of mistralai/Mistral-7B-Instruct-v0.2

Works with llama.cpp · Ollama · LM Studio · Open WebUI · Jan

Quantized by Dhptl on June 16, 2026 using quant-kit


⚖️ The Pareto Frontier — Efficiency vs Intelligence

Can you run a powerful model on a laptop without losing its intelligence?

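These quantizations use llama.cpp's K-quant formats (plus Q8_0) to trade file size against quality. The aim is for the mid-to-high quants (Q4_K_M and up) to keep roughly 97-99% of the original model's benchmark quality at a fraction of the FP16 size; measured results for this model are not yet available.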

| Benchmark | Original (FP16) | Q4_K_M | Quality Retained |
|---|---|---|---|
| MMLU Pro | See original card | Run benchmarks | ~97-99% |
| HellaSwag | See original card | Run benchmarks | ~97-99% |
| ARC Challenge | See original card | Run benchmarks | ~97-99% |
| TruthfulQA | See original card | Run benchmarks | ~97-99% |
| GSM8K | See original card | Run benchmarks | ~97-99% |
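To make the frontier idea concrete: a quant is on the Pareto frontier when no other quant is at least as small and at least as good, while being strictly better on one of the two. Below is a minimal sketch, assuming you plug in your own file sizes and benchmark scores; `QuantResult` and `pareto_frontier` are hypothetical names, not part of quant-kit.

```python
from typing import NamedTuple


class QuantResult(NamedTuple):
    name: str
    size_gb: float  # file size: smaller is better
    score: float    # benchmark score: higher is better


def pareto_frontier(results: list[QuantResult]) -> list[QuantResult]:
    """Return the quants that no other quant dominates, sorted by size."""
    frontier = []
    for r in results:
        # r is dominated if some other quant is no larger and no worse,
        # and strictly better on at least one axis.
        dominated = any(
            o.size_gb <= r.size_gb
            and o.score >= r.score
            and (o.size_gb < r.size_gb or o.score > r.score)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.size_gb)
```

Quants missing from the result are dominated: some other file is no larger and scores no worse, so there is no reason to pick them on these two axes alone.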

📦 Available Files

| Filename | Size | RAM Required | Quant | Quality | Best For |
|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.2-Q2_K.gguf | 2.53 GB | ~4.0 GB | Q2_K | | Extreme compression, significant quality loss. |
| Mistral-7B-Instruct-v0.2-Q3_K_L.gguf | 3.56 GB | ~5.1 GB | Q3_K_L | ⭐⭐⭐ | Slightly better than Q3_K_M, still a compromise. |
| Mistral-7B-Instruct-v0.2-Q3_K_M.gguf | 3.28 GB | ~4.8 GB | Q3_K_M | ⭐⭐⭐ | Very small file; noticeable quality drop. |
| Mistral-7B-Instruct-v0.2-Q3_K_S.gguf | 2.95 GB | ~4.4 GB | Q3_K_S | ⭐⭐ | Very high compression, high quality loss. |
| Mistral-7B-Instruct-v0.2-Q4_K_M.gguf | 4.07 GB | ~5.6 GB | Q4_K_M (Recommended) | ⭐⭐⭐⭐ | Best balance of size and quality. Recommended for most users. |
| Mistral-7B-Instruct-v0.2-Q4_K_S.gguf | 3.86 GB | ~5.4 GB | Q4_K_S | ⭐⭐⭐½ | Good speed/size balance, slight quality loss. |
| Mistral-7B-Instruct-v0.2-Q5_K_M.gguf | 4.78 GB | ~6.3 GB | Q5_K_M | ⭐⭐⭐⭐½ | Better quality than Q4, slightly larger. Great if you have the RAM. |
| Mistral-7B-Instruct-v0.2-Q5_K_S.gguf | 4.65 GB | ~6.2 GB | Q5_K_S | ⭐⭐⭐⭐ | Slightly smaller than Q5_K_M, with a small quality trade-off. |
| Mistral-7B-Instruct-v0.2-Q6_K.gguf | 5.53 GB | ~7.0 GB | Q6_K | ⭐⭐⭐⭐⭐ | Near-perfect quality, very large. |
| Mistral-7B-Instruct-v0.2-Q8_0.gguf | 7.17 GB | ~8.7 GB | Q8_0 | ⭐⭐⭐⭐⭐ | Closest to original quality. Use when RAM is not a concern. |

💡 Which file should I download?

  • Most users: Mistral-7B-Instruct-v0.2-Q4_K_M.gguf, the best balance of size and quality
  • Plenty of RAM (16 GB+): Mistral-7B-Instruct-v0.2-Q8_0.gguf, near-original quality
  • Low RAM (8 GB): Mistral-7B-Instruct-v0.2-Q3_K_M.gguf, fits in 8 GB with room to spare
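The advice above boils down to "take the largest file whose RAM estimate fits your machine". A minimal sketch of that rule, using the approximate RAM Required figures from the Available Files table; the 3 GB headroom for the OS and other apps is an assumption, and `pick_quant` / `gguf_filename` are hypothetical helpers, not part of quant-kit.

```python
from typing import Optional

# Approximate "RAM Required" figures in GB, copied from the Available Files table.
RAM_REQUIRED_GB = {
    "Q2_K": 4.0,
    "Q3_K_S": 4.4,
    "Q3_K_M": 4.8,
    "Q3_K_L": 5.1,
    "Q4_K_S": 5.4,
    "Q4_K_M": 5.6,
    "Q5_K_S": 6.2,
    "Q5_K_M": 6.3,
    "Q6_K": 7.0,
    "Q8_0": 8.7,
}


def pick_quant(total_ram_gb: float, headroom_gb: float = 3.0) -> Optional[str]:
    """Return the largest quant whose RAM estimate fits, or None if none fit.

    headroom_gb is an assumed allowance for the OS and other applications.
    """
    budget = total_ram_gb - headroom_gb
    fitting = [q for q, need in RAM_REQUIRED_GB.items() if need <= budget]
    if not fitting:
        return None
    # "Largest" by RAM estimate, which tracks file size in this table.
    return max(fitting, key=RAM_REQUIRED_GB.__getitem__)


def gguf_filename(quant: str) -> str:
    """Map a quant name to the file name used in this repo."""
    return f"Mistral-7B-Instruct-v0.2-{quant}.gguf"
```

With the assumed 3 GB headroom, `pick_quant(8)` returns `"Q3_K_M"`, in line with the 8 GB recommendation above; lowering the headroom to 2 GB selects Q4_K_M instead.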

⚡ Speed Benchmarks

Speed results have not been published yet. To generate them, run `python benchmark.py --model Mistral-7B-Instruct-v0.2`.
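If you only want a rough number on your own hardware, you can time a single completion with llama-cpp-python. This is a minimal sketch, not the benchmark.py methodology: `tokens_per_second` is a hypothetical helper, and because the timer also covers prompt processing, it reports end-to-end throughput rather than pure generation speed.

```python
import os
import time

MODEL_PATH = "./Mistral-7B-Instruct-v0.2-Q4_K_M.gguf"


def tokens_per_second(llm, prompt: str, max_tokens: int = 128) -> float:
    """Time one completion and return generated tokens per second.

    Works with any callable returning an OpenAI-style dict with
    usage["completion_tokens"], which is what llama_cpp.Llama returns.
    """
    start = time.perf_counter()
    out = llm(prompt, max_tokens=max_tokens)
    elapsed = time.perf_counter() - start
    return out["usage"]["completion_tokens"] / elapsed


# Only load the model when run as a script and the GGUF file is present.
if __name__ == "__main__" and os.path.exists(MODEL_PATH):
    from llama_cpp import Llama

    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, n_ctx=4096, verbose=False)
    tps = tokens_per_second(llm, "Explain quantization in one paragraph.")
    print(f"{tps:.1f} tokens/s")
```

Run it a few times and discard the first result, which often includes one-off warm-up costs.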


🧠 Quality Benchmarks

Quality results have not been published yet. To generate them, run kaggle_bench.ipynb on Kaggle.
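The "Quality Retained" column in the benchmark table is presumably the quantized score expressed as a percentage of the FP16 score on the same benchmark. Assuming that definition, the arithmetic looks like this sketch (`quality_retained` is a hypothetical helper, not part of kaggle_bench.ipynb):

```python
def quality_retained(fp16_scores: dict[str, float],
                     quant_scores: dict[str, float]) -> dict[str, float]:
    """Quantized score as a percentage of the FP16 score, per benchmark.

    Benchmarks missing from either dict, or with a non-positive FP16
    score, are skipped.
    """
    return {
        name: 100.0 * quant_scores[name] / fp16
        for name, fp16 in fp16_scores.items()
        if name in quant_scores and fp16 > 0
    }
```

Scores should come from the same benchmark harness and settings for both models; otherwise the ratio mixes quantization loss with evaluation differences.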


🚀 How to Use

Ollama

```bash
ollama run dhptl/mistral-7b-instruct-v0.2
```
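Once the model is pulled, Ollama also serves a local REST API (by default at http://localhost:11434). A minimal standard-library sketch of a non-streaming call to its /api/chat endpoint, assuming the default port and the model name from the command above; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default port
MODEL = "dhptl/mistral-7b-instruct-v0.2"             # name used with `ollama run`


def build_chat_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a POST request asking Ollama for a single, non-streamed reply."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With Ollama running, `print(chat("Tell me about quantization."))` prints the model's answer.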

LM Studio / Jan / Open WebUI

Search for Dhptl/Mistral-7B-Instruct-v0.2-GGUF in the model browser.

llama.cpp CLI

```bash
# Download the binary from https://github.com/ggerganov/llama.cpp/releases
./llama-cli \
  -m Mistral-7B-Instruct-v0.2-Q4_K_M.gguf \
  -p "You are a helpful assistant." \
  --conversation \
  -n 512
```

Python — llama-cpp-python

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./Mistral-7B-Instruct-v0.2-Q4_K_M.gguf",
    n_gpu_layers=-1,   # -1 = offload all layers to the GPU; 0 = CPU only
    n_ctx=4096,        # context window in tokens; larger values use more memory
)

response = llm.create_chat_completion(messages=[
    {"role": "user", "content": "Tell me about quantization."}
])
print(response["choices"][0]["message"]["content"])
```
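Memory use grows with `n_ctx` because of the KV cache, which sits on top of the model weights. A back-of-the-envelope sketch, assuming Mistral 7B's published architecture (32 layers, 8 key/value heads of dimension 128 via grouped-query attention) and an FP16 cache, which is llama.cpp's default; compute buffers are not included, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_ctx: int,
                   n_layers: int = 32,      # Mistral 7B transformer layers
                   n_kv_heads: int = 8,     # grouped-query attention KV heads
                   head_dim: int = 128,     # 4096 hidden size / 32 query heads
                   bytes_per_elem: int = 2  # FP16 cache entries
                   ) -> int:
    """Estimate KV-cache size in bytes for a context of n_ctx tokens."""
    # Factor of 2: every layer stores both a key and a value per KV head.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


for ctx in (4096, 32768):
    print(f"n_ctx={ctx}: {kv_cache_bytes(ctx) / 2**20:.0f} MiB")
```

For `n_ctx=4096` this comes to 512 MiB; at v0.2's full 32k context it grows to 4 GiB, so raise `n_ctx` with your RAM or VRAM budget in mind.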

🔍 About GGUF Quantization

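GGUF is the model file format used by llama.cpp and by local runners such as Ollama, LM Studio, and Jan. Quantization reduces the number of bits stored per weight. The figures below are nominal: K-quant mixes such as Q4_K_M keep some tensors at higher precision, so their actual files come out somewhat larger than these ratios suggest: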

| Format | Bits/weight | Size vs FP16 | Quality |
|---|---|---|---|
| Q2_K | ~2.6 | 16% | |
| Q3_K_M | ~3.3 | 21% | ⭐⭐⭐ |
| Q4_K_M | ~4.5 | 28% | ⭐⭐⭐⭐ ← sweet spot |
| Q5_K_M | ~5.6 | 35% | ⭐⭐⭐⭐½ |
| Q8_0 | ~8.5 | 53% | ⭐⭐⭐⭐⭐ |
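The Bits/weight and Size vs FP16 columns follow from simple arithmetic on block layouts. A sketch, assuming llama.cpp's Q8_0 block (32 int8 weights plus one FP16 scale, 34 bytes) and Q4_K super-block (256 weights in 144 bytes), and roughly 7.24 billion parameters for Mistral 7B; real files also carry metadata and higher-precision tensors, so the size is a rough estimate that tends to come in under the actual file. The helper names are hypothetical.

```python
PARAMS = 7.24e9  # approximate parameter count of Mistral 7B


def bits_per_weight(block_bytes: int, weights_per_block: int) -> float:
    """Effective bits per weight for a block-quantized format."""
    return block_bytes * 8 / weights_per_block


def size_vs_fp16(bpw: float) -> float:
    """Fraction of the FP16 (16 bits/weight) size."""
    return bpw / 16


def estimated_size_gib(bpw: float, params: float = PARAMS) -> float:
    """Weights-only size estimate in GiB (ignores metadata and mixed tensors)."""
    return params * bpw / 8 / 2**30


# Q8_0: 32 int8 weights + one FP16 scale = 34 bytes per block.
q8_0 = bits_per_weight(34, 32)
# Q4_K: 256 weights = 128 bytes of 4-bit quants + 12 bytes of 6-bit
# scales/mins + 4 bytes for two FP16 super-block scales.
q4_k = bits_per_weight(128 + 12 + 4, 256)

for name, bpw in (("Q8_0", q8_0), ("Q4_K", q4_k)):
    print(f"{name}: {bpw} bpw, {size_vs_fp16(bpw):.0%} of FP16, "
          f"~{estimated_size_gib(bpw):.2f} GiB")
```

Q8_0 works out to exactly 8.5 bits/weight and Q4_K to 4.5, matching the table's Q8_0 and Q4_K_M rows.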

💬 Community & Feedback

Found an issue? Have a question? Open a Discussion in the Community tab above.

If these quantizations were useful, please consider:

  • ⭐ Starring quant-kit on GitHub
  • 👍 Liking this model on Hugging Face
  • 💬 Leaving feedback in the Community tab