Ministral 3 8B Instruct · PMRA mixed-precision GGUF

Two mixed-precision GGUFs of Mistral AI's Ministral 3 8B Instruct: a primary build at the IQ3_XS size budget and a leaner 3.2-bpw build for tight-RAM machines. Both beat plain IQ3_XS on a held-out test split: the primary by ~0.185 NLL at the same size, the compact one by ~0.122 NLL while being ~311 MB smaller. Standard GGUFs for llama.cpp / Ollama, text generation.

The model

Ministral 3 8B Instruct is the instruction-tuned member of Mistral AI's Ministral 3 family — designed for edge and on-device deployment, fitting in 24 GB of VRAM at BF16 and under ~12 GB once quantized. It's natively multimodal (an 8.4B language model paired with a 0.4B vision encoder) and multilingual across dozens of languages (English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic, …), with strong instruction-following and system-prompt adherence.

Scope of this artifact: these GGUFs target the text stack for text generation in llama.cpp; image input is not exercised here. The build was calibrated and measured on English.

Why this build (PMRA)

A normal GGUF quant uses one format for nearly every tensor, paying the same bit-rate everywhere regardless of importance. Production Mixed-Rate Allocation (PMRA) measures each tensor's contribution to quality and spends bits where they help most: starting from a low-bit IQ2_M floor, it promotes the groups that matter to stronger formats under a fixed byte budget. The selection is frozen on calibration data, then re-scored on a held-out test split so the gain reflects generalization, not overfitting to the calibration set.
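
The promote-under-a-budget idea can be illustrated with a small greedy sketch. This is not the PMRA selector itself (see the method repository for that). It assumes each tensor group comes with a list of candidate formats, each carrying a byte size and an estimated loss contribution, and it repeatedly applies the upgrade with the best loss reduction per extra byte that still fits. The `Option` type, the `allocate` function, and every number in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    fmt: str      # format name, e.g. "IQ2_M" or "Q3_K_S"
    nbytes: int   # size of the group when stored in this format
    loss: float   # estimated loss contribution (lower is better)


def allocate(groups: dict[str, list[Option]], budget: int) -> dict[str, Option]:
    """Greedy mixed-rate allocation: start at the floor, promote while bytes remain."""
    # Every group starts at its smallest (lowest-bit) option.
    choice = {name: min(opts, key=lambda o: o.nbytes) for name, opts in groups.items()}
    used = sum(o.nbytes for o in choice.values())
    if used > budget:
        raise ValueError("budget is smaller than the all-floor model")

    while True:
        best = None  # (gain per extra byte, group name, option)
        for name, opts in groups.items():
            cur = choice[name]
            for opt in opts:
                extra = opt.nbytes - cur.nbytes
                gain = cur.loss - opt.loss
                # Skip downgrades, non-improvements, and upgrades that do not fit.
                if extra <= 0 or gain <= 0 or used + extra > budget:
                    continue
                ratio = gain / extra
                if best is None or ratio > best[0]:
                    best = (ratio, name, opt)
        if best is None:
            return choice  # no affordable improvement left
        _, name, opt = best
        used += opt.nbytes - choice[name].nbytes
        choice[name] = opt
```

With two toy groups, `attn` (IQ2_M at 100 bytes and loss 1.0, Q3_K_S at 130 bytes and loss 0.4, IQ4_XS at 160 bytes and loss 0.3) and `ffn` (IQ2_M at 300 bytes and loss 1.0, Q3_K_S at 390 bytes and loss 0.8), a budget of 460 bytes promotes `attn` twice and leaves `ffn` at the floor, because the `ffn` upgrade never fits. A greedy pass like this is only an approximation to an exact knapsack solve.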

Headline (held-out Wikitext-2 test, lower NLL is better):

| Build | NLL | Size | vs IQ3_XS |
| --- | --- | --- | --- |
| PMRA primary (IQ3_XS budget) | 4.537 | 3.706 GB | −0.185 NLL, same size |
| PMRA compact (3.2 bpw) | 4.601 | 3.396 GB | −0.122 NLL, −311 MB |
| plain IQ3_XS | 4.722 | 3.706 GB | baseline |

Release decision for both builds: GO.

Which file?

  • ministral3_8b_pmra_knapsack_iq3xs_budget.gguf — primary quality build; pick this if you have the RAM.
  • ministral3_8b_pmra_knapsack_3p2.gguf — the 3.2-bpw build; for ~8 GB machines, start here, close memory-heavy apps, and keep the context small.

Quick start

llama-cli -m ministral3_8b_pmra_knapsack_3p2.gguf \
  -p "Write a short hello from PMRA." -n 80 --ctx-size 2048

Needs a recent llama.cpp build (or Ollama) with Ministral 3 support.
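
For scripted runs, the same command can be launched from Python. The sketch below only reuses the flags shown in the quick start. The helper names `build_command` and `run` are hypothetical, and it assumes `llama-cli` is on `PATH`.

```python
import shutil
import subprocess


def build_command(model_path: str, prompt: str, n_predict: int = 80, ctx_size: int = 2048) -> list[str]:
    """Return the llama-cli argument list used in the quick start."""
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(n_predict),
        "--ctx-size", str(ctx_size),
    ]


def run(model_path: str, prompt: str, **kwargs) -> subprocess.CompletedProcess:
    """Run llama-cli with the quick-start flags; raises if the binary is missing."""
    if shutil.which("llama-cli") is None:
        raise FileNotFoundError("llama-cli not found on PATH")
    return subprocess.run(build_command(model_path, prompt, **kwargs), check=True)
```

Calling `run("ministral3_8b_pmra_knapsack_3p2.gguf", "Write a short hello from PMRA.")` reproduces the command above. Depending on the llama.cpp build, `llama-cli` may open an interactive chat session instead of exiting after generation, so treat this as a starting point.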

Footprint

| File | Selector | File size (bytes) | Payload bpw | SHA-256 |
| --- | --- | --- | --- | --- |
| ministral3_8b_pmra_knapsack_iq3xs_budget.gguf | c2_calib_knapsack_mixed | 3,713,801,312 | 3.492210 | 7f88294593cf419a5b39b4da2c7df356fee9528de947d6547b9d11d60a84ac5d |
| ministral3_8b_pmra_knapsack_3p2.gguf | c2_calib_knapsack_bpw_3p200_mixed | 3,403,422,816 | 3.199730 | ff95384e68f211b238767e1783d20ce0b4a8be8a56ac8b906756c481831421a3 |

Both files were materialized and reloaded by the artifact builder with 0 tensor mismatches.
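
A download can be checked against the SHA-256 values listed in this section. The sketch below streams the file in chunks so a multi-gigabyte GGUF is never loaded into memory at once. The helper names `sha256_of` and `verify` are illustrative, and the digests are copied from the footprint listing.

```python
import hashlib
from pathlib import Path

# Digests copied from the footprint listing in this card.
EXPECTED = {
    "ministral3_8b_pmra_knapsack_iq3xs_budget.gguf":
        "7f88294593cf419a5b39b4da2c7df356fee9528de947d6547b9d11d60a84ac5d",
    "ministral3_8b_pmra_knapsack_3p2.gguf":
        "ff95384e68f211b238767e1783d20ce0b4a8be8a56ac8b906756c481831421a3",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Compare a local file against the digest published for its file name."""
    return sha256_of(path) == EXPECTED[Path(path).name]
```

`verify("ministral3_8b_pmra_knapsack_3p2.gguf")` returns `True` only when the local file matches the published digest. A file name that is not in `EXPECTED` raises `KeyError`.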

Benchmarks

Calibration: Wikitext-2-raw train (12 prompts). Selector eval: Wikitext-2-raw validation (128 prompts). Held-out eval: Wikitext-2-raw test (512 prompts); calibration/eval prompt overlap audited to 0. Lower NLL is better.

Held-out Wikitext-2 test:

| Variant | NLL | Payload bpw | Payload bytes |
| --- | --- | --- | --- |
| fp16 reference | 2.393904 | 16.000000 | 16,979,107,840 |
| IQ2_M | 4.963936 | 2.920126 | 3,098,820,608 |
| IQ3_XS (target / control) | 4.722369 | 3.492735 | 3,706,470,400 |
| Q3_K_S | 4.757542 | 3.636073 | 3,858,579,456 |
| PMRA knapsack | 4.537475 | 3.492210 | 3,705,913,344 |
| PMRA knapsack 3.2 bpw | 4.600533 | 3.199730 | 3,395,534,848 |
| same-budget random | 4.912780 | 3.492210 | 3,705,913,344 |
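
The payload bpw figures can be cross-checked from the payload bytes alone. The sketch below assumes the fp16 reference stores exactly 2 bytes per weight, which is inferred from its 16.000000 bpw entry rather than stated by the card, and derives the weight count from that row. `bits_per_weight` is an illustrative helper name.

```python
# Payload bytes copied from the held-out results in this card.
PAYLOAD_BYTES = {
    "fp16 reference": 16_979_107_840,
    "IQ2_M": 3_098_820_608,
    "IQ3_XS": 3_706_470_400,
    "Q3_K_S": 3_858_579_456,
    "PMRA knapsack": 3_705_913_344,
    "PMRA knapsack 3.2 bpw": 3_395_534_848,
}

# Assumption: the fp16 row stores 2 bytes per weight, so bytes / 2 = weight count.
N_WEIGHTS = PAYLOAD_BYTES["fp16 reference"] // 2


def bits_per_weight(payload_bytes: int, n_weights: int = N_WEIGHTS) -> float:
    """Average bits spent per weight for a given payload size."""
    return 8 * payload_bytes / n_weights
```

Under that assumption `N_WEIGHTS` is 8,489,553,920, and `bits_per_weight(PAYLOAD_BYTES["PMRA knapsack"])` agrees with the listed 3.492210 to six decimals. The other rows agree the same way.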

Selector validation split (Wikitext-2 validation): PMRA knapsack 4.456880 vs IQ3_XS 4.649152 (−0.192272 NLL), consistent with the held-out result.

  • primary vs IQ3_XS: −0.184894 NLL, −557,056 bytes · vs Q3_K_S: −0.220067 NLL, −152,666,112 bytes · vs random: −0.375305 NLL · decision GO
  • compact vs IQ3_XS: −0.121836 NLL, −310,935,552 bytes · vs Q3_K_S: −0.157010 NLL
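
The deltas above are plain differences between rows of the held-out results. A short sketch that recomputes them from the published NLL and payload-byte values follows. The `delta` helper is illustrative, and NLL differences are rounded to six decimals to match the card's precision.

```python
# (NLL, payload bytes) copied from the held-out Wikitext-2 test results.
RESULTS = {
    "IQ3_XS": (4.722369, 3_706_470_400),
    "Q3_K_S": (4.757542, 3_858_579_456),
    "PMRA knapsack": (4.537475, 3_705_913_344),
    "PMRA knapsack 3.2 bpw": (4.600533, 3_395_534_848),
    "same-budget random": (4.912780, 3_705_913_344),
}


def delta(variant: str, baseline: str) -> tuple[float, int]:
    """Return (NLL difference, byte difference) of variant minus baseline."""
    nll_v, bytes_v = RESULTS[variant]
    nll_b, bytes_b = RESULTS[baseline]
    return round(nll_v - nll_b, 6), bytes_v - bytes_b
```

For example, `delta("PMRA knapsack", "IQ3_XS")` gives an NLL difference of −0.184894 and a byte difference of −557,056. `delta("PMRA knapsack 3.2 bpw", "IQ3_XS")` gives −0.121836 NLL and −310,935,552 bytes, which is the roughly 311 MB saving quoted in the headline.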

How it was built

  • base: mistralai/Ministral-3-8B-Instruct-2512-BF16
  • GGUF sources: bartowski/mistralai_Ministral-3-8B-Instruct-2512-GGUF
  • tensor profile mistral3 · group mode tensor · selector c2_calib_knapsack_mixed
  • low source IQ2_M → target/control IQ3_XS; promotion menu Q2_K, Q2_K_L, Q3_K_S, Q3_K_M, IQ4_XS

Files

  • ministral3_8b_pmra_knapsack_iq3xs_budget.gguf, ministral3_8b_pmra_knapsack_3p2.gguf — the models
  • artifact_report*.json / .md, selector_result.json / .md
  • public_eval_wikitext_test_result.json / .md — the held-out evaluation
  • MINISTRAL3_8B_INSTRUCT_PMRA.md — release card

Attribution & license

Derived from, with thanks to:

  • mistralai/Ministral-3-8B-Instruct-2512-BF16 (Mistral AI)
  • GGUF quantizations from bartowski/mistralai_Ministral-3-8B-Instruct-2512-GGUF
  • llama.cpp GGUF tooling

Released under apache-2.0. Preserve upstream model, license, and quantization attribution when redistributing derived artifacts.

Method + reproduction: https://github.com/asystemoffields/PMRA
