Ministral 3 8B Instruct · PMRA mixed-precision GGUF
Two mixed-precision GGUFs of Mistral AI's Ministral 3 8B Instruct: a primary build at the IQ3_XS size budget and a leaner 3.2-bpw build for tight-RAM machines. Both beat plain IQ3_XS on a held-out Wikitext-2 test split: the primary by ~0.18 NLL at the same size, the compact one by ~0.12 NLL while being ~311 MB smaller. Both are standard GGUFs for text generation with llama.cpp or Ollama.
The model
Ministral 3 8B Instruct is the instruction-tuned member of Mistral AI's Ministral 3 family — designed for edge and on-device deployment, fitting in 24 GB of VRAM at BF16 and under ~12 GB once quantized. It's natively multimodal (an 8.4B language model paired with a 0.4B vision encoder) and multilingual across dozens of languages (English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic, …), with strong instruction-following and system-prompt adherence.
Scope of this artifact: these GGUFs target the text stack for text generation in llama.cpp; image input is not exercised here. The build was calibrated and measured on English.
Why this build (PMRA)
A normal GGUF quant uses one format for nearly every tensor, paying the same bit-rate everywhere regardless of importance. Production Mixed-Rate Allocation (PMRA) measures each tensor's contribution to quality and spends bits where they help most: starting from a low-bit IQ2_M floor, it promotes the groups that matter to stronger formats under a fixed byte budget. The selection is frozen on calibration data, then re-scored on a held-out test split so the gain reflects generalization, not overfit.
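The allocation idea above can be illustrated with a toy greedy knapsack. This is only a sketch under stated assumptions, not the PMRA selector itself: the tensor names, byte costs, and NLL gains are made-up numbers, and `Promotion` and `allocate` are hypothetical names. Every tensor is assumed to start at the low-bit floor, and the promotion with the best estimated gain per extra byte is applied for as long as the byte budget allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promotion:
    tensor: str        # hypothetical tensor (group) name
    fmt: str           # stronger format the tensor would be promoted to
    extra_bytes: int   # bytes added over the low-bit floor
    nll_gain: float    # estimated NLL reduction on calibration data


def allocate(promotions, budget_bytes):
    """Greedy sketch: take promotions by gain per extra byte under a byte budget."""
    chosen, spent, used = [], 0, set()
    ranked = sorted(promotions, key=lambda p: p.nll_gain / p.extra_bytes, reverse=True)
    for p in ranked:
        if p.tensor in used:                      # at most one format per tensor
            continue
        if spent + p.extra_bytes > budget_bytes:  # would exceed the budget
            continue
        chosen.append(p)
        used.add(p.tensor)
        spent += p.extra_bytes
    return chosen, spent


# Illustrative numbers only, not measurements from this release.
candidates = [
    Promotion("blk.0.attn_v", "IQ4_XS", 400, 0.040),
    Promotion("blk.0.attn_v", "Q3_K_M", 250, 0.030),
    Promotion("blk.1.ffn_down", "Q3_K_S", 600, 0.030),
    Promotion("blk.2.ffn_up", "Q3_K_M", 500, 0.010),
]
chosen, spent = allocate(candidates, budget_bytes=900)
for p in chosen:
    print(p.tensor, "->", p.fmt, f"(+{p.extra_bytes} bytes)")
print("spent:", spent)
```

A greedy pass like this is not guaranteed to find the best allocation for a given budget; it is used here only because it keeps the example short.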
Headline (held-out Wikitext-2 test, lower NLL is better):
| Build | NLL | size | vs IQ3_XS |
|---|---|---|---|
| PMRA primary (IQ3_XS budget) | 4.537 | 3.706 GB | −0.185 NLL, same size |
| PMRA compact (3.2 bpw) | 4.601 | 3.396 GB | −0.122 NLL, −311 MB |
| plain IQ3_XS | 4.722 | 3.706 GB | n/a |
Both decisions: GO.
Which file?
- ministral3_8b_pmra_knapsack_iq3xs_budget.gguf: primary quality build; pick this if you have the RAM.
- ministral3_8b_pmra_knapsack_3p2.gguf: the 3.2-bpw build; for ~8 GB machines, start here, close memory-heavy apps, and keep the context small.
Quick start
llama-cli -m ministral3_8b_pmra_knapsack_3p2.gguf \
-p "Write a short hello from PMRA." -n 80 --ctx-size 2048
Needs a recent llama.cpp build (or Ollama) with Ministral 3 support.
Footprint
| File | Selector | Size (bytes) | Payload bpw | SHA-256 |
|---|---|---|---|---|
| ministral3_8b_pmra_knapsack_iq3xs_budget.gguf | c2_calib_knapsack_mixed | 3,713,801,312 | 3.492210 | 7f88294593cf419a5b39b4da2c7df356fee9528de947d6547b9d11d60a84ac5d |
| ministral3_8b_pmra_knapsack_3p2.gguf | c2_calib_knapsack_bpw_3p200_mixed | 3,403,422,816 | 3.199730 | ff95384e68f211b238767e1783d20ce0b4a8be8a56ac8b906756c481831421a3 |
Both materialized and reloaded by the artifact builder with 0 tensor mismatches.
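To check a downloaded file against the published digests, something like the following Python sketch should work. It uses only the standard library; `sha256_of` and `verify` are illustrative helper names, and the file is assumed to keep its published name so that the lookup by file name succeeds.

```python
import hashlib
from pathlib import Path

# Digests copied from the Footprint table.
EXPECTED = {
    "ministral3_8b_pmra_knapsack_iq3xs_budget.gguf":
        "7f88294593cf419a5b39b4da2c7df356fee9528de947d6547b9d11d60a84ac5d",
    "ministral3_8b_pmra_knapsack_3p2.gguf":
        "ff95384e68f211b238767e1783d20ce0b4a8be8a56ac8b906756c481831421a3",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so a multi-GB GGUF is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the published one for its name."""
    path = Path(path)
    return sha256_of(path) == EXPECTED[path.name]
```

For example, `verify("ministral3_8b_pmra_knapsack_3p2.gguf")` returns `True` only when the local copy matches the published digest byte for byte.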
Benchmarks
Calibration: Wikitext-2-raw train (12 prompts). Selector eval: Wikitext-2-raw validation (128 prompts). Held-out eval: Wikitext-2-raw test (512 prompts); calibration/eval prompt overlap audited to 0. Lower NLL is better.
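As a rough illustration of what a prompt-overlap audit can look like, the sketch below compares two prompt lists after light normalization. It is an assumption about the general idea, not the audit used for this release, which may be defined differently (for example on token n-grams); the prompts are placeholders and `normalize` and `overlap` are hypothetical helpers.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies still match."""
    return " ".join(text.lower().split())


def overlap(calibration, evaluation):
    """Return the normalized prompts that appear in both lists."""
    return {normalize(p) for p in calibration} & {normalize(p) for p in evaluation}


# Placeholder prompts; the real splits come from Wikitext-2-raw.
calib = ["The quick brown fox.", "A second calibration prompt."]
held_out = ["An unseen test prompt.", "Another  unseen prompt."]

shared = overlap(calib, held_out)
print(len(shared))
```

An audit result of 0 corresponds to `shared` being empty.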
Held-out Wikitext-2 test:
| Variant | NLL | Payload bpw | Payload bytes |
|---|---|---|---|
| fp16 reference | 2.393904 | 16.000000 | 16,979,107,840 |
| IQ2_M | 4.963936 | 2.920126 | 3,098,820,608 |
| IQ3_XS (target / control) | 4.722369 | 3.492735 | 3,706,470,400 |
| Q3_K_S | 4.757542 | 3.636073 | 3,858,579,456 |
| PMRA knapsack | 4.537475 | 3.492210 | 3,705,913,344 |
| PMRA knapsack 3.2 bpw | 4.600533 | 3.199730 | 3,395,534,848 |
| same-budget random | 4.912780 | 3.492210 | 3,705,913,344 |
Selector validation split (Wikitext-2 validation): PMRA knapsack 4.456880 vs IQ3_XS 4.649152 — consistent.
- primary vs IQ3_XS: −0.184894 NLL, −557,056 bytes · vs Q3_K_S: −0.220067 NLL, −152,666,112 bytes · vs random: −0.375305 NLL · decision GO
- compact vs IQ3_XS: −0.121836 NLL, −310,935,552 bytes · vs Q3_K_S: −0.157010 NLL
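The deltas and bits-per-weight figures can be recomputed from the held-out table with a few lines of Python. This is a sketch for sanity-checking the arithmetic; it assumes the fp16 reference payload stores exactly 16 bits per weight, which is how the weight count is derived here, and `bpw` and `delta` are illustrative helper names.

```python
# Held-out rows copied from the benchmark table: (NLL, payload bytes).
rows = {
    "fp16": (2.393904, 16_979_107_840),
    "IQ3_XS": (4.722369, 3_706_470_400),
    "Q3_K_S": (4.757542, 3_858_579_456),
    "PMRA knapsack": (4.537475, 3_705_913_344),
    "PMRA knapsack 3.2 bpw": (4.600533, 3_395_534_848),
}

# Assumption: the fp16 payload is exactly 16 bits per weight.
n_weights = rows["fp16"][1] * 8 // 16


def bpw(name):
    """Payload bits per weight for a table row."""
    return rows[name][1] * 8 / n_weights


def delta(build, baseline):
    """(NLL difference, byte difference) of `build` relative to `baseline`."""
    (nll_a, bytes_a), (nll_b, bytes_b) = rows[build], rows[baseline]
    return round(nll_a - nll_b, 6), bytes_a - bytes_b


print(delta("PMRA knapsack", "IQ3_XS"))
print(delta("PMRA knapsack 3.2 bpw", "IQ3_XS"))
print(round(bpw("PMRA knapsack"), 6), round(bpw("PMRA knapsack 3.2 bpw"), 6))
```

Because the table values are already rounded to six decimals, a recomputed delta can differ from a published one in the last digit.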
How it was built
- base: mistralai/Ministral-3-8B-Instruct-2512-BF16
- GGUF sources: bartowski/mistralai_Ministral-3-8B-Instruct-2512-GGUF
- tensor profile: mistral3 · group mode: tensor · selector: c2_calib_knapsack_mixed
- low source: IQ2_M → target/control: IQ3_XS; promotion menu: Q2_K, Q2_K_L, Q3_K_S, Q3_K_M, IQ4_XS
Files
- ministral3_8b_pmra_knapsack_iq3xs_budget.gguf, ministral3_8b_pmra_knapsack_3p2.gguf: the models
- artifact_report*.json/.md, selector_result.json/.md
- public_eval_wikitext_test_result.json/.md: the held-out evaluation
- MINISTRAL3_8B_INSTRUCT_PMRA.md: release card
Attribution & license
Derived from, with thanks to:
- mistralai/Ministral-3-8B-Instruct-2512-BF16 (Mistral AI)
- GGUF quantizations from bartowski/mistralai_Ministral-3-8B-Instruct-2512-GGUF
- llama.cpp GGUF tooling
Released under apache-2.0. Preserve upstream model, license, and quantization attribution when redistributing derived artifacts.
Method + reproduction: https://github.com/asystemoffields/PMRA