NPC Agentic 7B (GGUF)


GGUF quantizations of ramankrishna10/npc-agentic-7b-v3 for local CPU/GPU inference with llama.cpp, Ollama, and LM Studio.

See the FP16 model card (ramankrishna10/npc-agentic-7b-v3) for the full training recipe, eval numbers, and known limitations, notably a GSM8K regression relative to the base model. For math-heavy workflows, use base Qwen2.5 or Qwen2.5-Math-7B instead.

Files

| File | Quant | Size | Use case |
|---|---|---|---|
| npc-agentic-7b-Q4_K_M.gguf | Q4_K_M | ~4.4 GB | Default for Ollama; laptop CPU+GPU |
| npc-agentic-7b-Q5_K_M.gguf | Q5_K_M | ~5.1 GB | Higher-fidelity local inference |
| npc-agentic-7b-Q8_0.gguf | Q8_0 | ~7.7 GB | Near-FP16 quality; consumer-GPU friendly |
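If you are unsure which file to download, the sizes in the table can be turned into a rough rule of thumb. The sketch below is a hypothetical helper, not part of this repo. It assumes you need the file size plus about 1.5 GB of headroom for the KV cache and runtime buffers at modest context lengths (long contexts need more), and it takes free RAM/VRAM in whole GB.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this repo): pick the largest quant from
# the Files table that fits in a given amount of free RAM/VRAM.
# Assumption: budget the file size plus ~1.5 GB for the KV cache and runtime
# buffers at modest context lengths; long contexts need more.
pick_quant() {
  mem_tenths=$(( $1 * 10 ))            # whole GB -> tenths of a GB, to stay in integer math
  if [ "$mem_tenths" -ge 92 ]; then    # 7.7 GB file + 1.5 GB headroom
    echo "npc-agentic-7b-Q8_0.gguf"
  elif [ "$mem_tenths" -ge 66 ]; then  # 5.1 GB file + 1.5 GB headroom
    echo "npc-agentic-7b-Q5_K_M.gguf"
  elif [ "$mem_tenths" -ge 59 ]; then  # 4.4 GB file + 1.5 GB headroom
    echo "npc-agentic-7b-Q4_K_M.gguf"
  else
    echo "pick_quant: under 6 GB free; no listed quant fits" >&2
    return 1
  fi
}

# Example: download only the chosen file from this repo (needs the huggingface_hub CLI).
# huggingface-cli download ramankrishna10/npc-agentic-7b-v3-gguf "$(pick_quant 8)" --local-dir .
```

For example, `pick_quant 8` selects the Q5_K_M file. The 1.5 GB margin is a guess, so raise it if you plan to use long contexts.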

Built with llama.cpp's convert_hf_to_gguf.py and llama-quantize.
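The line above names the tools but not the invocation. Below is a minimal sketch of that pipeline. It assumes a CMake-built llama.cpp checkout (binaries under build/bin) and the FP16 checkpoint downloaded to ./npc-agentic-7b-v3. The paths, the f16 intermediate file, and the `run`/`quantize_all` helpers are illustrative, not the author's recorded commands. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of an HF -> GGUF -> quant pipeline (assumed, not the author's exact commands).
# Assumes the FP16 checkpoint was fetched first, e.g.:
#   huggingface-cli download ramankrishna10/npc-agentic-7b-v3 --local-dir ./npc-agentic-7b-v3

LLAMA_CPP=${LLAMA_CPP:-./llama.cpp}   # hypothetical llama.cpp checkout location
DRY_RUN=${DRY_RUN:-1}                 # 1 = print commands only, 0 = execute them

# Print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

quantize_all() {
  hf_dir=$1
  # 1) Convert the Hugging Face checkpoint into a single f16 GGUF file.
  run python "$LLAMA_CPP/convert_hf_to_gguf.py" "$hf_dir" \
      --outfile npc-agentic-7b-f16.gguf --outtype f16 || return 1
  # 2) Quantize the f16 GGUF into each variant listed in the Files table.
  for q in Q4_K_M Q5_K_M Q8_0; do
    run "$LLAMA_CPP/build/bin/llama-quantize" npc-agentic-7b-f16.gguf \
        "npc-agentic-7b-$q.gguf" "$q" || return 1
  done
}
```

After sourcing the script, call `quantize_all ./npc-agentic-7b-v3`, review the printed commands, and then set `DRY_RUN=0` to run them.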

Inference

llama.cpp

./llama-cli -m npc-agentic-7b-Q4_K_M.gguf \
    -p "Design an event-sourced microservice with exactly-once command handling." \
    -n 1024 --temp 0.7 --top-p 0.9

Ollama

# Write a one-line Modelfile pointing at the downloaded Q4_K_M file, then register it with Ollama
echo "FROM ./npc-agentic-7b-Q4_K_M.gguf" > Modelfile
ollama create npc-agentic:7b -f Modelfile
ollama run npc-agentic:7b "Explain photosynthesis step by step."
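Depending on your Ollama version, the one-line Modelfile above may or may not pick up the chat template embedded in the GGUF. The following sketch writes a more explicit Modelfile that pins the Qwen2 / ChatML prompt format and its stop tokens. The temperature and top_p values copy the llama.cpp example above; they are an assumption, not documented defaults for this model.

```shell
#!/bin/sh
# Write a Modelfile that spells out the ChatML template instead of relying on
# template detection. Sampling values mirror the llama.cpp example (assumed, not tuned).
cat > Modelfile <<'EOF'
FROM ./npc-agentic-7b-Q4_K_M.gguf

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Then register it as before:
#   ollama create npc-agentic:7b -f Modelfile
```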

LM Studio / Jan / KoboldCpp

Place any of the .gguf files in the app's model directory and select the Qwen2 / ChatML chat template (<|im_start|> / <|im_end|> delimiters).
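For front-ends or raw-completion endpoints that need the literal prompt string, the Qwen2 / ChatML format looks like this. The `chatml_prompt` helper and the example messages are hypothetical. Only the `<|im_start|>` / `<|im_end|>` delimiters come from this card.

```shell
#!/bin/sh
# Hypothetical helper: render one system + user turn in Qwen2 / ChatML format
# and leave the assistant turn open so the model writes the reply.
chatml_prompt() {
  system_msg=$1
  user_msg=$2
  printf '<|im_start|>system\n%s<|im_end|>\n' "$system_msg"
  printf '<|im_start|>user\n%s<|im_end|>\n' "$user_msg"
  printf '<|im_start|>assistant\n'
}

chatml_prompt "You are a helpful assistant." "Explain photosynthesis step by step."
```

The model's reply then ends with `<|im_end|>`, so configure that token as a stop sequence if your app does not do it automatically.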

Built by Bottensor.

Citation

If you use NPC Agentic 7B in your work, please cite:

@misc{bachu2026npcagentic7b,
  title        = {NPC Agentic 7B: A Single-GPU QLoRA Recipe for a Laptop-Scale Conversational Model},
  author       = {Bachu, Rama Krishna},
  year         = {2026},
  month        = may,
  publisher    = {Zenodo},
  version      = {v1},
  doi          = {10.5281/zenodo.19954103},
  url          = {https://doi.org/10.5281/zenodo.19954103},
  note         = {Preprint}
}

Paper: https://doi.org/10.5281/zenodo.19954103

Format: GGUF
Model size: 8B params
Architecture: qwen2
Model tree for ramankrishna10/npc-agentic-7b-v3-gguf

Base model: Qwen/Qwen2.5-7B (this repo is listed as a quantized derivative)