Bragi-LLM-GGUF

An 805 MB local Python coding assistant. 92% MBPP single-shot, 2 points behind Qwen2.5-Coder-7B (14 GB, 17x larger). Zero API cost.

DOI GitHub License

What is this

c15v-q3km-imat.gguf is the backbone GGUF for the Bragi-LLM system, a 786 MB Q3_K_M quantised Qwen2.5-Coder-1.5B-Instruct with imatrix calibration on MBPP-train Python code. It is the small LLM half of a hybrid design: combined with a 15 KB hand-engineered symbolic engine library and a 6 KB keyword intercept router, the full system reaches 92% pass@1 on the MBPP test split, single-shot greedy decoding, no retry. See the paper for the failure-mode analysis and ablations: doi:10.5281/zenodo.20557449.

The triptych

Role Repo What it is
Brain (this file) Bragi-LLM The 805 MB local Python coder.
Eyes Code Tree The visual IDE.
Hands Demeter-CodeBuilder OpenAI-compatible proxy wiring Bragi as Code Tree's default local backend.

Together: about 1 GB on disk. Fully offline. Zero recurring fees. MIT.

Results

System Footprint MBPP test 0-99 single-shot
Vanilla 1.5B Q3_K_M (this backbone alone) 786 MB 65%
With intercept router + engine_lib (full Bragi system) 805 MB 92%
Reference Qwen2.5-Coder-7B fp16 14 GB 94%

The GGUF alone (this file) is the 65% column. The 92% number requires the router and engine library which are open-sourced in the Bragi-LLM GitHub repo.

Quick start

Download

huggingface-cli download norika1207-lab/Bragi-LLM-GGUF c15v-q3km-imat.gguf --local-dir .

Run with llama.cpp

# build llama.cpp once
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release

# serve
./build/bin/llama-server -m c15v-q3km-imat.gguf -ngl 99 -c 16384 --parallel 4 --port 8080
# CPU only
./build/bin/llama-server -m c15v-q3km-imat.gguf -ngl 0  -c 16384 --parallel 4 --port 8080

Use full Bragi system (recommended, gets 92%)

Combine with router and engine library from the GitHub repo:

git clone https://github.com/norika1207-lab/Bragi-LLM
cd Bragi-LLM
python3 solve_intercept2.py 1 1 100 http://localhost:8080/v1/chat/completions

Or use the OpenAI-compatible proxy from Demeter-CodeBuilder, which wraps everything and exposes Bragi as a drop-in OpenAI endpoint.

How it works

The 1.5B Q3 backbone alone fails MBPP problems primarily by mis-recalling rare formulas: octagonal number written as 3n^2 - 2n instead of n(3n - 2), divisibility-by-11 implemented as digit-sum-mod-11 (correct rule is alternating digit sum). These formulas are 50-byte Python expressions; storing them in 3-bit quantised weights is wasteful and error-prone.

The full Bragi system externalises rare formulas into a 15 KB symbolic library (engine_lib.py) and routes matched problems entirely around the LLM via a regex keyword router. The backbone runs only when no route matches.

Quantisation details

  • Base model: Qwen/Qwen2.5-Coder-1.5B-Instruct
  • Quantisation: Q3_K_M with imatrix calibration
  • Imatrix corpus: MBPP-train Python code split (374 verified solutions, 67 KB; no test data)
  • Tool: llama.cpp llama-imatrix + llama-quantize build b4c0549
  • File size: 786 MB

Limitations

  • Hand-engineered engine_lib covers the MBPP test distribution but not arbitrary code tasks.
  • Regex router will not match prompts in other languages or with unusual phrasing.
  • Real-world coding (multi-file refactoring, integration with existing codebases) is not measured by MBPP and not the design target.

See paper section 5.3 for full limitations.

Citation

@misc{chen2026bragillm,
  author = {Chen, Ho Yiing},
  title  = {Bragi-LLM: An 805 MB Hybrid Code-Generation System Reaches 92\% MBPP via LLM-Symbolic Engine Routing},
  year   = {2026},
  doi    = {10.5281/zenodo.20557449},
  url    = {https://doi.org/10.5281/zenodo.20557449},
  note   = {Independent Researcher, Taiwan. ORCID 0009-0006-6816-9891.}
}

Author

Chen, Ho Yiing (norika), Independent Researcher, Taiwan. ORCID: 0009-0006-6816-9891

Correspondence: norika at charenix.com

License

MIT. See the Bragi-LLM repo for the full LICENSE file.

Acknowledgements

Developed using donated off-hours access to NVIDIA DGX Spark hardware. Implementation and draft writing assisted by Claude (Anthropic). The architectural direction (refuse to ship sub-target results, diagnose before optimising, externalise rather than memorise) is the author's.

Downloads last month
302
GGUF
Model size
2B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for Norika1207/Bragi-LLM-GGUF

Quantized
(103)
this model