🩸 Sangue e Grafi — Gemma 4 E2B GGUF (Q4_K_M)

Ready-to-run quantized model: Gemma 4 E2B with the SFT and GRPO adapters fully merged and converted to GGUF for local inference.

Model Description

This is the fully merged and quantized version of the Sangue e Grafi Gemma pipeline:

google/gemma-4-E2B-it
  + SFT adapter (merged)
  + GRPO adapter (merged)
  → GGUF Q4_K_M quantization
  → ~3.3 GB single file

All training stages (SFT on 500 adversarial scenarios + GRPO reinforcement learning) are baked into a single GGUF file, ready for local inference with llama.cpp, llama-cpp-python, or ollama.

File Details

| Property | Value |
| --- | --- |
| Format | GGUF (Q4_K_M quantization) |
| Size | ~3.3 GB |
| Base model | google/gemma-4-E2B-it (4B params) |
| Training | SFT + GRPO, fully merged before quantization |
| Compatible with | llama.cpp, llama-cpp-python, ollama, LM Studio |
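
As a rough sanity check on the figures above, the average storage cost per parameter can be worked out from the file size and the parameter count. This is only a sketch: it assumes "~3.3 GB" means decimal gigabytes and takes the "4B params" figure at face value, and the helper name `bits_per_weight` is made up for illustration.

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter in a model file."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return file_size_bytes * 8 / n_params


size_bytes = 3.3e9  # "~3.3 GB", assuming decimal gigabytes
n_params = 4e9      # "4B params", as listed for the base model

print(f"{bits_per_weight(size_bytes, n_params):.1f} bits per parameter on average")
```

The result is an average over the whole file. Q4_K_M keeps some tensors at higher precision than 4 bits, and the file also carries metadata, so the figure is not expected to equal exactly 4.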

Benchmark Results 📊

| Benchmark | KG Agent (this model) | Gemini 2.5 Flash (no KG) |
| --- | --- | --- |
| Easy (10 seeds) | 10/10 (100%) | 3/10 (30%) |
| Hard dev-set (10 seeds) | 5/10 (50%) | |
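
Each benchmark uses only 10 seeds, so the percentages carry wide uncertainty. The sketch below is not part of the reported evaluation: the Wilson score interval and the 95% level are choices made here for illustration, and the hard dev-set score is assumed to belong to the KG Agent column.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Counts copied from the benchmark table above.
results = {
    "KG Agent, Easy": (10, 10),
    "Gemini 2.5 Flash, Easy": (3, 10),
    "KG Agent, Hard dev-set": (5, 10),  # column assignment is an assumption
}

for name, (k, n) in results.items():
    low, high = wilson_interval(k, n)
    print(f"{name}: {k}/{n} = {k / n:.0%}, 95% interval [{low:.2f}, {high:.2f}]")
```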

Usage

With llama-cpp-python

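from llama_cpp import Llama

# Download the GGUF from the Hugging Face Hub (needs huggingface_hub installed) and load it.
llm = Llama.from_pretrained(
    repo_id="cyberandy/sangue-e-grafi-gemma4-e2b-gguf",
    filename="*.gguf",  # glob pattern matching the GGUF file in the repo
    n_ctx=4096,  # context window, in tokens
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your kinship question here..."}]
)

# The response uses the OpenAI-style chat completion layout.
print(output["choices"][0]["message"]["content"])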
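
For long answers it can help to stream tokens instead of waiting for the full completion. The sketch below assumes the OpenAI-style chunk layout that llama-cpp-python yields when `create_chat_completion` is called with `stream=True`. The helper name `collect_stream` is hypothetical and not part of the project.

```python
from typing import Iterable, Mapping


def collect_stream(chunks: Iterable[Mapping]) -> str:
    """Concatenate the text pieces from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        # Each chunk carries a "delta"; the first usually holds only the role.
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:
            parts.append(text)
    return "".join(parts)


# Usage, assuming `llm` was loaded as in the snippet above:
# stream = llm.create_chat_completion(
#     messages=[{"role": "user", "content": "Your kinship question here..."}],
#     stream=True,
# )
# print(collect_stream(stream))
```

To display tokens as they arrive, print each `text` piece inside the loop rather than joining them at the end.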

With ollama

# Download and run
ollama run hf.co/cyberandy/sangue-e-grafi-gemma4-e2b-gguf
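
Once the model has been pulled, a running Ollama server can also be queried from a script. The following is a minimal sketch using only the standard library. It assumes Ollama is listening on its default local address (`http://localhost:11434`) and uses the `/api/chat` endpoint. The helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint
MODEL = "hf.co/cyberandy/sangue-e-grafi-gemma4-e2b-gguf"  # same name as in `ollama run`


def build_chat_request(question: str, model: str = MODEL,
                       url: str = OLLAMA_CHAT_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat request for Ollama."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # ask for one JSON object instead of a chunk stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the question to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(question)) as resp:
        return json.load(resp)["message"]["content"]


# Usage (requires a running Ollama server with the model available):
# print(ask("Your kinship question here..."))
```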

With llama.cpp CLI

# Download the GGUF file, then:
./llama-cli -m sangue-e-grafi-gemma4-e2b.gguf -p "Your prompt here" -n 512

Intended Uses & Limitations

Intended uses:

  • Local/edge deployment of the KG-grounded agent
  • Quick experimentation without GPU or adapter merging
  • Integration with llama.cpp-based toolchains

Limitations:

  • Q4_K_M quantization may slightly reduce accuracy relative to the full-precision model
  • Still requires the KG agent framework for full pipeline performance
  • Domain-specific to Italian kinship / inheritance law

Source Adapters

This GGUF was built from:

  1. SFT adapter: sangue-e-grafi-gemma4-e2b-sft-adversarial-v7
  2. GRPO adapter: sangue-e-grafi-gemma4-e2b-grpo-run-f-v7

Project Links

| Resource | Link |
| --- | --- |
| 🚀 Live Demo | HF Space |
| 📦 GitHub | cyberandy/sangue-e-grafi |
| 📄 Paper | RLM-on-KG (arXiv:2604.17056) |
| 📊 Agent Traces Dataset | sangue-e-grafi-agent-traces |

Citation

@misc{sangue-e-grafi-2026,
  title   = {Sangue e Grafi: Small Models Beat Frontier LLMs on Adversarial Kinship Reasoning with Knowledge Graph Agents},
  author  = {Andrea Volpini},
  year    = {2026},
  url     = {https://github.com/cyberandy/sangue-e-grafi},
  note    = {Hugging Face Build Small Hackathon 2026}
}