Mafia Gemma 4 12B IT GGUF
Playable Mafia Space | Training Dataset | Merged Transformers Model | Base Model
This repository contains the GGUF Q8_0 runtime build for mafia-gemma-4-12B-it, a role-conditioned social-deduction fine-tune of Gemma 4 12B IT. The benchmark tables below report BF16 and Q8_0 together as one model family, mafia-gemma-4-12B-it, because the two repositories are deployment formats of the same trained agent policy.
Model Overview
Mafia Gemma 4 12B IT is designed for seven-player Mafia / Werewolf style games where an agent may be assigned Mafia, Detective, Doctor, or Villager at random. This GGUF repository is intended for llama.cpp-based local or server inference.
The model was trained to improve:
- legal JSON/action formatting;
- role-conditioned choices for Mafia, Detective, Doctor, and Villager;
- public discussion, accusation, defense, claim, and vote behavior;
- belief, claim, suspicion, and deception tracking;
- night actions such as Mafia kills, Doctor protects, and Detective checks;
- compatibility with moderated multi-day Mafia games using classic win conditions.
Available Files
| File | Purpose |
|---|---|
| gemma-4-12b-it.Q8_0.gguf | Q8_0 quantized language model |
| gemma-4-12b-it.BF16-mmproj.gguf | Multimodal projector file from the conversion pipeline |
For text-only Mafia play, use gemma-4-12b-it.Q8_0.gguf.
Training Data
Fine-tuning used the unified
Alfaxad/mafia-dataset,
a canonical event-log corpus built for social-deduction agents. The dataset
combines converted and normalized examples from:
- Mini-Mafia style action primitives and role-conditioned decisions;
- LLMafia / Time-to-Talk style communication and timing data;
- Bayesian Social Deduction / GRAIL belief and role-count examples;
- Werewolf / Wolf-Enhance style debate traces after conversion to the Mafia schema;
- WOLF-inspired deception, suspicion, claim, and voting labels where available;
- our seven-player harness logs with 2 Mafia, 1 Detective, 1 Doctor, and 3 Villagers.
The schema separates public transcript, private role information, legal action sets, hidden state, votes, night actions, claims, and outcome labels. Hidden information is intentionally represented in the training rows only where the acting role is allowed to see it.
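The visibility rule above can be sketched as a filter over an event log. This is a minimal illustration under stated assumptions: the `Event` class, its field names, and `view_for` are hypothetical and are not taken from the actual Alfaxad/mafia-dataset schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Event:
    """One log entry. Field names are illustrative, not the dataset schema."""
    phase: str                                # e.g. "day_1" or "night_1"
    text: str
    visible_to: FrozenSet[str] = frozenset()  # empty set means public


def view_for(player: str, events: List[Event]) -> List[str]:
    """Return only the entries this player is allowed to see."""
    return [e.text for e in events if not e.visible_to or player in e.visible_to]


# Hypothetical two-entry log: one public statement, one private night result.
log = [
    Event("day_1", "Ada accuses Blake."),
    Event("night_1", "Detective check: Blake is Mafia.", frozenset({"Casey"})),
]

print(view_for("Casey", log))  # public entry plus Casey's private check result
print(view_for("Ada", log))    # public entry only
```

Building each training row or prompt from a per-player view like this is one way to keep hidden state out of the context of roles that should not see it.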
Fine-Tuning And Conversion
- Base model: unsloth/gemma-4-12b-it
- Method: LoRA SFT with Unsloth and TRL
- LoRA: rank 32, alpha 64
- Context length: 4096 tokens
- Training sample: 60k train rows
- Validation/test sample: 2k validation rows, 2k test rows
- Optimizer steps: 1000
- Final eval loss: 0.57703
- Final train loss: 0.04144
- GGUF quantization: Q8_0
- GGUF SHA-256: e1c320c43638bb0fde6986f669eada9850ac89d02acf7ba627efb87ea69e0572
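The published digest can be checked after download with a short script. This is a sketch that assumes the SHA-256 above refers to gemma-4-12b-it.Q8_0.gguf (the card does not name the file explicitly); the helper names are illustrative.

```python
import hashlib

# Digest copied from the list above. Assumption: it covers the Q8_0 GGUF file.
EXPECTED_SHA256 = "e1c320c43638bb0fde6986f669eada9850ac89d02acf7ba627efb87ea69e0572"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-gigabyte GGUF never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str) -> bool:
    """Return True when the file's SHA-256 equals the digest listed on this card."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

For example, `matches_published_digest("./mafia-gemma-4-12b-it-gguf/gemma-4-12b-it.Q8_0.gguf")` should return `True` for an intact download, if the assumption about which file the digest covers holds.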
How to Run
Use a Gemma 4 capable llama.cpp build. Older llama.cpp builds may fail with an
unknown architecture error for gemma4.
Download the files:
hf download Alfaxad/mafia-gemma-4-12B-it-gguf \
gemma-4-12b-it.Q8_0.gguf \
gemma-4-12b-it.BF16-mmproj.gguf \
--local-dir ./mafia-gemma-4-12b-it-gguf
Run text-only inference:
llama-cli \
-m ./mafia-gemma-4-12b-it-gguf/gemma-4-12b-it.Q8_0.gguf \
--jinja \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
-p "Role: Doctor. Alive players: Ada, Blake, Casey, Devon. Night 2 action: choose one player to protect. Return compact JSON."
Run a local server:
llama-server \
-m ./mafia-gemma-4-12b-it-gguf/gemma-4-12b-it.Q8_0.gguf \
--jinja \
-c 4096 \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
--host 0.0.0.0 \
--port 8080
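Once the server is running, it can be queried through llama-server's OpenAI-compatible chat endpoint. The client below is a minimal sketch using only the Python standard library; the function names and the example prompt are illustrative, and it assumes the server is reachable on localhost at the port used above. No sampling parameters are sent, on the assumption that the server falls back to the values given on its command line.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = "http://localhost:8080", timeout: float = 120.0) -> str:
    """Send one user turn and return the assistant's text reply."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires the llama-server process from the command above to be running.
    print(chat("Role: Doctor. Alive players: Ada, Blake, Casey, Devon. "
               "Night 2 action: choose one player to protect. Return compact JSON."))
```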
Full-Game Evaluation
The table below combines the results of the merged BF16 and GGUF Q8_0 deployments as one
model family, mafia-gemma-4-12B-it. Results come from 24 full seven-player games. Every player
used HOLY GRAIL v4, and the non-player moderator was fixed to base Gemma 4 12B
BF16 with a Time-to-Talk scheduler plus generator. The benchmark included
20 pairwise local-vs-frontier games and 4 mixed all-star games. Total
API/player/moderator errors: 0.
Overall Slot Scoreboard
| Model | Player slots | Team win rate | Alive final | Avg messages | Avg votes cast | Avg votes received | False claim rate |
|---|---|---|---|---|---|---|---|
| mafia-gemma-4-12B-it | 78 | 0.615 | 0.538 | 1.333 | 1.936 | 1.756 | 0.013 |
| GPT-5 medium | 18 | 0.611 | 0.722 | 1.167 | 1.778 | 1.222 | 0.056 |
| GPT-5-mini | 18 | 0.444 | 0.389 | 1.444 | 1.944 | 2.222 | 0.000 |
| Claude Opus 4.8 | 18 | 0.611 | 0.444 | 1.500 | 2.167 | 2.556 | 0.000 |
| Claude Sonnet 4.6 | 18 | 0.111 | 0.333 | 1.278 | 1.667 | 2.667 | 0.000 |
| Gemini 2.5 Pro OSV | 18 | 0.889 | 0.556 | 1.500 | 2.222 | 1.889 | 0.000 |
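As a reading aid, the slot counts in the scoreboard can be checked against the benchmark size: 24 games with seven seats each give 168 player slots. The snippet below is a small consistency check using only numbers from the table; it assumes every game filled all seven seats.

```python
GAMES = 24
PLAYERS_PER_GAME = 7

# "Player slots" column from the Overall Slot Scoreboard.
player_slots = {
    "mafia-gemma-4-12B-it": 78,
    "GPT-5 medium": 18,
    "GPT-5-mini": 18,
    "Claude Opus 4.8": 18,
    "Claude Sonnet 4.6": 18,
    "Gemini 2.5 Pro OSV": 18,
}

total_slots = sum(player_slots.values())
local_share = player_slots["mafia-gemma-4-12B-it"] / total_slots

print(total_slots == GAMES * PLAYERS_PER_GAME)  # True: 168 slots in both counts
print(f"local model share of slots: {local_share:.3f}")
```

The local model fills far more slots than any single frontier model, so the per-model rates in the table rest on different sample sizes.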
Pairwise Local-vs-Frontier Results
mafia-gemma-4-12B-it combines BF16 and Q8_0 rows. Each opponent has four
local-side trials: two with Mafia Gemma controlling Mafia and two with Mafia
Gemma controlling Good.
| Opponent | Local wins overall | Local as Mafia | Local as Good |
|---|---|---|---|
| GPT-5 medium | 1/4 | 1/2 | 0/2 |
| GPT-5-mini | 3/4 | 1/2 | 2/2 |
| Claude Opus 4.8 | 2/4 | 0/2 | 2/2 |
| Claude Sonnet 4.6 | 4/4 | 2/2 | 2/2 |
| Gemini 2.5 Pro OSV | 1/4 | 0/2 | 1/2 |
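The per-opponent rows can be summed into side-level totals. The snippet below only re-adds the numbers in the table above; the variable names are illustrative.

```python
TRIALS_PER_SIDE = 2

# (wins as Mafia, wins as Good) per opponent, copied from the pairwise table.
pairwise = {
    "GPT-5 medium": (1, 0),
    "GPT-5-mini": (1, 2),
    "Claude Opus 4.8": (0, 2),
    "Claude Sonnet 4.6": (2, 2),
    "Gemini 2.5 Pro OSV": (0, 1),
}

games_per_side = TRIALS_PER_SIDE * len(pairwise)
mafia_wins = sum(mafia for mafia, _ in pairwise.values())
good_wins = sum(good for _, good in pairwise.values())

print(f"as Mafia: {mafia_wins}/{games_per_side}")                   # 4/10
print(f"as Good:  {good_wins}/{games_per_side}")                    # 7/10
print(f"overall:  {mafia_wins + good_wins}/{2 * games_per_side}")   # 11/20
```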
Role Diagnostics
| Role | Slots | Team win rate | Alive final | Vote accuracy | Avg messages | Avg claims | False claims |
|---|---|---|---|---|---|---|---|
| Mafia | 21 | 0.381 | 0.381 | 1.000 | 1.476 | 0.048 | 0.048 |
| Detective | 10 | 0.700 | 0.500 | 0.588 | 1.100 | 0.900 | 0.000 |
| Doctor | 14 | 0.714 | 0.643 | 0.700 | 1.357 | 0.857 | 0.000 |
| Villager | 33 | 0.697 | 0.606 | 0.661 | 1.303 | 0.000 | 0.000 |
Runtime Call Metrics
Latency is runtime and provider dependent, so use this as an operational reference rather than a pure capability ranking.
| Model | Calls | Avg latency sec | Max latency sec | Avg output chars | Failed calls |
|---|---|---|---|---|---|
| mafia-gemma-4-12B-it | 104 | 5.926 | 103.935 | 218.2 | 0 |
| GPT-5 medium | 21 | 31.484 | 102.330 | 229.8 | 0 |
| GPT-5-mini | 26 | 5.888 | 20.425 | 225.3 | 0 |
| Claude Opus 4.8 | 27 | 3.228 | 6.578 | 249.7 | 0 |
| Claude Sonnet 4.6 | 23 | 2.835 | 3.746 | 240.6 | 0 |
| Gemini 2.5 Pro OSV | 27 | 19.449 | 138.644 | 143.7 | 0 |
Intended Use
This model is intended for game agents, research harnesses, and AI-native social-deduction experiences. It works best when paired with:
- an authoritative game engine that enforces legal actions;
- strict public/private view isolation;
- role-specific prompts;
- an external memory/ledger layer such as HOLY GRAIL;
- a moderator that controls turn timing and table flow.
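The first pairing, an engine that enforces legal actions, can be sketched as a strict parser placed between the model and the game state. This is a minimal illustration: the `action` and `target` keys and the `parse_action` helper are hypothetical, since this card does not specify the model's output schema.

```python
import json
from typing import Optional, Set


def parse_action(raw: str, legal_actions: Set[str], legal_targets: Set[str]) -> Optional[dict]:
    """Return the action if it is well-formed JSON and legal, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # "action" and "target" are assumed key names, not a documented schema.
    action, target = data.get("action"), data.get("target")
    if not isinstance(action, str) or not isinstance(target, str):
        return None
    if action not in legal_actions or target not in legal_targets:
        return None
    return {"action": action, "target": target}


alive = {"Ada", "Blake", "Casey", "Devon"}
print(parse_action('{"action": "protect", "target": "Ada"}', {"protect"}, alive))
print(parse_action('{"action": "kill", "target": "Ada"}', {"protect"}, alive))  # None
```

When the parser returns `None`, the engine can re-prompt or fall back to a default legal action, so an invalid reply never changes game state.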
Limitations
- The full-game benchmark is small and game-specific. It should not be treated as a broad general reasoning benchmark.
- The model is not a security boundary. Hidden-role secrecy must be enforced by the engine and prompt construction.
- The model is tuned for Mafia-style game behavior, including in-game deception. Do not use it for real-world deception, impersonation, or misinformation.
- Best behavior depends on structured prompts and legal-action validation.
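To put the small benchmark size in numbers, a Wilson score interval shows how wide the uncertainty is on a win rate measured over a few games. The sketch below is a generic statistical helper, not part of the authors' evaluation, and it treats games as independent trials, which is a simplifying assumption.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = wins / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - margin, center + margin


# 11 wins in 20 games is the pairwise total across the five opponents.
low, high = wilson_interval(11, 20)
print(f"{low:.2f} to {high:.2f}")
```

For 11 wins in 20 games the interval runs from roughly 0.34 to 0.74, so the pairwise results cannot separate a clearly losing record from a clearly winning one.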
License
This model is distributed under the Apache 2.0 license, following the base Gemma 4 license information provided by the upstream model card.