Instructions for using build-small-hackathon/mafia-gemma-4-12B-it with libraries, inference providers, notebooks, and local apps.

Libraries

How to use build-small-hackathon/mafia-gemma-4-12B-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="build-small-hackathon/mafia-gemma-4-12B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("build-small-hackathon/mafia-gemma-4-12B-it")
model = AutoModelForMultimodalLM.from_pretrained("build-small-hackathon/mafia-gemma-4-12B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use build-small-hackathon/mafia-gemma-4-12B-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "build-small-hackathon/mafia-gemma-4-12B-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "build-small-hackathon/mafia-gemma-4-12B-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
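The same OpenAI-compatible endpoint can also be called from Python with only the standard library. This is a minimal sketch: the helper names `build_chat_request` and `send_chat_request` are hypothetical, and it assumes the server returns the standard OpenAI `choices[0].message.content` response shape. vLLM listens on port 8000 by default; the SGLang commands in this card use port 30000.

```python
import json
import urllib.request

# Model id as published in this card; the server must have been started with it.
MODEL_ID = "build-small-hackathon/mafia-gemma-4-12B-it"


def build_chat_request(base_url: str, content: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's text (assumes the OpenAI response shape)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat_request(build_chat_request("http://localhost:8000", "What is the capital of France?"))` returns the reply text. Building the request separately from sending it makes the payload easy to inspect or log before any network call.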

Use Docker

docker model run hf.co/build-small-hackathon/mafia-gemma-4-12B-it

SGLang

How to use build-small-hackathon/mafia-gemma-4-12B-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "build-small-hackathon/mafia-gemma-4-12B-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "build-small-hackathon/mafia-gemma-4-12B-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "build-small-hackathon/mafia-gemma-4-12B-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "build-small-hackathon/mafia-gemma-4-12B-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use build-small-hackathon/mafia-gemma-4-12B-it with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for build-small-hackathon/mafia-gemma-4-12B-it to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for build-small-hackathon/mafia-gemma-4-12B-it to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for build-small-hackathon/mafia-gemma-4-12B-it to start chatting

Load model with FastModel

pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="build-small-hackathon/mafia-gemma-4-12B-it",
    max_seq_length=2048,
)

Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Mafia Gemma 4 12B IT

Playable Mafia Space | Training Dataset | Base Model | GGUF Q8 Runtime

Mafia Gemma 4 12B IT is a role-conditioned social-deduction fine-tune of unsloth/gemma-4-12b-it. It is designed for seven-player Mafia/Werewolf-style games in which an agent is randomly assigned the role of Mafia, Detective, Doctor, or Villager and must produce legal game actions plus concise public table messages.

Model Overview

This repository contains the merged Transformers model for mafia-gemma-4-12B-it. The model is intended to be used as the player policy inside a Mafia game agent. In our production game, it is paired with the HOLY GRAIL agent architecture and a separate Time-to-Talk moderator.

The model was trained to improve:

legal JSON/action formatting;
role-conditioned choices for Mafia, Detective, Doctor, and Villager;
public discussion, accusation, defense, claim, and vote behavior;
belief, claim, suspicion, and deception tracking;
night actions such as Mafia kills, Doctor protects, and Detective checks;
compatibility with moderated multi-day Mafia games using classic win conditions.
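The card does not spell out the "classic win conditions". A minimal sketch, assuming the conventional rule (Town wins once every Mafia player is eliminated; Mafia wins once living Mafia are at least as numerous as all other living players), could look like this; `winner` is a hypothetical helper, not part of the harness described here.

```python
from collections import Counter
from typing import Optional


def winner(alive_roles: list[str]) -> Optional[str]:
    """Return "Town", "Mafia", or None while the game is still undecided.

    Assumes the conventional rule, not one confirmed by this card:
    Town wins when no Mafia remain; Mafia wins when living Mafia are at
    least as numerous as all other living players.
    """
    mafia = Counter(alive_roles)["Mafia"]  # Counter returns 0 for missing keys
    others = len(alive_roles) - mafia
    if mafia == 0:
        return "Town"
    if mafia >= others:
        return "Mafia"
    return None


print(winner(["Mafia", "Mafia", "Detective", "Doctor", "Villager", "Villager", "Villager"]))  # None
print(winner(["Mafia", "Villager"]))  # Mafia
```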

Training Data

Fine-tuning used the unified Alfaxad/mafia-dataset, a canonical event-log corpus built for social-deduction agents. The dataset combines converted and normalized examples from:

Mini-Mafia style action primitives and role-conditioned decisions;
LLMafia / Time-to-Talk style communication and timing data;
Bayesian Social Deduction / GRAIL belief and role-count examples;
Werewolf / Wolf-Enhance style debate traces after conversion to the Mafia schema;
WOLF-inspired deception, suspicion, claim, and voting labels where available;
our seven-player harness logs with 2 Mafia, 1 Detective, 1 Doctor, and 3 Villagers.
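The seven-player composition above (2 Mafia, 1 Detective, 1 Doctor, 3 Villagers), combined with random role assignment, can be sketched as a seeded shuffle of a fixed role deck. `assign_roles` and the extra player names are illustrative; the card does not describe how the harness actually deals roles.

```python
import random
from typing import Optional

# Seven-player composition stated in this card: 2 Mafia, 1 Detective, 1 Doctor, 3 Villagers.
ROLE_DECK = ["Mafia"] * 2 + ["Detective", "Doctor"] + ["Villager"] * 3


def assign_roles(players: list[str], seed: Optional[int] = None) -> dict[str, str]:
    """Deal one role per player by shuffling the fixed role deck (hypothetical helper)."""
    if len(players) != len(ROLE_DECK):
        raise ValueError(f"expected {len(ROLE_DECK)} players, got {len(players)}")
    rng = random.Random(seed)  # a seeded RNG makes each deal reproducible
    deck = ROLE_DECK.copy()
    rng.shuffle(deck)
    return dict(zip(players, deck))


roles = assign_roles(["Ada", "Blake", "Casey", "Devon", "Emery", "Finley", "Gray"], seed=7)
print(roles)
```

Shuffling a fixed deck, rather than sampling each role independently, guarantees the 2/1/1/3 split in every game.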

The schema separates public transcript, private role information, legal action sets, hidden state, votes, night actions, claims, and outcome labels. Hidden information is intentionally represented in the training rows only where the acting role is allowed to see it.
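The public/private split can be illustrated with a per-player view builder. This is a sketch under assumptions: `GameState`, `player_view`, and every field name are hypothetical, and the only visibility rules encoded are the standard ones (each player knows their own role, Mafia know their teammates, and the Detective sees their own investigation results).

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical engine state; field names are illustrative only."""
    roles: dict[str, str]  # living player -> secret role
    public_transcript: list[str] = field(default_factory=list)
    detective_results: dict[str, str] = field(default_factory=dict)  # target -> result


def player_view(state: GameState, player: str) -> dict:
    """Return only the information `player` is allowed to see."""
    role = state.roles[player]
    view = {
        "you": player,
        "your_role": role,
        "alive_players": sorted(state.roles),  # names only, never other players' roles
        "public_transcript": list(state.public_transcript),
    }
    if role == "Mafia":
        # Mafia members know who their teammates are.
        view["mafia_team"] = sorted(p for p, r in state.roles.items() if r == "Mafia")
    if role == "Detective":
        view["investigation_results"] = dict(state.detective_results)
    return view
```

Building the prompt only from `player_view(...)`, never from the full `GameState`, keeps hidden information out of a role's context by construction.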

Fine-Tuning Recipe

Base model: unsloth/gemma-4-12b-it
Method: LoRA SFT with Unsloth and TRL
LoRA: rank 32, alpha 64
Context length: 4096 tokens
Training sample: 60k train rows
Validation/test sample: 2k validation rows, 2k test rows
Optimizer steps: 1000
Final eval loss: 0.57703
Final train loss: 0.04144
Deployment formats: merged Transformers weights and Q8_0 GGUF
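For reference, standard LoRA scales the low-rank update by alpha / rank, so rank 32 with alpha 64 gives a factor of 2.0, and each adapted weight of shape d_in x d_out adds rank x (d_in + d_out) trainable parameters. The sketch assumes plain (non-rsLoRA) scaling, which the card does not state, and the 4096 x 4096 matrix is a hypothetical size chosen for illustration, not a Gemma 4 dimension.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Standard (non-rsLoRA) LoRA multiplies the update B @ A by alpha / rank."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one d_in x d_out weight: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 32, 64  # LoRA settings stated in the fine-tuning recipe

print(lora_scaling(RANK, ALPHA))  # 2.0
# Hypothetical 4096 x 4096 projection, used only to show the scale of the adapter.
print(lora_param_count(4096, 4096, RANK))  # 262144
print(lora_param_count(4096, 4096, RANK) / (4096 * 4096))  # 0.015625
```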

How to Run

Install recent Transformers support for Gemma 4:

pip install -U "transformers>=5.11.0" accelerate torch sentencepiece protobuf

Run a text-only Mafia action prompt:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Alfaxad/mafia-gemma-4-12B-it"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a Mafia game agent. Use only legal public information, "
            "respect your private role, and return compact JSON."
        ),
    },
    {
        "role": "user",
        "content": (
            "Role: Detective. Alive players: Ada, Blake, Casey, Devon, Emery. "
            "Night 2 action: choose one player to investigate."
        ),
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)
device = next(model.parameters()).device
inputs = {key: value.to(device) for key, value in inputs.items()}

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
    )

print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
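The system prompt asks for compact JSON, but sampled output may still wrap it in prose or a code fence. A minimal, hypothetical `extract_json` helper built on the standard library's `json.JSONDecoder.raw_decode` returns the first JSON object it can parse. The `action`/`target` fields in the test are an assumed schema, since the card does not document the exact output format.

```python
import json
from typing import Any, Optional


def extract_json(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in `text`, or None if there is none.

    Tries to decode starting at each "{" in turn, so leading prose,
    code fences, or a malformed earlier fragment are skipped.
    """
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; try the next "{"
        if isinstance(obj, dict):
            return obj
    return None
```

Returning `None` instead of raising lets the caller decide whether to re-prompt the model or fall back to an engine-chosen move.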

For the lightweight local runtime, use the GGUF repository: Alfaxad/mafia-gemma-4-12B-it-gguf.

Full-Game Evaluation

The tables below pool the merged BF16 and GGUF Q8_0 results as one model family, mafia-gemma-4-12B-it. Results come from 24 full seven-player games in which every player used the HOLY GRAIL agent architecture and the non-player moderator was fixed to base Gemma 4 12B (BF16), using a Time-to-Talk scheduler plus generator. The benchmark included 20 pairwise local-vs-frontier games and 4 mixed all-star games. Total API, player, and moderator errors: 0.

Overall Slot Scoreboard

Model	Player slots	Team win rate	Alive final	Avg messages	Avg votes cast	Avg votes received	False claim rate
mafia-gemma-4-12B-it	78	0.615	0.538	1.333	1.936	1.756	0.013
GPT-5 medium	18	0.611	0.722	1.167	1.778	1.222	0.056
GPT-5-mini	18	0.444	0.389	1.444	1.944	2.222	0.000
Claude Opus 4.8	18	0.611	0.444	1.500	2.167	2.556	0.000
Claude Sonnet 4.6	18	0.111	0.333	1.278	1.667	2.667	0.000
Gemini 2.5 Pro OSV	18	0.889	0.556	1.500	2.222	1.889	0.000

Pairwise Local-vs-Frontier Results

Rows for mafia-gemma-4-12B-it pool the BF16 and Q8_0 runs. The local model played four games against each opponent: two with Mafia Gemma controlling the Mafia team and two with it controlling the Good team.

Opponent	Local wins overall	Local as Mafia	Local as Good
GPT-5 medium	1/4	1/2	0/2
GPT-5-mini	3/4	1/2	2/2
Claude Opus 4.8	2/4	0/2	2/2
Claude Sonnet 4.6	4/4	2/2	2/2
Gemini 2.5 Pro OSV	1/4	0/2	1/2
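The per-side totals implied by this table can be tallied directly; the win counts in the dictionary are copied from the rows above, with two games per side against each opponent.

```python
# (local wins as Mafia, local wins as Good) per opponent, copied from the table above.
PAIRWISE = {
    "GPT-5 medium": (1, 0),
    "GPT-5-mini": (1, 2),
    "Claude Opus 4.8": (0, 2),
    "Claude Sonnet 4.6": (2, 2),
    "Gemini 2.5 Pro OSV": (0, 1),
}
GAMES_PER_SIDE = 2  # each opponent: two games with local as Mafia, two as Good

mafia_wins = sum(m for m, _ in PAIRWISE.values())
good_wins = sum(g for _, g in PAIRWISE.values())
side_games = GAMES_PER_SIDE * len(PAIRWISE)  # games played per side across all opponents

print(f"Local as Mafia: {mafia_wins}/{side_games}")  # Local as Mafia: 4/10
print(f"Local as Good: {good_wins}/{side_games}")  # Local as Good: 7/10
print(f"Local overall: {mafia_wins + good_wins}/{2 * side_games}")  # Local overall: 11/20
```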

Role Diagnostics

Role	Slots	Team win rate	Alive final	Vote accuracy	Avg messages	Avg claims	False claims
Mafia	21	0.381	0.381	1.000	1.476	0.048	0.048
Detective	10	0.700	0.500	0.588	1.100	0.900	0.000
Doctor	14	0.714	0.643	0.700	1.357	0.857	0.000
Villager	33	0.697	0.606	0.661	1.303	0.000	0.000
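The role table and the Overall Slot Scoreboard are mutually consistent. Converting each role's rates back to counts (rate × slots, rounded) and pooling them reproduces the 78 slots, 0.615 team win rate, and 0.538 alive-final rate reported for mafia-gemma-4-12B-it, and the slot totals match 24 seven-player games. The rounding step assumes the published rates were computed from whole-number counts.

```python
# (slots, team win rate, alive-final rate) per role, copied from the Role Diagnostics table.
ROLES = {
    "Mafia": (21, 0.381, 0.381),
    "Detective": (10, 0.700, 0.500),
    "Doctor": (14, 0.714, 0.643),
    "Villager": (33, 0.697, 0.606),
}

total_slots = sum(s for s, _, _ in ROLES.values())
# Recover whole-number counts from the rounded rates, then pool across roles.
wins = sum(round(w * s) for s, w, _ in ROLES.values())
alive = sum(round(a * s) for s, _, a in ROLES.values())

print(total_slots, wins, alive)  # 78 48 42
print(f"{wins / total_slots:.3f} {alive / total_slots:.3f}")  # 0.615 0.538

# Slot accounting: 24 seven-player games = local slots + 5 opponents x 18 slots each.
assert 24 * 7 == total_slots + 5 * 18
```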

Runtime Call Metrics

Latency depends on the runtime and provider, so treat this table as an operational reference rather than a pure capability ranking.

Model	Calls	Avg latency sec	Max latency sec	Avg output chars
mafia-gemma-4-12B-it	104	5.926	103.935	218.2
GPT-5 medium	21	31.484	102.330	229.8
GPT-5-mini	26	5.888	20.425	225.3
Claude Opus 4.8	27	3.228	6.578	249.7
Claude Sonnet 4.6	23	2.835	3.746	240.6
Gemini 2.5 Pro OSV	27	19.449	138.644	143.7

Intended Use

This model is intended for game agents, research harnesses, and AI-native social-deduction experiences. It works best when paired with:

an authoritative game engine that enforces legal actions;
strict public/private view isolation;
role-specific prompts;
an external memory/ledger layer such as HOLY GRAIL;
a moderator that controls turn timing and table flow.
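The first point, an engine that enforces legal actions, might look like the sketch below: the model's parsed proposal is accepted only if both the action and the target are in the legal set, and otherwise the engine substitutes a random legal move. The proposal fields (`action`, `target`), the shape of the `legal` mapping, and the random fallback are assumptions for illustration; a real harness could instead re-prompt the model.

```python
import random
from typing import Any, Optional


def choose_legal_action(
    proposal: Optional[dict[str, Any]],
    legal: dict[str, set[str]],
    rng: random.Random,
) -> tuple[str, str]:
    """Accept the model's proposal only if it is legal; otherwise fall back.

    `legal` maps each allowed action name to its allowed targets,
    e.g. {"investigate": {"Ada", "Blake"}}. Field names are hypothetical.
    """
    if proposal is not None:
        action = proposal.get("action")
        target = proposal.get("target")
        # isinstance checks guard against unhashable or non-string values from the model.
        if isinstance(action, str) and isinstance(target, str):
            if action in legal and target in legal[action]:
                return action, target
    # Engine-side fallback: a random legal move keeps the game moving.
    action = rng.choice(sorted(legal))
    target = rng.choice(sorted(legal[action]))
    return action, target
```

Sorting before `rng.choice` makes the fallback reproducible for a given seed, since set iteration order is not guaranteed.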

Limitations

The full-game benchmark is small and game-specific. It should not be treated as a broad general reasoning benchmark.
The model is not a security boundary. Hidden-role secrecy must be enforced by the engine and prompt construction.
The model is tuned for Mafia-style game behavior, including in-game deception. Do not use it for real-world deception, impersonation, or misinformation.
Best behavior depends on structured prompts and legal-action validation.

License

This model is distributed under the Apache 2.0 license, following the base Gemma 4 license information provided by the upstream model card.

Model size: 12B params (Safetensors, BF16)

Model tree for build-small-hackathon/mafia-gemma-4-12B-it: google/gemma-4-12B (base model) → google/gemma-4-12B-it → unsloth/gemma-4-12b-it → this model

Quantizations: 1 model
