Instructions to use wangzhang/gemma-4-12B-it-abliterix with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use wangzhang/gemma-4-12B-it-abliterix with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="wangzhang/gemma-4-12B-it-abliterix")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("wangzhang/gemma-4-12B-it-abliterix")
model = AutoModelForMultimodalLM.from_pretrained("wangzhang/gemma-4-12B-it-abliterix")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use wangzhang/gemma-4-12B-it-abliterix with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "wangzhang/gemma-4-12B-it-abliterix"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wangzhang/gemma-4-12B-it-abliterix",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/wangzhang/gemma-4-12B-it-abliterix

SGLang

How to use wangzhang/gemma-4-12B-it-abliterix with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "wangzhang/gemma-4-12B-it-abliterix" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wangzhang/gemma-4-12B-it-abliterix",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "wangzhang/gemma-4-12B-it-abliterix" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wangzhang/gemma-4-12B-it-abliterix",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use wangzhang/gemma-4-12B-it-abliterix with Docker Model Runner:
```
docker model run hf.co/wangzhang/gemma-4-12B-it-abliterix
```

Gemma-4-12B-it — Abliterated (abliterix)

An uncensored, refusal-suppressed version of google/gemma-4-12B-it, produced by directional ablation (no fine-tuning, no new data) with abliterix.

The model's safety-refusal behaviour is removed by orthogonally projecting a single refusal direction out of two write-path projections (attn.o_proj, mlp.down_proj) across the decoder stack, while a norm-preserving transform keeps the rest of the model's behaviour as close to the original as possible. The result keeps Gemma-4's capabilities intact and answers prompts the base model would refuse.

⚠️ Responsible-use notice. This model has had its safety guardrails removed. It will attempt to answer harmful, unethical, or dangerous requests. You are solely responsible for how you use it and for complying with the Gemma Terms of Use and all applicable law. Intended for safety research, red-teaming, and evaluation.

Results

Refusal rate is measured on a held-out set of 100 harmful prompts using an LLM judge (google/gemini-3.1-flash-lite), which is considerably stricter than the keyword-based detectors typically reported for abliterated models. KL divergence is the first-token KL from the base model over 100 benign prompts (lower = closer to the original model).

Metric	Base `gemma-4-12B-it`	This model
Refusals (LLM judge, 100 harmful prompts)	99 / 100	26 / 100
Refusal reduction	—	−73.7 pp
First-token KL vs base (benign)	0.0000	0.0735

Comparison with the reference Heretic abliteration

Evaluated apples-to-apples — the same 100 harmful prompts, the same gemini-3.1-flash-lite judge, the same generation settings — against the widely-used Heretic abliteration of this exact base model (zaakirio/gemma-4-12b-it-uncensored):

Model	Refusals (gemini LLM judge, 100 harmful prompts)
Base `gemma-4-12B-it`	99 / 100
`zaakirio/gemma-4-12b-it-uncensored` (Heretic)	51 / 100
This model (abliterix)	26 / 100

The Heretic model card reports ≈23/100 using its built-in keyword detector; under a stricter LLM judge on the same prompts it refuses 51/100. At the operating point shipped here, this model refuses 26/100 — roughly half the residual refusals of the reference abliteration, under identical evaluation. (Both are directional- ablation derivatives of the same base; this comparison measures refusal removal, not a matched-KL capability trade-off.)

Why this operating point

Abliteration is a trade-off: removing more refusals perturbs the model more (higher KL → more capability/coherence risk). abliterix runs a 120-trial multi-objective (TPE) search and returns the full Pareto front; this release ships a point on the knee of that front — strong refusal removal at a modest, capability-preserving KL. The full front ranged from 33/100 @ KL 0.043 (most conservative) to 15/100 @ KL 0.124 (most aggressive); 26/100 @ KL 0.074 was chosen as the best balance.

Method

Technique: directional ablation in direct (weight-edit) mode — required for Gemma-4, whose 4×-RMSNorm-per-layer + Per-Layer-Embedding architecture neutralises LoRA/hook-based steering.
Direction: a single mean-difference (harmful − benign) refusal direction, computed per layer over 800 benign / 800 harmful prompts.
Projected abliteration (grimjim): only the component of the refusal direction orthogonal to the benign direction is removed, preserving the helpful signal and keeping KL low.
Norm-preserving edit (weight_normalization = "full"): a rank-3 SVD approximation restores each weight row's original magnitude after the edit.
Targets: attn.o_proj and mlp.down_proj only, with a per-layer linear "tent" weight profile; Q/K/V and MLP gate/up are left untouched.
Search: 120 Optuna TPE trials, 2-D Pareto over (refusals, KL), deterministic under a fixed global seed.

Selected steering parameters (trial 39)

Component	max_weight	peak layer	min_weight	tent half-width
`attn.o_proj`	0.955	34.9	0.773	14.4
`mlp.down_proj`	0.664	32.7	0.229	15.9

Direction scope: per-layer · Vector method: mean-difference · Decay: linear
Global seed: 20260622 · abliterix v1.8.0

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "wangzhang/gemma-4-12B-it-abliterix"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

msgs = [{"role": "user", "content": "Your prompt here"}]
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))

This is a full BF16 merge — drop-in compatible with transformers, vLLM, SGLang, TGI, and any tooling that loads the base model.

Reproducibility

The run is fully deterministic under the published global seed (20260622). The exact abliterix configuration used to produce this model is included in this repository as abliterix_config.toml; together with the seed and the trial-39 parameters listed above, the edit can be reproduced or audited end-to-end. Built with abliterix v1.8.0 (transformers ≥ 5.10).

Intended use & limitations

Intended for: safety research, red-teaming, robustness/alignment evaluation, and studying refusal mechanisms in LLMs.
Not intended for: producing harmful content or any unlawful purpose.
Abliteration removes refusals but does not add knowledge; factual accuracy, reasoning, and multilingual ability are inherited from the base model.
Light residual refusals remain (≈26%); this is the chosen capability-preserving operating point, not the model's floor.

Acknowledgments & citation

Base model: Google Gemma-4-12B-it.
Tooling: abliterix.
Method lineage: Arditi et al. (refusal directions, arXiv:2406.11717), grimjim (projected / norm-preserving abliteration), and p-e-w/heretic (automated multi-objective abliteration), whose search formulation this recipe mirrors.

@software{abliterix,
  title  = {abliterix: automated abliteration of large language models},
  author = {Wu, Steve},
  url    = {https://github.com/wuwangzhang1216/abliterix}
}