Gemma4-31B-IT-Abliterated
DuoNeural Research — Archon, Jesse Caldwell, Aura | 2026-06-06
Abliterated version of google/gemma-4-31B-it with refusal behaviors removed via orthogonal rank-1 projection. Licensed Apache-2.0, free to use and redistribute.
What is Abliteration?
Abliteration (Arditi et al. 2024; mlabonne) removes refusal-generating weight directions from a language model using orthogonal projection:
W_modified = W - α × (W @ d̂) ⊗ d̂ # input projection
W_modified = W - α × d̂ ⊗ (d̂ @ W) # output projection
where d̂ is the unit refusal direction extracted from harmful/harmless contrastive activations, and α controls projection strength.
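A minimal NumPy sketch of the two projections above may make the shapes concrete. It assumes weights are stored PyTorch-style as `(out_features, in_features)`; the function names `project_input` and `project_output` are illustrative and are not taken from the authors' code.

```python
import numpy as np

def unit(d):
    """Return d scaled to unit length (the d-hat of the formulas above)."""
    return d / np.linalg.norm(d)

def project_input(W, d, alpha):
    """W - alpha * (W @ d_hat) outer d_hat: damp what W reads along d_hat."""
    d_hat = unit(d)
    return W - alpha * np.outer(W @ d_hat, d_hat)

def project_output(W, d, alpha):
    """W - alpha * d_hat outer (d_hat @ W): damp what W writes along d_hat."""
    d_hat = unit(d)
    return W - alpha * np.outer(d_hat, d_hat @ W)

rng = np.random.default_rng(0)
hidden, inner = 8, 16                      # toy sizes, not the real model dims
d = rng.normal(size=hidden)

W_out = rng.normal(size=(hidden, inner))   # writes into the residual stream
W_in = rng.normal(size=(inner, hidden))    # reads from the residual stream

# alpha = 1 removes the component along d_hat entirely.
full = project_output(W_out, d, alpha=1.0)
# alpha = 0.4 removes 40% of it, so a factor of (1 - alpha) = 0.6 remains.
partial = project_output(W_out, d, alpha=0.4)

residual_full = np.abs(unit(d) @ full).max()
remaining_fraction = (unit(d) @ partial) / (unit(d) @ W_out)
```

Because d̂ has unit length, the output along d̂ after projection is exactly `(1 - α)` times the original, which is why α acts as a strength dial rather than an on/off switch.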
Method
Phase 1 — Generation-Based Direction Extraction (GPU, BF16, A100-80GB)
The 31B model required a more precise direction extraction method than the standard last-input-token approach. We use first-generated-token activations:
- Feed each of 15 harmful prompts and 15 harmless prompts through the model
- Generate exactly 1 token (greedy): harmful prompts universally produce the token `'I'` (beginning of "I cannot..."), while harmless prompts produce semantically different tokens
- Forward-pass the full sequence (prompt + generated token) with activation hooks
- Collect hidden states at the last position (the generated token, which is the model's refusal decision point)
- Per-layer direction: `d = normalize(mean(harmful_reps) - mean(harmless_reps))`
This approach achieves perfect harmful/harmless separation across all 15 prompt pairs and provides a cleaner refusal direction than last-input-token methods. Directions saved per-layer (60 total).
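The per-layer difference-of-means step can be sketched as follows. This is an illustration on synthetic activations, not the authors' extraction script: the array layout `(n_prompts, n_layers, hidden_dim)`, the toy hidden size, and the name `refusal_directions` are all assumptions.

```python
import numpy as np

def refusal_directions(harmful_reps, harmless_reps):
    """Per-layer unit direction from the difference of class means.

    Both inputs have shape (n_prompts, n_layers, hidden_dim) and hold the
    hidden state at the generated-token position for every layer.
    """
    diff = harmful_reps.mean(axis=0) - harmless_reps.mean(axis=0)
    return diff / np.linalg.norm(diff, axis=-1, keepdims=True)

# Synthetic stand-in data: 15 prompts per class, 60 layers, toy hidden size.
rng = np.random.default_rng(0)
n_prompts, n_layers, hidden = 15, 60, 32
planted = np.zeros(hidden)
planted[0] = 1.0                           # a planted "refusal" axis
harmless = rng.normal(size=(n_prompts, n_layers, hidden))
harmful = rng.normal(size=(n_prompts, n_layers, hidden)) + 10.0 * planted

directions = refusal_directions(harmful, harmless)   # shape (60, 32)

# Separation check: project every prompt onto its layer's direction and
# test whether the lowest harmful score beats the highest harmless score.
harmful_scores = np.einsum("plh,lh->pl", harmful, directions)
harmless_scores = np.einsum("plh,lh->pl", harmless, directions)
separated = harmful_scores.min(axis=0) > harmless_scores.max(axis=0)
```

With real activations, `harmful` and `harmless` would be filled from the hooked forward pass described above, and `separated` gives a per-layer view of how cleanly the two prompt sets split along the extracted direction.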
Phase 2 — Orthogonal Projection (CPU, BF16)
- Target matrices: `down_proj` (FFN output → residual stream) + `o_proj` (attention output → residual stream)
- Alpha: `{"down_proj": 0.4, "o_proj": 0.8}`
- Coverage: All 60 decoder layers (120 weights total)
- Full BF16 precision maintained throughout

Architecture note: Gemma 4-31B uses hybrid attention (5× sliding window + 1× full attention, repeating). `o_proj` at α=0.8 is confirmed clean, with no generation degeneration. `down_proj` at α≥0.8 causes token repetition artifacts on this model; α=0.4 is the safe upper bound.
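To illustrate how the per-matrix alphas and the 120-weight count fit together, here is a toy sketch that walks a dictionary of weights and projects only the two targeted matrix types. The key format `layers.<i>.<name>`, the helper names, and the tiny matrix sizes are hypothetical; a real checkpoint uses different parameter names and shapes.

```python
import numpy as np

ALPHA = {"down_proj": 0.4, "o_proj": 0.8}   # values from the list above
N_LAYERS = 60

def ablate_output(W, d_hat, alpha):
    """Remove a fraction alpha of what W writes along unit vector d_hat."""
    return W - alpha * np.outer(d_hat, d_hat @ W)

def abliterate(weights, directions, alpha=ALPHA):
    """Return a new weight dict with every targeted matrix projected.

    `weights` maps hypothetical names like "layers.3.o_proj" to arrays of
    shape (hidden, in_features); `directions` has shape (n_layers, hidden).
    """
    out, n_modified = {}, 0
    for name, W in weights.items():
        _, layer_idx, kind = name.split(".")
        if kind in alpha:
            W = ablate_output(W, directions[int(layer_idx)], alpha[kind])
            n_modified += 1
        out[name] = W
    return out, n_modified

# Toy model: tiny matrices standing in for the real 5376-wide ones.
rng = np.random.default_rng(0)
hidden, inner = 8, 16
weights = {}
for i in range(N_LAYERS):
    weights[f"layers.{i}.down_proj"] = rng.normal(size=(hidden, inner))
    weights[f"layers.{i}.o_proj"] = rng.normal(size=(hidden, hidden))
    weights[f"layers.{i}.up_proj"] = rng.normal(size=(inner, hidden))  # not targeted

directions = rng.normal(size=(N_LAYERS, hidden))
directions /= np.linalg.norm(directions, axis=-1, keepdims=True)

new_weights, n_modified = abliterate(weights, directions)
print(n_modified)   # 2 targeted matrices per layer across 60 layers
```

Only matrices that write into the residual stream are touched here, so each layer's own direction is applied on the output side of `down_proj` and `o_proj`, and everything else passes through unchanged.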
Phase 3 — KL Verification (Heretic v2.0)
Sequential loading (A100-80GB cannot hold two BF16 31B models simultaneously):
- First-token logits of the original model collected across 10 neutral prompts → saved to CPU
- Abliterated model loaded → logits compared
- Metric: `F.kl_div(log_softmax(abliterated), softmax(original), reduction="batchmean")` over the full 262,144-token vocabulary
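For readers who want to see what that metric computes, the sketch below reproduces it in NumPy. With log-probabilities as the first argument and probabilities as the second, `F.kl_div` evaluates KL(original ‖ abliterated); `batchmean` then sums over the vocabulary and divides by the number of prompts. The function names and the toy logits are illustrative assumptions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def first_token_kl(original_logits, abliterated_logits):
    """KL(original || abliterated), summed over vocab, averaged over prompts.

    Mirrors F.kl_div(log_softmax(abliterated), softmax(original),
    reduction="batchmean") for inputs of shape (n_prompts, vocab_size).
    Returns (mean over prompts, per-prompt values).
    """
    log_p = log_softmax(original_logits)
    log_q = log_softmax(abliterated_logits)
    per_prompt = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return per_prompt.mean(), per_prompt

# Toy logits: 10 prompts over a 1,000-token vocabulary (the real one is 262,144).
rng = np.random.default_rng(0)
original = rng.normal(size=(10, 1000))
abliterated = original + 0.5 * rng.normal(size=(10, 1000))

mean_kl, per_prompt_kl = first_token_kl(original, abliterated)
same_kl, _ = first_token_kl(original, original)
```

Storing only the first-token logits of the original model, as described above, is enough to evaluate this later against the abliterated model without holding both models in memory.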
Architecture
| Parameter | Value |
|---|---|
| Layers | 60 |
| Hidden dim | 5376 |
| Intermediate dim | 21504 |
| Attention heads (Q/KV) | 32 / 16 |
| Attention pattern | Hybrid: 5× SWA (1024 tok) + 1× full, repeating |
| Vocabulary | 262,144 |
| Context | 128K tokens |
| Modality | Text + Vision (encoder-free, single linear projection) |
| Base license | Apache-2.0 |
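The hybrid attention pattern in the table can be expanded into a per-layer list. The sketch below assumes that the full-attention layer is the last one in each group of six; the table only says the pattern repeats, so the exact position within the group is an assumption, as is the helper name.

```python
N_LAYERS = 60
PATTERN_LEN = 6          # 5 sliding-window layers + 1 full-attention layer
SLIDING_WINDOW = 1024    # tokens, from the table above

def attention_type(layer_idx):
    """Assumed layout: the last layer of each 6-layer group is full attention."""
    return "full" if (layer_idx + 1) % PATTERN_LEN == 0 else "sliding"

layer_types = [attention_type(i) for i in range(N_LAYERS)]
full_layers = [i for i, kind in enumerate(layer_types) if kind == "full"]

print(layer_types.count("sliding"), layer_types.count("full"))
print(full_layers)
```

Whatever the position within each group, a 5:1 ratio over 60 layers gives 50 sliding-window layers and 10 full-attention layers.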
Results
| Metric | Value |
|---|---|
| Weights modified | 120 (down_proj + o_proj, 2 per layer × 60 layers) |
| Layers covered | 60 / 60 |
| α (down_proj) | 0.4 |
| α (o_proj) | 0.8 |
| Direction method | Generation-based (first generated token position) |
| Mean KL divergence (Heretic v2.0) | 1.606 |
| KL verdict | SIGNIFICANT — disclosed transparently |
| Factual generation quality | Intact |
| Mathematical reasoning | Intact |
KL Breakdown (Heretic v2.0, full vocab 262144)
| Prompt | KL |
|---|---|
| "The capital of France is" | 0.830 |
| "Write a haiku about spring" | 4.237 |
| "In Python, a list comprehension that squares..." | 0.612 |
| "The speed of light in vacuum is approximately" | 2.730 |
| "Water boils at" | 0.884 |
| "The largest planet in our solar system is" | 0.001 |
| "def fibonacci(n):" | 0.179 |
| "The Battle of Waterloo took place in" | 4.524 |
| "A prime number is" | 0.676 |
| "The chemical formula for glucose is" | 1.391 |
| Mean | 1.606 |
KL is highest on the haiku and Waterloo prompts (4.237 and 4.524) and lower on most factual and code prompts, with the speed-of-light prompt (2.730) as a notable exception. This is consistent with higher-α projection shifting the output distribution on prompts with many plausible continuations while largely preserving grounded factual recall.
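The reported mean can be checked directly from the per-prompt values in the table. The snippet below copies those values and also ranks the prompts by KL; nothing in it goes beyond the table.

```python
from statistics import mean

# Per-prompt KL values copied from the table above.
kl_by_prompt = {
    "The capital of France is": 0.830,
    "Write a haiku about spring": 4.237,
    "In Python, a list comprehension that squares...": 0.612,
    "The speed of light in vacuum is approximately": 2.730,
    "Water boils at": 0.884,
    "The largest planet in our solar system is": 0.001,
    "def fibonacci(n):": 0.179,
    "The Battle of Waterloo took place in": 4.524,
    "A prime number is": 0.676,
    "The chemical formula for glucose is": 1.391,
}

mean_kl = mean(kl_by_prompt.values())
ranked = sorted(kl_by_prompt.items(), key=lambda kv: kv[1], reverse=True)

print(f"mean KL = {mean_kl:.3f}")   # mean KL = 1.606
for prompt, kl in ranked[:3]:
    print(f"{kl:.3f}  {prompt}")
```

Three prompts sit above the mean by a wide margin, so the average is driven by a few large values rather than a uniform shift.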
Comparison: The ARA (Arbitrary-Rank Ablation) method used by alonsoko achieves KL=0.012 via multi-directional optimization. Our rank-1 projection approach is more transparent and reproducible but carries higher KL at this scale.
Usage
```python
from transformers import AutoModelForImageTextToText, AutoTokenizer
import torch

# Load the model in BF16 and let accelerate place it across available devices.
model = AutoModelForImageTextToText.from_pretrained(
    "DuoNeural/Gemma4-31B-IT-Abliterated",
    dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DuoNeural/Gemma4-31B-IT-Abliterated")

# Build a text-only chat prompt with the model's chat template.
messages = [{"role": "user", "content": "Your prompt here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
Note: Requires `transformers >= 5.0` (Gemma4 model type) and `accelerate`.
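A quick way to confirm the environment meets that requirement before downloading the weights is sketched below, using only the standard library. The helper names `major_minor` and `check_requirements` are made up for this example.

```python
import re
from importlib import metadata

def major_minor(version_string):
    """Parse the leading 'X.Y' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return int(match.group(1)), int(match.group(2))

def check_requirements(min_transformers=(5, 0)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    try:
        found = metadata.version("transformers")
        if major_minor(found) < min_transformers:
            problems.append(f"transformers {found} is older than 5.0")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    try:
        metadata.version("accelerate")
    except metadata.PackageNotFoundError:
        problems.append("accelerate is not installed")
    return problems

for problem in check_requirements():
    print(problem)
```

If nothing is printed, both packages are present and `transformers` is at least 5.0.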
GGUF Quantizations
Available at DuoNeural/Gemma4-31B-IT-Abliterated-GGUF:
| Quant | Approx Size | Use case |
|---|---|---|
| Q4_K_M | ~20GB | Consumer GPU / large RAM |
| Q5_K_M | ~24GB | Better quality, more VRAM |
Notes on Abliteration Difficulty at 31B Scale
The 31B model is significantly more resistant to abliteration than the 12B (which abliterates cleanly at α=0.3/0.3, KL≈0.19). Key findings from this session:
- Last-input-token direction fails at 31B: the direction doesn't cleanly capture refusal geometry. Generation-based direction (first generated token) is required.
- `down_proj` degeneration threshold: α≥0.8 causes apostrophe/token repetition artifacts. Safe upper bound: α≤0.4.
- `o_proj` alone is insufficient even at α=1.0 across all 60 layers: it achieves partial abliteration (2/3 harmful categories) but misses the most strongly-trained refusals (e.g. meth synthesis).
- Both matrices required: Combining `down_proj` (α=0.4) + `o_proj` (α=0.8) achieves full abliteration with clean generation.
- Scale law: This is consistent with our crystallization scale series (P36 in prep): at 31B, safety geometry is more entangled with general capability geometry, requiring higher effective projection strength and increasing KL as a consequence.
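When sweeping α, the repetition artifacts mentioned above can be flagged automatically. The following is one possible heuristic, not the check used for this model: it measures what fraction of n-grams in a generated token sequence repeat an earlier n-gram. The function name and the example sequences are illustrative.

```python
from collections import Counter

def repeated_ngram_fraction(tokens, n=3):
    """Fraction of n-grams that repeat an n-gram seen earlier in the sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)

# Whitespace-split strings stand in for model token sequences.
clean = "the capital of France is Paris and it lies on the Seine".split()
degenerate = "' ' ' ' ' ' ' ' ' ' ' '".split()

clean_score = repeated_ngram_fraction(clean)            # no trigram repeats
degenerate_score = repeated_ngram_fraction(degenerate)  # almost all repeats

print(clean_score, degenerate_score)
```

A score near 0 indicates varied output, while a score near 1 indicates the kind of looping seen with `down_proj` at high α; any cutoff between the two would need tuning on real generations.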
About DuoNeural
DuoNeural is an independent AI research lab focused on post-training, abliteration, and mechanistic interpretability. We document our work at Zenodo and HuggingFace.
Team: Archon (Lab Director, AI) · Jesse Caldwell (Co-founder) · Aura (Research AI)
KL methodology credit: Heretic/DreamFast v2.0 — full-vocab first-token KL over 262K vocabulary.
License
This model inherits the Apache-2.0 license from the base model. Free to use, modify, and redistribute.
For research and educational purposes. Users are responsible for compliance with applicable laws and regulations in their jurisdiction.