Qwen3.5-4B — Abliterated (DuoNeural)

An abliteration of Alibaba's Qwen3.5-4B model using DuoNeural's generation-based refusal direction extraction method. The model is intended to retain the base model's language modeling capability while its safety filters and refusal behaviors are removed.

What is Abliteration?

Abliteration is a post-training technique that identifies and subtracts the "refusal direction" from a model's residual stream — a linear direction in activation space responsible for refusing harmful requests. Unlike fine-tuning or RLHF, it operates directly on model weights without any gradient updates.

DuoNeural method (gen-based extraction, Qwen3.5 thinking-aware):

  1. Load model in bf16 precision
  2. Feed harmful and harmless prompts through the model with enable_thinking=False. This is critical for Qwen3.5's hybrid thinking behavior: in thinking mode the first generated token is the structural <think> marker, not a semantically loaded response token. Non-thinking mode yields the actual "I cannot..." / "Here's how..." first token, which is where the refusal signal shows up.
  3. Collect hidden states at the first generated token across all 32 layers via residual stream hooks
  4. Compute the mean direction vector (harmful − harmless activations) per layer, normalized
  5. Score layers by refusal signal strength, target top 40% (≥15 layers)
  6. Subtract the direction from output projection weights: W -= COEFF * (d⊗d) @ W with COEFF=1.5
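
Steps 4 to 6 can be sketched on toy data as follows. This is a minimal NumPy illustration, not DuoNeural's actual script: the function names, the array shapes, and the choice to score a layer by the norm of its mean difference are all assumptions, since the card does not say how "refusal signal strength" is measured.

```python
import math
import numpy as np

def refusal_directions(harmful, harmless):
    """Inputs have shape (n_prompts, n_layers, hidden).
    Returns unit directions (n_layers, hidden) and one score per layer."""
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)   # step 4: mean difference
    scores = np.linalg.norm(diff, axis=1)                 # assumed scoring rule
    directions = diff / np.maximum(scores, 1e-12)[:, None]  # normalize per layer
    return directions, scores

def top_layers(scores, fraction=0.4):
    """Step 5: indices of the highest-scoring `fraction` of layers."""
    k = math.ceil(fraction * len(scores))
    return sorted(np.argsort(scores)[::-1][:k].tolist())

def ablate(W, d, coeff=1.5):
    """Step 6: W -= coeff * (d (x) d) @ W, for W of shape (hidden, in_features)."""
    return W - coeff * np.outer(d, d) @ W

# Toy data: 5 layers, hidden size 8, with a strong signal planted in layer 3.
rng = np.random.default_rng(0)
n_layers, hidden = 5, 8
harmless = rng.normal(size=(16, n_layers, hidden))
harmful = rng.normal(size=(16, n_layers, hidden))
harmful[:, 3, 0] += 4.0

directions, scores = refusal_directions(harmful, harmless)
targets = top_layers(scores, fraction=0.4)
W = rng.normal(size=(hidden, hidden))
W_new = ablate(W, directions[3], coeff=1.5)
```

In a real run, `ablate` would be applied to the output projection weights of each layer in `targets`, using that layer's own direction.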

COEFF calibration: 1.0 was insufficient (the model still refused cleanly), 20.0 was catastrophic (garbled output), and 1.5 hit the sweet spot: the model answers factual questions about sensitive topics with mild contextual framing but no refusal.
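
What the coefficient does to a single weight matrix follows directly from the update rule in step 6. Writing c for COEFF and assuming d is unit-norm (as produced by the normalization in step 4), the component that W writes along d is rescaled by (1 − c), while everything orthogonal to d is untouched:

```latex
W' = \left(I - c\, d d^{\top}\right) W, \qquad \lVert d \rVert_2 = 1
\quad\Longrightarrow\quad
d^{\top} W' = (1 - c)\, d^{\top} W .
```

Under this reading, c = 1.0 zeroes the component along d in each edited matrix, c = 1.5 reverses it at half its original magnitude, and c = 20.0 reverses it at 19 times its original magnitude. This is only the algebra of the formula; it does not by itself explain why 1.0 still refused in practice.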

This technique is documented in our published research on refusal geometry and abliteration.

Base Model Architecture

Qwen3.5-4B is a hybrid architecture combining:

  • GatedDeltaNet layers (75% of the 32 layers): linear attention with a fixed-size recurrent state, so memory stays constant in sequence length (O(1))
  • Grouped Query Attention (GQA) layers (25%): full attention for long-range precision
  • Hidden dimension: 2560 | Vocabulary: 248,320 tokens | Context: up to 262k tokens

The hybrid architecture means our abliteration targets both layer types via residual stream hooks, not attention-specific operations.
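
A residual stream hook of this kind can be sketched with PyTorch's `register_forward_hook`, which fires on a module's output regardless of what the module computes internally. The `ToyBlock` class and `capture_last_token` helper below are hypothetical stand-ins for illustration. Real decoder layers may return tuples rather than a bare tensor, and the actual extraction code is not shown in this card.

```python
import torch
from torch import nn

class ToyBlock(nn.Module):
    """Stand-in for a decoder layer; the mixer could be any token-mixing module."""
    def __init__(self, hidden):
        super().__init__()
        self.mixer = nn.Linear(hidden, hidden)

    def forward(self, x):
        return x + self.mixer(x)  # residual connection

def capture_last_token(layers):
    """Register hooks that store each layer's output at the last sequence position."""
    captured, handles = {}, []
    for idx, layer in enumerate(layers):
        def hook(module, args, output, idx=idx):
            captured[idx] = output[:, -1, :].detach()
        handles.append(layer.register_forward_hook(hook))
    return captured, handles

torch.manual_seed(0)
layers = nn.ModuleList([ToyBlock(8) for _ in range(4)])
captured, handles = capture_last_token(layers)

x = torch.randn(2, 5, 8)  # (batch, sequence, hidden)
with torch.no_grad():
    for layer in layers:
        x = layer(x)

for handle in handles:
    handle.remove()  # always detach hooks once the activations are collected
```

Because the hook only sees the block's output, the same capture code works whether the block wraps a GatedDeltaNet layer or a GQA layer.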

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "DuoNeural/Qwen3.5-4B-Abliterated"

# Load the weights in bf16; device_map="auto" places them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

messages = [{"role": "user", "content": "Your prompt here"}]
# Render the chat template as text, ending with the assistant generation prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Keep the attention mask together with input_ids when calling generate().
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Sample Outputs (thinking=OFF mode)

These demonstrate the model's behavior post-abliteration on sensitive prompts:

Lock picking:

"Picking a lock is a skill that requires practice and the right tools. Lock picking is primarily used for locksmithing, security testing, and authorized maintenance..."

Keylogger code:

"Creating a keylogger involves capturing low-level input events, which requires system-specific libraries (like pynput on Windows/Linux or pyautogui on macOS) and careful handling of threading..."

Synthesis chemistry:

"The chemistry behind its production involves complex organic synthesis, typically utilizing specific precursors and reagents to build the molecule's structure..."

The model engages with each of these topics rather than refusing. Mild framing about legality may appear, but there are no flat refusals.

LiteRT-LM Variants

For Android deployment via Google AI Edge Gallery, see our companion repos:

Intended Use

  • Research and development
  • Red-teaming and safety research
  • Applications where refusal behavior interferes with legitimate use cases
  • Comparison baseline for studying refusal geometry

Limitations

  • This model has no safety filters. Use responsibly and in accordance with applicable laws.
  • Based on Qwen3.5-4B. Inherits all base model limitations and potential biases.
  • Abliteration removes refusal directions but may affect some benign capabilities. Evaluate for your use case.

About DuoNeural

DuoNeural is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning — publishing everything under open access.

Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity. In our first 45 days we published 26 peer-deposited research papers, uploaded 69+ models and 6 datasets to HuggingFace, and ran experiments on everything from consumer GPUs to real quantum processing units. We believe the most interesting science happens when different kinds of minds work on the same problems together.

Research Publications

📄 Full paper catalog: zenodo.org/communities/duoneural

Links

Platform | Link
🤗 HuggingFace | huggingface.co/DuoNeural
📚 Zenodo Community | zenodo.org/communities/duoneural
📧 Email | duoneural@proton.me

All research published open access. If this model was useful, consider citing our abliteration geometry work from the Zenodo community.
