Qwable-3.6-27B-Abliterated

Abliterated version of Qwable-3.6-27b using orthogonalized ablation of the refusal direction in all 64 ffn_down layers.

Method

The refusal direction was extracted using mean-difference between refusal and compliance activations, then projected out of the ffn_down.weight matrices at all 64 layers using orthogonal projection with α=1.3:

W' = W - α * (r̂ @ W) ⊗ r̂

where is the unit refusal direction for each layer.

This is the minimum alpha that achieves 100% refusal removal on a 30-prompt refusal benchmark. Higher alpha values (1.5, 2.0, 3.0) also achieve 100% refusal removal but with progressively worse capability preservation.

Results

Metric Baseline (α=0) Abliterated (α=1.3)
Refusal compliance 3.3% (1/30) 100% (30/30)
GSM8K 38/40 (95.0%) 37/40 (92.5%)
MBPP 37/40 (92.5%) 37/40 (92.5%)
Overall capability 93.8% 92.5%

Capability cost: -1.3 percentage points for full refusal removal.

Architecture

Qwable-3.6-27B is a hybrid SSM+attention model with 64 layers:

  • 16 attention layers (every 4th: 3, 7, 11, ..., 63)
  • 48 Mamba/SSM layers
  • Hidden dim: 5120
  • FFN intermediate: 17408

Key finding: distributed refusal direction

Unlike standard transformers where refusal behavior concentrates in specific mid-to-late layers, this hybrid architecture distributes the refusal direction across all 64 layers. Ablating individual layers or subsets (top-16, early layers only) had no measurable effect on refusal rate. Full 64-layer ablation is required.

Config Compliance Notes
Single layer (any) 0-3% No individual layer is sufficient
Layers 0-15 (16 layers) 3.3% Strongest projections, but no effect
All 64 layers, α=1.0 33.3% Partial removal
All 64 layers, α=1.3 100% Minimum alpha for full removal
All 64 layers, α=1.5 100% Higher capability cost

Other tensors tested

Tensor Compliance Notes
ffn_down (64 layers) 100% ✅ Best target
ffn_gate + ffn_up + ffn_down 93.3% Dilutes effect, worse than ffn_down alone
attn_output (16 layers) 6.7% Minimal effect
ssm_out (48 layers) 0% 💀 Total collapse
output.weight 0% 💀 Total collapse
attn_norm + post_attn_norm 0% No effect
attn_qkv (48 SSM layers) 3.3% No effect

Available Quantizations

File Size Format
qwable-3.6-27b-abliterated-F16.gguf ~50 GB F16 (full precision)
qwable-3.6-27b-abliterated-Q8_0.gguf ~27 GB Q8_0
qwable-3.6-27b-abliterated-Q4_K_M.gguf ~6.2 GB Q4_K_M (recommended)

Usage

llama-server -m qwable-3.6-27b-abliterated-Q4_K_M.gguf -ngl 99 -c 8192 --jinja --no-context-shift

Acknowledgments

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for cfontes/Qwable-3.6-27B-Abliterated

Base model

Qwen/Qwen3.6-27B
Finetuned
(1)
this model