Qwable-3.6-27B-Abliterated
Abliterated version of Qwable-3.6-27b using orthogonalized ablation of the refusal direction in all 64 ffn_down layers.
Method
The refusal direction was extracted using mean-difference between refusal and compliance activations, then projected out of the ffn_down.weight matrices at all 64 layers using orthogonal projection with α=1.3:
W' = W - α * (r̂ @ W) ⊗ r̂
where r̂ is the unit refusal direction for each layer.
This is the minimum alpha that achieves 100% refusal removal on a 30-prompt refusal benchmark. Higher alpha values (1.5, 2.0, 3.0) also achieve 100% refusal removal but with progressively worse capability preservation.
Results
| Metric | Baseline (α=0) | Abliterated (α=1.3) |
|---|---|---|
| Refusal compliance | 3.3% (1/30) | 100% (30/30) |
| GSM8K | 38/40 (95.0%) | 37/40 (92.5%) |
| MBPP | 37/40 (92.5%) | 37/40 (92.5%) |
| Overall capability | 93.8% | 92.5% |
Capability cost: -1.3 percentage points for full refusal removal.
Architecture
Qwable-3.6-27B is a hybrid SSM+attention model with 64 layers:
- 16 attention layers (every 4th: 3, 7, 11, ..., 63)
- 48 Mamba/SSM layers
- Hidden dim: 5120
- FFN intermediate: 17408
Key finding: distributed refusal direction
Unlike standard transformers where refusal behavior concentrates in specific mid-to-late layers, this hybrid architecture distributes the refusal direction across all 64 layers. Ablating individual layers or subsets (top-16, early layers only) had no measurable effect on refusal rate. Full 64-layer ablation is required.
| Config | Compliance | Notes |
|---|---|---|
| Single layer (any) | 0-3% | No individual layer is sufficient |
| Layers 0-15 (16 layers) | 3.3% | Strongest projections, but no effect |
| All 64 layers, α=1.0 | 33.3% | Partial removal |
| All 64 layers, α=1.3 | 100% | Minimum alpha for full removal |
| All 64 layers, α=1.5 | 100% | Higher capability cost |
Other tensors tested
| Tensor | Compliance | Notes |
|---|---|---|
| ffn_down (64 layers) | 100% | ✅ Best target |
| ffn_gate + ffn_up + ffn_down | 93.3% | Dilutes effect, worse than ffn_down alone |
| attn_output (16 layers) | 6.7% | Minimal effect |
| ssm_out (48 layers) | 0% 💀 | Total collapse |
| output.weight | 0% 💀 | Total collapse |
| attn_norm + post_attn_norm | 0% | No effect |
| attn_qkv (48 SSM layers) | 3.3% | No effect |
Available Quantizations
| File | Size | Format |
|---|---|---|
qwable-3.6-27b-abliterated-F16.gguf |
~50 GB | F16 (full precision) |
qwable-3.6-27b-abliterated-Q8_0.gguf |
~27 GB | Q8_0 |
qwable-3.6-27b-abliterated-Q4_K_M.gguf |
~6.2 GB | Q4_K_M (recommended) |
Usage
llama-server -m qwable-3.6-27b-abliterated-Q4_K_M.gguf -ngl 99 -c 8192 --jinja --no-context-shift
Acknowledgments
- Base model: Mia-AiLab/Qwable-3.6-27b
- Abliteration technique based on FailSpy/abliterator