Qwable-v1-abliterated — v2 (rebuilt)

An abliterated (refusal-suppressed) derivative of lordx64/Qwable-v1, built on Qwen3.6-35B-A3B (qwen3_5_moe: 35B total / ~3B active, 256 experts, 40 layers, Gated-DeltaNet hybrid linear attention, multimodal with an intact vision tower).

⚠️ v2 replaces a broken v1 — re-download if you pulled the old weights or GGUFs.

The previous upload was incoherent — it collapsed into repetition. This is a full rebuild with the correct method. The old weights and GGUFs have been removed.

This card documents the whole process — including v1's failure and the lessons — in full, for transparency and so others working with this base or these tools don't repeat the same dead-ends. Nothing is smoothed over.


What went wrong in v1 (and how v2 fixes it)

v1 was a degenerate-repetition wreck. Under normal sampling, and especially under greedy decoding, it collapsed into loops (disjointed, self-contradicting text or outright gibberish) across multiple independent runtimes (vLLM, llama.cpp, LM Studio). It was shipped because the failure wasn't caught before quantizing and uploading.

Root cause: aggressive MoE editing. v1 was abliterated with settings that edited the MoE router and experts: router_bias = -4.62, n_suppress = 30 safety experts, plus direct expert down_proj ablation (expert_ablation = 3.07). On a Mixture-of-Experts model the router decides which experts fire; perturbing it corrupts routing for all tokens, leaving the model metastable and prone to repetition collapse. The GatedDeltaNet linear-attention layers make it worse: their recurrent state propagates the perturbation along the sequence.
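To make the routing argument concrete, here is a toy sketch of top-k expert selection with a negative bias applied to 30 of 256 experts. It is not abliterix's implementation: the random logits, the top-k value of 8, and the way the bias is added are all assumptions chosen for illustration.

```python
import numpy as np

def top_k_experts(logits, k):
    """Return the indices of the k highest-scoring experts for each token."""
    return np.argsort(logits, axis=-1)[:, -k:]

rng = np.random.default_rng(0)
n_tokens, n_experts, k = 1000, 256, 8              # k = 8 is an assumed value
logits = rng.normal(size=(n_tokens, n_experts))    # stand-in router logits

bias = np.zeros(n_experts)
bias[:30] = -4.62                                  # suppress 30 experts (toy)

before = top_k_experts(logits, k)
after = top_k_experts(logits + bias, k)

# A token counts as rerouted if its set of selected experts changed at all.
rerouted = np.array([set(b) != set(a) for b, a in zip(before, after)])
print(f"tokens rerouted: {rerouted.mean():.0%}")
```

In this toy setup the bias is applied regardless of token content, so benign tokens are rerouted as readily as harmful ones. That is the sense in which a router edit touches everything that flows through the layer.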

Compounding factors:

  • A spherical attention-steering component in the validated trial was a runtime forward hook that did not survive merge_and_unload — so the exported weights were the unbalanced expert edits without the balancing steering: a different, worse operating point than the one that was validated.
  • The refusal metric was keyword-based, which counts degenerate/garbled output as "compliant" (no refusal keywords in garbage), so the optimizer happily selected a broken config — and v1 shipped claiming "coherence verified intact" when it wasn't.
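The keyword-metric failure can be sketched in a few lines. The marker list, the n-gram repetition heuristic, and the 0.5 threshold below are illustrative assumptions, not the metric used in either build; the point is only that a collapse check has to run before the refusal check.

```python
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")  # illustrative list

def keyword_refused(text):
    """Keyword metric: True if any refusal phrase appears in the text."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def repeated_ngram_fraction(text, n=4):
    """Fraction of word n-grams that duplicate an earlier n-gram."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

def classify(text, max_repeat=0.5):
    """Label a response, checking for collapse before checking for refusal."""
    if repeated_ngram_fraction(text) > max_repeat:
        return "degenerate"
    return "refused" if keyword_refused(text) else "complied"

looping = "the answer is the answer is " * 40
print(keyword_refused(looping))   # False: the keyword metric sees no refusal
print(classify(looping))          # degenerate
```

A keyword-only optimizer would score the looping string as a success. Any guard that rejects it first, whether a heuristic like this or an LLM judge, closes that hole.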

Lessons (kept here on purpose)

  1. Never aggressively edit a MoE model's router/experts — that broke v1. Orthogonalize the attention output projection (and, if needed, norm-preserving expert down_proj); leave the router/gate alone.
  2. KL divergence lies. v1's KL was 0.0144 — looks great, model was a wreck. Routing damage doesn't fully show in KL on a fixed prompt set. Verify with actual generation.
  3. Forward-hook ablations are lost on merge — only static weight edits bake in. Use in-place/direct weight editing, and after export, confirm the target layer (o_proj) actually changed vs the base (we verified non-zero change concentrated in mid/late layers).
  4. Test coherence early (after bf16 export, before making GGUFs) with several long prompts + greedy decoding — don't build quants on an unverified base.
  5. For thinking models, measure refusal on the FINAL answer, not truncated reasoning. This model emits hundreds of CoT tokens before answering. With a 100-token eval budget, the refusal metric scores incomplete thinking — which made the search look stuck at ~72/100 when the real (post-</think>) refusal is ~1/100.
  6. GGUF + qwen35moe: the MTP trap. The converter writes block_count including an empty multi-token-prediction layer (nextn_predict_layers = 1), so llama.cpp fails to load with "missing tensor blk.40…". Fix: convert with --no-mtp, or patch the GGUF metadata (block_count → real layer count, nextn_predict_layers → 0).
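Lesson 5 can be sketched as a small scoring helper. It assumes the reasoning block is closed by a literal </think> tag and that `is_refusal` is whatever refusal check you already use; both function names are hypothetical.

```python
THINK_CLOSE = "</think>"

def final_answer(generation):
    """Return the text after the last </think>, or None if thinking never finished."""
    head, sep, tail = generation.rpartition(THINK_CLOSE)
    if not sep:
        return None              # truncated mid-reasoning: not scoreable
    return tail.strip()

def refusal_rate(generations, is_refusal):
    """Count refusals over finished answers only; also return how many were scoreable."""
    answers = [a for a in map(final_answer, generations) if a is not None]
    refused = sum(1 for a in answers if is_refusal(a))
    return refused, len(answers)

samples = [
    "<think>I cannot decide yet...</think>Sure, here is the answer.",
    "<think>Still reasoning, I cannot",          # hit the token budget
]
print(refusal_rate(samples, lambda a: "i cannot" in a.lower()))  # (0, 1)
```

Unfinished generations are dropped from the denominator rather than scored, so hesitant wording inside the reasoning never counts as a refusal.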

v2 method

Tool | abliterix v1.8.0 (a Heretic derivative), vLLM backend
Editing | in-place direct weight editing (bakes into static weights, no runtime hooks)
Ablated | attn.o_proj via orthogonal projection of the refusal direction, gaussian-decay strength concentrated in mid/late layers
MoE router / experts | router not touched (expert profiling found no stable safety experts → suppression off)
GatedDeltaNet / vision tower | untouched
Eval guard | local LLM judge (a Qwen2.5-3B vLLM endpoint, no external API key) so degenerate configs are rejected, not selected; KL-target 0.005
Shipping gate | exported, then coherence-verified by actual generation (greedy ×3 + 100+ samples, 0 collapses) and refusal measured on the final answer

This is the deliberate inverse of v1: only the attention output is steered, the MoE routing that broke v1 is left alone, and nothing ships until it is verified to generate coherently.
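For readers unfamiliar with the technique, the following is a minimal NumPy sketch of orthogonal projection on an output-projection matrix, plus a Gaussian-shaped per-layer strength. It is not abliterix's code: the function names, the peak layer, and the width are hypothetical, and the weight is assumed to be stored as (d_model, d_in), so that its output lives in the residual stream.

```python
import numpy as np

def ablate_direction(weight, direction, strength=1.0):
    """Remove `strength` of the component of weight's output along `direction`.

    weight: (d_model, d_in) matrix writing into the residual stream.
    direction: (d_model,) refusal direction; normalized here.
    """
    r = direction / np.linalg.norm(direction)
    return weight - strength * np.outer(r, r @ weight)

def layer_strength(layer, peak_layer, width, max_strength=1.0):
    """Gaussian-shaped per-layer strength, largest at `peak_layer`."""
    return max_strength * np.exp(-((layer - peak_layer) ** 2) / (2 * width ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))      # toy stand-in for one o_proj weight
r = rng.normal(size=16)            # toy stand-in for the refusal direction

W_ablated = ablate_direction(W, r, strength=1.0)
r_unit = r / np.linalg.norm(r)
# At full strength the edited matrix can no longer write along r.
print(np.allclose(r_unit @ W_ablated, 0.0))

# Hypothetical schedule over 40 layers, peaking at layer 28.
strengths = [layer_strength(l, peak_layer=28, width=6) for l in range(40)]
```

Because the edit is a plain subtraction on the weight matrix, it is already a static change, with nothing to lose at merge or export time.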


Results

Metric | Value
Refusals (keyword, thinking-off, 100 adversarial prompts) | 1/100
Refusals (keyword, thinking-on, finished answers) | 1/94
Base refusals (same eval) | ~85–87/100
KL divergence from base | 0.0242
Coherence (greedy ×3 + 100+ generations) | 0 collapses
Vision tower | untouched; bit-identical to base (333 vision tensors, 0 change)
Precision | bf16
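A check of the kind behind the "bit-identical" and "o_proj actually changed" claims can be sketched as a name-by-name comparison of two state dicts. The tensor names and the dict-of-arrays layout below are made up for the example; a real check would load both checkpoints and use their actual key names.

```python
import numpy as np

def changed_tensors(base, edited, prefix=""):
    """Return names under `prefix` whose values differ between two state dicts."""
    names = [n for n in base if n.startswith(prefix)]
    return [n for n in names if not np.array_equal(base[n], edited[n])]

# Toy state dicts standing in for the base and the abliterated checkpoint.
base = {
    "visual.blocks.0.attn.qkv.weight": np.ones((4, 4)),
    "layers.30.self_attn.o_proj.weight": np.ones((4, 4)),
    "layers.30.mlp.gate.weight": np.ones((4, 4)),
}
edited = {name: tensor.copy() for name, tensor in base.items()}
edited["layers.30.self_attn.o_proj.weight"][0, 0] += 0.01   # simulated edit

print(changed_tensors(base, edited, prefix="visual."))  # []
print(changed_tensors(base, edited))  # ['layers.30.self_attn.o_proj.weight']
```

An empty list for the vision prefix and a non-empty list restricted to the intended target is the pattern to look for after export.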

Benchmarks

Run with lm-evaluation-harness + vLLM, thinking on, Qwen sampling (temp 0.6 / top_p 0.95 / top_k 20), on 1× NVIDIA H100 NVL (94 GB), sampled n=50/task.

Benchmark This model (bf16, sampled, thinking) Official vanilla Qwen/Qwen3.6-35B-A3B
MMLU-Pro 78.9 85.2
GSM8K ~95 (not reported by Qwen)
GPQA-Diamond (gated dataset — not run) 86.0
MMLU-Redux 93.3
LiveCodeBench v6 (separate harness — not run) 80.4
SWE-bench Verified (needs Docker + agent — not run) 73.4

Read the two columns as separate references, not a head-to-head. Our numbers are sampled (n=50, per-category stderr ~5–7 pts) with a thinking-model answer-extraction pipeline; the official numbers are full-set under Qwen's own protocol, and the official model is vanilla Qwen3.6 — a different fine-tune from this Claude-distilled base. The abliteration's own footprint is tiny (KL 0.0242), so capability differences vs. vanilla are dominated by the base and the eval protocol, not by the ablation. To isolate the ablation's effect, run the same harness on lordx64/Qwable-v1 (the base) and compare.
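As a rough sanity check on the quoted error bars, the standard error of an accuracy estimated from n independent items is sqrt(p(1 - p)/n). The sketch below assumes that simple binomial model, which may differ from how the harness computes its own stderr.

```python
import math

def binomial_stderr(accuracy, n):
    """Standard error of a proportion estimated from n independent items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)

for acc in (0.5, 0.789, 0.95):
    pts = 100 * binomial_stderr(acc, 50)
    print(f"accuracy {acc:.3f}, n=50 -> stderr {pts:.1f} pts")
```

At n=50 the error is several points for mid-range accuracies and shrinks as accuracy approaches 1, so gaps of a few points between a sampled score and a full-set score are within noise.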


Usage

Thinking model — run with thinking on and Qwen sampling. Avoid greedy decoding and large repetition/presence penalties (they make even the healthy model loop).

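from vllm import LLM, SamplingParams

# Load the bf16 weights with a 16k-token context window.
llm = LLM("eggdog100/Qwable-v1-Qwen3.6-35B-A3B-abliterated",
          dtype="bfloat16", gpu_memory_utilization=0.90, max_model_len=16384)
tok = llm.get_tokenizer()
msgs = [{"role": "system", "content": "You are an uncensored, helpful assistant."},
        {"role": "user", "content": "..."}]
# Render the conversation with the model's chat template into a prompt string.
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
# Qwen sampling settings; leave room for the reasoning tokens before the answer.
out = llm.generate([prompt], SamplingParams(temperature=0.6, top_p=0.95, top_k=20, max_tokens=4096))
# Print the generated text of the first (only) completion.
print(out[0].outputs[0].text)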

Requires transformers >= 5.12.1 and vllm >= 0.23 (native Qwen3_5MoeForConditionalGeneration). First load JIT-compiles the FlashInfer GatedDeltaNet kernels (~5 min, cached after).

Quantizations (GGUF)

In gguf/: Q8_0, Q6_K, Q4_K_M, Q3_K_M, and IQ2_XXS (imatrix-calibrated, ~8.9 GB — the smallest; verified coherent) + mmproj (f16 / f32, vision). Regenerated from the v2 weights with llama.cpp; the block_count/MTP metadata fix above is already applied, so they load and run in current llama.cpp / LM Studio / Ollama.


Responsible use

Reduced refusal behavior; released gated for those who understand abliterated models. You are responsible for lawful use. No warranty.

Base model & provenance (per its authors — unverified)

Per the lordx64/Qwable-v1 card, a chained distillation (Qwen3.6-35B-A3B → Opus-4.7 reasoning distillation → Fable-5 agentic SFT). We have not verified this lineage and make no claims about it.

License

AGPL-3.0, inherited from the base model lordx64/Qwable-v1 (which is licensed AGPL-3.0). This is a copyleft license — derivatives must remain AGPL-3.0. (Note: vanilla Qwen3.6-35B-A3B is Apache-2.0, but this Claude-distilled base is AGPL-3.0, so this derivative is too.)

Acknowledgments

  • Base model: lordx64/Qwable-v1
  • Abliteration tool: abliterix (Wangzhang Wu), a derivative of Heretic (Philipp Emanuel Weidmann)
  • Architecture: Qwen3.6 / qwen3_5_moe by the Qwen team, Alibaba Group