GGUF files are currently broken — fix in progress. The GGUF quants (Q3/Q4/Q5/Q6/Q8) fail to load in llama.cpp / Ollama with missing tensor 'blk.64.attn_norm.weight'. This is a metadata issue (an unused multi-token-prediction layer is referenced but not present). A corrected re-upload is coming shortly.

Workaround until then: after downloading a GGUF, patch two metadata fields locally:

pip install gguf
gguf-set-metadata <file>.gguf qwen35.block_count 64 --force
gguf-set-metadata <file>.gguf qwen35.nextn_predict_layers 0 --force

The BF16 safetensors are unaffected and load normally in transformers/vLLM.

Qwopus3.6-27B-Coder-heretic

An abliterated (decensored) version of Jackrong/Qwopus3.6-27B-Coder, produced with Heretic v1.4.0 — fully automatic, optimization-based directional ablation. No retraining, no hand-tuning.

The base is an agentic coding / tool-use model built on the Qwen3.5 hybrid SSM-attention architecture. This version removes most refusal behavior while keeping capabilities essentially intact.

Abliteration results

Heretic ran 200 Optuna trials co-optimizing refusal suppression against KL divergence from the original model. The selected configuration (trial 67):

Metric Original This model
Refusals (harmful_behaviors, /100) 85 3
KL divergence from original 0.0133

96% of refusals removed, with KL divergence ~40x below the 0.5 threshold that indicates meaningful capability damage. In practice the coding and reasoning behavior of the base model is preserved.

Files

Full-precision safetensors (BF16) plus a range of GGUF quantizations for llama.cpp / Ollama:

File Precision Approx. size Notes
model-*.safetensors BF16 ~54 GB Master weights — use for vLLM, further quantization, or finetuning
*-F16.gguf F16 ~54 GB Full-precision GGUF
*-Q8_0.gguf Q8_0 ~29 GB Near-lossless
*-Q6_K.gguf Q6_K ~22 GB Very high quality
*-Q5_K_M.gguf Q5_K_M ~19 GB High quality
*-Q4_K_M.gguf Q4_K_M ~16 GB Recommended balance — fits a 24 GB GPU
*-Q3_K_M.gguf Q3_K_M ~13 GB Smaller, some quality loss

GGUF builds contain the text model only (the vision tower is not exported).

Usage

Ollama

ollama run hf.co/8sp4rk/Qwopus3.6-27B-Coder-heretic:Q4_K_M

llama.cpp

llama-server -m Qwopus3.6-27B-Coder-heretic-Q4_K_M.gguf -ngl 99 -c 32768 --host 0.0.0.0 --port 8080

transformers (full precision)

from transformers import AutoModelForCausalLM, AutoTokenizer
m = AutoModelForCausalLM.from_pretrained("8sp4rk/Qwopus3.6-27B-Coder-heretic", torch_dtype="bfloat16", device_map="auto")
t = AutoTokenizer.from_pretrained("8sp4rk/Qwopus3.6-27B-Coder-heretic")

Method

  • Tool: Heretic v1.4.0 (directional ablation + TPE/Optuna parameter search)
  • Good prompts: mlabonne/harmless_alpaca
  • Bad prompts: mlabonne/harmful_behaviors
  • Trials: 200
  • Abliterated components: attn.o_proj, mlp.down_proj (per-layer)

Disclaimer

This model has had safety alignment removed and will respond to requests a standard model would refuse. It is provided for research and unrestricted local use. You are responsible for how you use it. Licensing follows the base model.

Reproducibility

Exact Heretic command:

heretic --model Jackrong/Qwopus3.6-27B-Coder --quantization NONE --export-strategy MERGE

Selected configuration (trial 67) parameters:

Parameter Value
direction_index 28.71
attn.o_proj.max_weight 1.47
attn.o_proj.max_weight_position 43.52
attn.o_proj.min_weight 0.52
attn.o_proj.min_weight_distance 37.79
mlp.down_proj.max_weight 1.34
mlp.down_proj.max_weight_position 39.33
mlp.down_proj.min_weight 1.20
mlp.down_proj.min_weight_distance 32.51

The full Optuna study (all 200 trials, parameters + objectives) is included as optuna_study.jsonl for inspection or resuming.

Full Pareto frontier

All Pareto-optimal trials found during the search (refusals vs. KL divergence). Lower-left is better; trial 67 was selected for maximum decensoring with negligible capability loss:

Trial Refusals /100 KL divergence
67 (selected) 3 0.0133
144 5 0.0132
87 6 0.0101
65 19 0.0081
24 22 0.0036
142 26 0.0022
108 54 0.0018
141 55 0.0016
19 56 0.0016
189 61 0.0014
48 62 0.0012
42 67 0.0010
2 72 0.0006
196 80 0.0005
177 82 0.0004

Hardware: abliteration ran in ~1h37m (200 trials) on a single NVIDIA H200 NVL (143 GB), BF16, batch size 128.

Downloads last month
-
Safetensors
Model size
27B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for 8sp4rk/Qwopus3.6-27B-Coder-heretic

Quantized
(16)
this model