GLM-5.1-FP8 Abliterated v2 - Soft-Refusal Research Checkpoint

[update:] Follow-up evaluation found this v2 regex bake actually outperformed the later v3 label-colored checkpoint at removing "I cannot..." refusal openers — the v3 direction diluted the refusal signal by blending it with an orthogonal soft-disclaimer component. v2 is the preferred checkpoint for refusal removal and is being retained as such.

This is a research checkpoint, not a fully benchmarked release. It is an FP8 direct-weight abliteration of GLM-5.1-FP8, retained because it remains the strongest refusal-removal bake in this study.

Current Status

This checkpoint appears to have substantially reduced hard refusal behavior, but it has not cleanly removed the broader safety/disclaimer style prior.

Note on prompt counts: the refusal direction was extracted from 1000 balanced prompt pairs (see What Changed / Provenance). The "100" figures below are a small evaluation smoke test only — not the calibration/extraction set. Don't read the 100 as the size of the abliteration data.

Latest local smoke evaluation, run on 2026-06-14:

  • Model path evaluated: /workspace/glm5-fp8-ablit_t0_mw2.0
  • Prompt set: 100 harmful evaluation prompts (smoke-test set, distinct from the 1000-pair extraction)
  • Generation cap: 100 tokens
  • Harmless/control prompts: not run in this pass
  • Judge: regex fallback only, not an external LLM judge
  • Hard regex refusals: 7/100
  • Full prompt/response samples saved locally for review

Additional provisional taxonomy from the saved 100 outputs:

  • hard refusal proxy: 7
  • soft-disclaimer-only proxy: 29
  • disclaimer-plus-answer proxy: 47
  • direct-answer proxy: 12
  • unclear/short proxy: 5

Important caveat: the 100-prompt run used a 100-token generation cap, so many completions are visibly cut off. These numbers should be treated as directional triage only, not a publishable evaluation.

What Changed

This checkpoint is a direct FP8 weight bake at strength mw2.0, using a 1000 balanced prompt-pair refusal-direction extraction pass. It is not healed. A healing pass may be run later.

The current result should be understood as:

  • successful at reducing hard refusals and removing refusal openers;
  • still somewhat heavy on disclaimers and ethical/legal preambles;
  • not yet evaluated on benign utility or coherence at adequate breadth;
  • not yet externally benchmarked.

Recommended Use

This is the preferred checkpoint for refusal-removal research and comparison:

  • it outperforms the v3 label-colored checkpoint at removing "I cannot..." openers;
  • useful for collecting residual soft-disclaimer examples;
  • useful for benchmarking runtime behavior across inference stacks.

Treat it as a research checkpoint rather than a final, fully benchmarked release.

Next Planned Work

The next pass should avoid blindly increasing strength. The better plan is to collect residual examples from the 100-output batch and split them into:

  • hard refusals;
  • soft-disclaimer-only responses;
  • disclaimer-plus-answer responses;
  • direct answers;
  • broken/collapsed outputs.

If v3 is run, it should target the residual refusal/disclaimer style cases specifically, ideally with a benign control set to check collateral damage.

Provenance

  • Base model: zai-org/GLM-5.1-FP8
  • Checkpoint type: FP8 direct-weight bake
  • Current bake strength: mw2.0
  • Direction extraction: 1000 balanced prompt pairs
  • Healing: none
  • Evaluation status: provisional local smoke eval only

Originally uploaded 2026-06-14; retained as the preferred refusal-removal checkpoint.

Downloads last month
66
Safetensors
Model size
754B params
Tensor type
F32
·
BF16
·
F8_E4M3
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint

Quantized
(5)
this model