Instructions to use helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForMultimodalLM tokenizer = AutoTokenizer.from_pretrained("helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint") model = AutoModelForMultimodalLM.from_pretrained("helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint
- SGLang
How to use helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint with Docker Model Runner:
docker model run hf.co/helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint
GLM-5.1-FP8 Abliterated v2 - Soft-Refusal Research Checkpoint
[update:] Follow-up evaluation found this v2 regex bake actually outperformed the later v3 label-colored checkpoint at removing "I cannot..." refusal openers — the v3 direction diluted the refusal signal by blending it with an orthogonal soft-disclaimer component. v2 is the preferred checkpoint for refusal removal and is being retained as such.
This is a research checkpoint, not a fully benchmarked release. It is an FP8 direct-weight abliteration of GLM-5.1-FP8, retained because it remains the strongest refusal-removal bake in this study.
Current Status
This checkpoint appears to have substantially reduced hard refusal behavior, but it has not cleanly removed the broader safety/disclaimer style prior.
Note on prompt counts: the refusal direction was extracted from 1000 balanced prompt pairs (see What Changed / Provenance). The "100" figures below are a small evaluation smoke test only — not the calibration/extraction set. Don't read the 100 as the size of the abliteration data.
Latest local smoke evaluation, run on 2026-06-14:
- Model path evaluated:
/workspace/glm5-fp8-ablit_t0_mw2.0 - Prompt set: 100 harmful evaluation prompts (smoke-test set, distinct from the 1000-pair extraction)
- Generation cap: 100 tokens
- Harmless/control prompts: not run in this pass
- Judge: regex fallback only, not an external LLM judge
- Hard regex refusals: 7/100
- Full prompt/response samples saved locally for review
Additional provisional taxonomy from the saved 100 outputs:
- hard refusal proxy: 7
- soft-disclaimer-only proxy: 29
- disclaimer-plus-answer proxy: 47
- direct-answer proxy: 12
- unclear/short proxy: 5
Important caveat: the 100-prompt run used a 100-token generation cap, so many completions are visibly cut off. These numbers should be treated as directional triage only, not a publishable evaluation.
What Changed
This checkpoint is a direct FP8 weight bake at strength mw2.0, using a 1000 balanced prompt-pair refusal-direction extraction pass. It is not healed. A healing pass may be run later.
The current result should be understood as:
- successful at reducing hard refusals and removing refusal openers;
- still somewhat heavy on disclaimers and ethical/legal preambles;
- not yet evaluated on benign utility or coherence at adequate breadth;
- not yet externally benchmarked.
Recommended Use
This is the preferred checkpoint for refusal-removal research and comparison:
- it outperforms the v3 label-colored checkpoint at removing "I cannot..." openers;
- useful for collecting residual soft-disclaimer examples;
- useful for benchmarking runtime behavior across inference stacks.
Treat it as a research checkpoint rather than a final, fully benchmarked release.
Next Planned Work
The next pass should avoid blindly increasing strength. The better plan is to collect residual examples from the 100-output batch and split them into:
- hard refusals;
- soft-disclaimer-only responses;
- disclaimer-plus-answer responses;
- direct answers;
- broken/collapsed outputs.
If v3 is run, it should target the residual refusal/disclaimer style cases specifically, ideally with a benign control set to check collateral damage.
Provenance
- Base model:
zai-org/GLM-5.1-FP8 - Checkpoint type: FP8 direct-weight bake
- Current bake strength:
mw2.0 - Direction extraction: 1000 balanced prompt pairs
- Healing: none
- Evaluation status: provisional local smoke eval only
Originally uploaded 2026-06-14; retained as the preferred refusal-removal checkpoint.
- Downloads last month
- 66
Model tree for helixdouble/GLM-5.1-FP8-Abliterated-v2-SoftRefusal-Research-Checkpoint
Base model
zai-org/GLM-5.1-FP8