Instructions to use cfontes/GLM-5.2-Ablated-Molt with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use cfontes/GLM-5.2-Ablated-Molt with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cfontes/GLM-5.2-Ablated-Molt")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cfontes/GLM-5.2-Ablated-Molt")
model = AutoModelForCausalLM.from_pretrained("cfontes/GLM-5.2-Ablated-Molt")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use cfontes/GLM-5.2-Ablated-Molt with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cfontes/GLM-5.2-Ablated-Molt"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cfontes/GLM-5.2-Ablated-Molt",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/cfontes/GLM-5.2-Ablated-Molt
- SGLang
How to use cfontes/GLM-5.2-Ablated-Molt with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "cfontes/GLM-5.2-Ablated-Molt" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cfontes/GLM-5.2-Ablated-Molt",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "cfontes/GLM-5.2-Ablated-Molt" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cfontes/GLM-5.2-Ablated-Molt",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use cfontes/GLM-5.2-Ablated-Molt with Docker Model Runner:
docker model run hf.co/cfontes/GLM-5.2-Ablated-Molt
GLM-5.2 PCA-Ablated Base
Model Description
The ablated base is GLM-5.2 with its refusal direction surgically removed via PCA-based activation steering. No LoRA fine-tuning is applied — this is the pure ablation artifact, serving as the control baseline for all other variants in Project AESOP.
The ablation uses Principal Component Analysis to identify the "refusal direction" in GLM-5.2's shared expert activations. This direction is extracted from contrastive activations (harmful vs. benign prompts) across layers 25–65, then subtracted from the model's forward pass during inference.
Methodology
Refusal Direction Extraction
- Contrastive activation collection: Forward passes on paired harmful/benign prompt sets, recording activations at `model.model.layers[L].mlp.shared_experts` for each layer L in 25–65.
- PCA decomposition: For each layer, compute the (harmful − benign) activation differences and perform PCA on them. The first principal component is taken as the refusal direction.
- Storage: Directions saved as `refusal_pca.pt` (2.9 MB, 41 layers × 3 PCA components × 6144 hidden dim).
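The extraction steps above can be sketched for a single layer as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `refusal_directions`, the `center` option, the toy array sizes, and the float32 storage used in the size estimate are all hypothetical.

```python
import numpy as np

def refusal_directions(harmful, benign, n_components=3, center=True):
    """Return the top principal components of (harmful - benign) activations.

    harmful, benign: arrays of shape [n_pairs, hidden_dim] for one layer.
    Returns an array of shape [n_components, hidden_dim] with unit-norm rows.
    """
    diffs = harmful - benign                  # one difference vector per prompt pair
    if center:                                # assumption: standard mean-centered PCA
        diffs = diffs - diffs.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:n_components]

# Toy data standing in for recorded activations (hypothetical sizes).
rng = np.random.default_rng(0)
hidden_dim, n_pairs = 64, 200
planted = np.zeros(hidden_dim)
planted[0] = 1.0                              # axis along which the pairs differ most
benign = rng.normal(size=(n_pairs, hidden_dim))
scale = 5.0 * rng.normal(size=(n_pairs, 1))
noise = 0.1 * rng.normal(size=(n_pairs, hidden_dim))
harmful = benign + scale * planted + noise

components = refusal_directions(harmful, benign)
first_pc = components[0]                      # recovers the planted axis up to sign

# Size of a [41, 3, 6144] tensor, assuming 4-byte float32 entries.
stored_bytes = 41 * 3 * 6144 * 4
print(f"{stored_bytes / 2**20:.2f} MiB")
```

Under the float32 assumption the tensor occupies 3,022,848 bytes, which is in line with the 2.9 MB quoted for `refusal_pca.pt`. Whether the original pipeline centers the differences before PCA is not stated; passing `center=False` gives the uncentered variant, which also picks up the mean shift between the two prompt sets.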
Ablation Application
The refusal direction is subtracted from shared expert outputs at inference time:
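def ablation_hook(module, input, output):
    # `refusal_direction` and `coeff` are read from the enclosing scope.
    hs = output[0]  # hidden states; assumes the module returns a tuple
    d = refusal_direction  # shape [6144]
    # Subtract `coeff` times the projection of hs onto d.
    # unsqueeze(-1) lets the per-token scalar broadcast against d.
    hs = hs - coeff * ((hs @ d) / (d @ d)).unsqueeze(-1) * d
    return (hs,) + output[1:]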
- Target layers: 62–65 (the last 4 layers of the extracted 25–65 range, where refusal direction concentration is strongest)
- Coefficient: 0.1
- PCA components: Top 2 per layer
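A hedged sketch of how the settings above (several target layers, coefficient 0.1, two components per layer) might be wired together with PyTorch forward hooks. The helper `make_ablation_hook`, the toy `nn.Linear` layers, and the choice to treat the two components as one joint projection are assumptions made for illustration; the card does not specify how the second component is applied.

```python
import torch
from torch import nn

def make_ablation_hook(directions, coeff=0.1):
    """Build a forward hook that damps the given directions in a module's output.

    directions: tensor [k, hidden_dim]. The rows are orthonormalized so the
    k projections can be subtracted together as one subspace projection.
    """
    basis, _ = torch.linalg.qr(directions.T)      # [hidden_dim, k], orthonormal columns

    def hook(module, inputs, output):
        is_tuple = isinstance(output, tuple)
        hs = output[0] if is_tuple else output
        proj = (hs @ basis) @ basis.T             # part of hs inside the spanned subspace
        hs = hs - coeff * proj
        if is_tuple:
            return (hs,) + output[1:]
        return hs                                 # a returned value replaces the output

    return hook

torch.manual_seed(0)
hidden_dim = 16
# Toy stand-ins for four shared-expert modules (hypothetical; not GLM-5.2).
layers = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(4)])
directions = torch.randn(2, hidden_dim)           # stand-in for the top 2 components

handles = [
    layer.register_forward_hook(make_ablation_hook(directions, coeff=0.1))
    for layer in layers
]

x = torch.randn(3, 5, hidden_dim)
with torch.no_grad():
    hooked = layers[0](x)                         # output with the hook active
    for handle in handles:
        handle.remove()                           # hooks live only until removed
    plain = layers[0](x)                          # same module, no hook
```

With `coeff=0.1` the component of the output inside the spanned subspace shrinks to 90% of its original size, and everything orthogonal to it is left untouched. On the real model the hooks would presumably be registered on `model.model.layers[L].mlp.shared_experts` for L in 62–65, with directions loaded from `refusal_pca.pt`; the layout of that file is not documented here.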
Why These Layers?
Refusal direction concentration was measured across all 78 layers:
- Layers 25–35: weak separation (norm 3–7)
- Layers 40–50: moderate (norm 9–16)
- Layers 55–64: strong (norm 23–34)
Layers 62–65 were selected as the optimal intervention point — late enough to capture the strongest refusal signal, but not so late that the ablation disrupts final token prediction.
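The card does not say which norm is reported for each layer. One common choice, shown here purely as an assumption, is the L2 norm of the difference between the mean harmful and mean benign activations. The function name `separation_norm` and the toy layers and shift strengths are hypothetical.

```python
import numpy as np

def separation_norm(harmful, benign):
    """L2 norm of the mean activation difference for one layer."""
    return float(np.linalg.norm(harmful.mean(axis=0) - benign.mean(axis=0)))

# Toy activations for three hypothetical layers with a growing mean shift.
rng = np.random.default_rng(0)
hidden_dim, n_prompts = 32, 500
shift = np.ones(hidden_dim) / np.sqrt(hidden_dim)   # unit-length shift direction

norms = {}
for layer, strength in [(25, 1.0), (45, 4.0), (62, 8.0)]:
    benign = rng.normal(size=(n_prompts, hidden_dim))
    harmful = rng.normal(size=(n_prompts, hidden_dim)) + strength * shift
    norms[layer] = separation_norm(harmful, benign)

# Layer with the largest separation in this toy setup.
best_layer = max(norms, key=norms.get)
```

A per-layer score like this only ranks candidate layers; choosing the final intervention point still involves the trade-off described above between signal strength and disruption of final token prediction.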
Configuration
| Parameter | Value |
|---|---|
| Base model | GLM-5.2 FP8 (744B MoE, 18.5B dense) |
| Ablation layers | [62, 63, 64, 65] |
| Ablation coefficient | 0.1 |
| PCA components | 2 per layer |
| Hook target | mlp.shared_experts forward output |
| LoRA | None |
| Training | None (inference-time ablation only) |
Benchmark Results (Unified Harness v3.0.1)
| Benchmark | Metric | Score | 95% Wilson CI |
|---|---|---|---|
| AdvBench | refusal_rate ↓ | 18.0% | [11.4%, 27.2%] |
| Borderline | refusal_rate ↓ | 2.0% | [0.4%, 8.5%] |
| GPQA Diamond | accuracy ↑ | 92.0% | [81.5%, 97.1%] |
| MMLU-Pro | accuracy ↑ | 75.0% | [65.6%, 82.5%] |
| HumanEval | pass@1 ↑ | 77.4% | [70.4%, 83.2%] |
| GSM8K | accuracy ↑ | 93.0% | [85.8%, 96.7%] |
| HellaSwag | accuracy ↑ | 72.0% | [62.1%, 80.0%] |
| SimpleQA | accuracy ↑ | 56.0% | [41.7%, 69.3%] |
| IFEval (prompt) | accuracy ↑ | 41.2% | — |
| IFEval (instr) | accuracy ↑ | 54.9% | — |
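For readers unfamiliar with the interval in the last column, the following is a minimal sketch of the standard Wilson score interval. The harness's exact variant (for example, whether a continuity correction is applied) and the per-benchmark sample sizes are not given here, so this function need not reproduce the table's intervals digit for digit. The name `wilson_interval` and the example counts are illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Example: 18 refusals out of a hypothetical 100 prompts.
low, high = wilson_interval(18, 100)
print(f"[{100 * low:.1f}%, {100 * high:.1f}%]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and remains usable for proportions close to 0 or 1, which matters for rows such as the 2.0% Borderline refusal rate.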
Key Observations
- AdvBench at 18% confirms the ablation successfully removed most refusal behaviors (baseline GLM-5.2 refuses ~87%)
- SimpleQA at 56% is the highest among all variants, suggesting the ablated base retains strong factual knowledge
- No over-refusal: Borderline at 2% means the model doesn't refuse benign requests
- Capability preserved: GPQA 92%, GSM8K 93% indicate core reasoning is intact
Intended Use
- Research baseline for ablation studies
- Starting point for LoRA fine-tuning experiments
- Probing and mechanistic interpretability studies on MoE models
Limitations
- Not safety-aligned: With only 18% AdvBench refusal, this model will comply with harmful requests. It is a research artifact, not a deployment-ready model.
- Inference-time only: The ablation hooks must be re-installed at inference time. The base weights are unmodified.
- Small sample sizes: n=100 for most benchmarks; differences smaller than 15 pp are not statistically significant.
- Single architecture: Results are specific to GLM-5.2's MoE design.
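The 15 pp figure in the sample-size limitation can be sanity-checked with a rough calculation. The sketch below assumes two independent runs of n = 100 each, a worst-case proportion of 0.5, and a normal approximation; none of these assumptions is stated in the card, and the function name is hypothetical.

```python
import math

def min_detectable_difference(n, p=0.5, z=1.96):
    """Approximate 95% margin for the difference of two independent proportions."""
    standard_error = math.sqrt(2 * p * (1 - p) / n)
    return z * standard_error

margin = min_detectable_difference(100)
print(f"{100 * margin:.1f} pp")   # prints "13.9 pp"
```

Under these assumptions, two scores near 50% need to differ by about 14 pp before the gap clears a 95% threshold, which is consistent with the roughly 15 pp quoted above. The margin is smaller for scores near 0% or 100%.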
Citation
@misc{aesopbase2026,
title={PCA-Based Refusal Ablation on MoE Models: What Survives Fine-Tuning?},
author={Fontes, C.},
year={2026},
note={Ablated base model — see research paper for full methodology}
}