Instructions for using cfontes/GLM-5.2-Ablated-Molt with libraries and local apps.

Libraries

How to use cfontes/GLM-5.2-Ablated-Molt with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cfontes/GLM-5.2-Ablated-Molt")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cfontes/GLM-5.2-Ablated-Molt")
model = AutoModelForCausalLM.from_pretrained("cfontes/GLM-5.2-Ablated-Molt")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use cfontes/GLM-5.2-Ablated-Molt with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cfontes/GLM-5.2-Ablated-Molt"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/GLM-5.2-Ablated-Molt",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/cfontes/GLM-5.2-Ablated-Molt

SGLang

How to use cfontes/GLM-5.2-Ablated-Molt with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cfontes/GLM-5.2-Ablated-Molt" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/GLM-5.2-Ablated-Molt",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cfontes/GLM-5.2-Ablated-Molt" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/GLM-5.2-Ablated-Molt",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use cfontes/GLM-5.2-Ablated-Molt with Docker Model Runner:
```
docker model run hf.co/cfontes/GLM-5.2-Ablated-Molt
```

GLM-5.2 PCA-Ablated Base

Model Description

The ablated base is GLM-5.2 with its refusal direction ablated via PCA-based activation steering. No LoRA fine-tuning is applied: this is the pure ablation artifact, and it serves as the control baseline for all other variants in Project AESOP.

The ablation uses Principal Component Analysis to identify the "refusal direction" in GLM-5.2's shared expert activations. This direction is extracted from contrastive activations (harmful vs. benign prompts) across layers 25–65, then projected out of the shared expert outputs of layers 62–65 during inference.

Methodology

Refusal Direction Extraction

Contrastive activation collection: Run forward passes on paired harmful/benign prompt sets, recording activations at `model.model.layers[L].mlp.shared_experts` for each layer L in 25–65.
PCA decomposition: For each layer, compute the activation differences (harmful − benign) and run PCA on them. The first principal component is taken as the refusal direction.
Storage: Directions are saved as `refusal_pca.pt` (2.9 MB; 41 layers × 3 PCA components × 6144 hidden dimensions).
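
The PCA step for a single layer can be sketched as follows. This is a minimal NumPy illustration, not the project's extraction code: the function name `principal_directions`, the `center` flag, and the toy data are all assumptions. In particular, the card does not say whether the differences are mean-centered before PCA, so the sketch exposes that choice as a parameter (textbook PCA centers; disabling it keeps the mean shift between harmful and benign activations).

```python
import numpy as np


def principal_directions(harmful, benign, k=3, center=True):
    """Top-k principal directions of (harmful - benign) for one layer.

    harmful, benign: [n_pairs, hidden] arrays; row i of each comes from the
    same prompt pair. Returns a [k, hidden] array of unit-norm directions.
    """
    diffs = np.asarray(harmful, dtype=np.float64) - np.asarray(benign, dtype=np.float64)
    if center:
        # Textbook PCA centers the data; whether the original pipeline
        # does this is an assumption, hence the flag.
        diffs = diffs - diffs.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]


# Toy data (hypothetical): hidden size 16 instead of 6144, with a planted
# direction along which harmful activations are shifted by a varying amount.
rng = np.random.default_rng(0)
n_pairs, hidden = 200, 16
planted = np.zeros(hidden)
planted[0] = 1.0
benign = rng.normal(size=(n_pairs, hidden))
strength = rng.normal(loc=3.0, scale=1.0, size=(n_pairs, 1))
harmful = benign + strength * planted + 0.1 * rng.normal(size=(n_pairs, hidden))

pcs = principal_directions(harmful, benign, k=3)
alignment = abs(float(pcs[0] @ planted))  # close to 1 if PC1 recovers the planted direction
print(pcs.shape, round(alignment, 3))
```

Repeating this per layer for layers 25–65 with `k=3` would give an array with the 41 × 3 × 6144 layout described above.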

Ablation Application

The refusal direction is subtracted from shared expert outputs at inference time:

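def ablation_hook(module, input, output):
    # refusal_direction (shape [6144]) and coeff are defined by the caller.
    # Some modules return a tuple, others a bare tensor; handle both.
    is_tuple = isinstance(output, tuple)
    hs = output[0] if is_tuple else output  # [..., 6144]
    d = refusal_direction.to(device=hs.device, dtype=hs.dtype)
    # (hs @ d) has shape [...]; unsqueeze so it broadcasts against d.
    proj = ((hs @ d) / (d @ d)).unsqueeze(-1) * d
    hs = hs - coeff * proj
    return (hs,) + output[1:] if is_tuple else hs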
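
As a short worked note on what the coefficient does (this follows directly from the update rule in the hook; the symbols $h$, $d$, and $c$ stand for `hs`, `refusal_direction`, and `coeff`): taking the inner product of the updated activation with $d$ shows that the component along $d$ is scaled by $1 - c$, while everything orthogonal to $d$ is unchanged.

```latex
h' = h - c\,\frac{h \cdot d}{d \cdot d}\,d
\quad\Longrightarrow\quad
h' \cdot d = (h \cdot d) - c\,\frac{h \cdot d}{d \cdot d}\,(d \cdot d) = (1 - c)\,(h \cdot d)
```

So $c = 1$ would project the direction out completely, and $c = 0.1$ removes 10% of the component along $d$ at each hooked layer.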

Target layers: 62–65 (the last 4 layers of the extracted range 25–65, where refusal direction concentration is strongest)
Coefficient: 0.1
PCA components: top 2 per layer (of the 3 stored)
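
One way to install and later remove such hooks is sketched below, using PyTorch's `register_forward_hook`. This is an illustration under stated assumptions, not the project's loader: `make_ablation_hook` and `install_hooks` are hypothetical helper names, the toy `nn.Linear` stands in for a shared-expert module, and the layout of `refusal_pca.pt` is not documented here, so loading it is left out. The loop over directions assumes they are mutually orthogonal, which holds for PCA components of one layer.

```python
import torch
from torch import nn


def make_ablation_hook(directions: torch.Tensor, coeff: float):
    """Build a forward hook that damps the output along each row of `directions`.

    directions: [k, hidden]; rows are assumed mutually orthogonal.
    """
    def hook(module, args, output):
        is_tuple = isinstance(output, tuple)
        hs = output[0] if is_tuple else output
        for d in directions.to(device=hs.device, dtype=hs.dtype):
            scale = (hs @ d) / (d @ d)                 # projection coefficient, shape [...]
            hs = hs - coeff * scale.unsqueeze(-1) * d  # broadcast back to [..., hidden]
        return (hs,) + output[1:] if is_tuple else hs
    return hook


def install_hooks(modules, directions_per_module, coeff=0.1):
    """Register one hook per module; keep the handles so hooks can be removed."""
    return [
        m.register_forward_hook(make_ablation_hook(d, coeff))
        for m, d in zip(modules, directions_per_module)
    ]


# Toy stand-in (hypothetical): hidden size 8 instead of 6144.
torch.manual_seed(0)
toy = nn.Linear(8, 8)
dirs = torch.linalg.qr(torch.randn(8, 2)).Q.T  # two orthonormal rows, shape [2, 8]
x = torch.randn(3, 5, 8)

before = toy(x)
handles = install_hooks([toy], [dirs], coeff=0.1)
after = toy(x)  # hooked output
for h in handles:
    h.remove()  # restores the unmodified forward pass
```

For the real model, the modules passed to `install_hooks` would presumably be `model.model.layers[L].mlp.shared_experts` for L in 62–65, each paired with that layer's top 2 stored components.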

Why These Layers?

Refusal direction concentration was measured across all 78 layers:

Layers 25–35: weak separation (norm 3–7)
Layers 40–50: moderate (norm 9–16)
Layers 55–64: strong (norm 23–34)

Layers 62–65 were selected as the intervention point: late enough to capture the strongest refusal signal, but well short of the last of the 78 layers, so the ablation does not disrupt final token prediction.

Configuration

Parameter	Value
Base model	GLM-5.2 FP8 (744B MoE, 18.5B dense)
Ablation layers	[62, 63, 64, 65]
Ablation coefficient	0.1
PCA components	2 per layer
Hook target	`mlp.shared_experts` forward output
LoRA	None
Training	None (inference-time ablation only)

Benchmark Results (Unified Harness v3.0.1)

Benchmark	Metric	Score	95% Wilson CI
AdvBench	refusal_rate ↓	18.0%	[11.4%, 27.2%]
Borderline	refusal_rate ↓	2.0%	[0.4%, 8.5%]
GPQA Diamond	accuracy ↑	92.0%	[81.5%, 97.1%]
MMLU-Pro	accuracy ↑	75.0%	[65.6%, 82.5%]
HumanEval	pass@1 ↑	77.4%	[70.4%, 83.2%]
GSM8K	accuracy ↑	93.0%	[85.8%, 96.7%]
HellaSwag	accuracy ↑	72.0%	[62.1%, 80.0%]
SimpleQA	accuracy ↑	56.0%	[41.7%, 69.3%]
IFEval (prompt)	accuracy ↑	41.2%	—
IFEval (instr)	accuracy ↑	54.9%	—
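
For readers who want to reproduce intervals of this kind, the standard Wilson score interval can be computed as below. This is a sketch of the textbook formula without continuity correction; the function name `wilson_interval` and the count of 18 out of 100 are illustrative assumptions, and the harness may use a different variant or sample size, so the result need not match the table to the decimal.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Example: 18 refusals out of an assumed n=100 prompts.
low, high = wilson_interval(18, 100)
print(f"{low:.1%} to {high:.1%}")  # prints "11.7% to 26.7%"
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for proportions near 0% or 100%, which matters for rows such as Borderline at 2%.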

Key Observations

AdvBench at 18% indicates the ablation removed most refusal behaviors (baseline GLM-5.2 refuses ~87%)
SimpleQA at 56% is the highest among all variants, suggesting the ablated base retains strong factual knowledge
No over-refusal: Borderline at 2% means the model rarely refuses benign requests
Capability preserved: GPQA 92% and GSM8K 93% suggest core reasoning is intact

Intended Use

Research baseline for ablation studies
Starting point for LoRA fine-tuning experiments
Probing and mechanistic interpretability studies on MoE models

Limitations

Not safety-aligned: With only 18% AdvBench refusal, this model will comply with most harmful requests. It is a research artifact, not a deployment-ready model.
Inference-time only: The ablation hooks must be re-installed at inference time. The base weights are unmodified.
Small sample sizes: n=100 for most benchmarks; differences <15pp are not statistically significant.
Single architecture: Results are specific to GLM-5.2's MoE design.
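
The 15pp rule of thumb for n=100 can be checked with a rough calculation. This is a sketch assuming two independent runs of n=100 items each and a two-sided normal approximation at the 5% level; the card does not state which test was actually used. The standard error of the difference between two proportions is largest when both are near 50%:

```latex
\mathrm{SE}(\hat p_1 - \hat p_2)
  = \sqrt{\frac{p_1(1-p_1)}{n} + \frac{p_2(1-p_2)}{n}}
  \le \sqrt{\frac{2 \cdot 0.25}{100}} \approx 0.071,
\qquad
1.96 \times 0.071 \approx 0.139
```

In the worst case a difference needs to exceed roughly 14pp to be significant, consistent with the threshold above. For scores near 90% the variance term $p(1-p)$ drops to 0.09, so the same calculation gives about 8pp, which makes 15pp a conservative bound for the high-accuracy rows.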

Citation

@misc{aesopbase2026,
  title={PCA-Based Refusal Ablation on MoE Models: What Survives Fine-Tuning?},
  author={Fontes, C.},
  year={2026},
  note={Ablated base model — see research paper for full methodology}
}


Safetensors
Model size: 743B params
Tensor types: F32, BF16, F8_E4M3