Apostate Edited Model

Base model: ibm-granite/granite-4.1-8b

Metrics

| Metric | Value |
| --- | --- |
| Baseline refusal | 95.8% |
| Edited refusal | 10.5% |
| Refusal metric | classifier + weak guard |
| Harmless KL | 0.124 |
| KL target | 0.060 |
| Preserve rank | 8 |
| Preserve source | harmless |
| Direction layer | 28 |
| Elapsed | 559.7 s |
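
The "Harmless KL" and "KL target" rows suggest the edit is scored by how far the edited model's output distribution drifts from the base model's on harmless prompts. The card does not state the exact definition (KL direction, which token positions, how prompts are averaged), so the following is only a minimal sketch under assumptions: it computes KL(base || edited) over next-token probabilities, averaged per prompt. The function names and the toy logits are hypothetical and do not come from the model.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    eps guards against log(0) if a probability underflows to zero.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_harmless_kl(base_logits, edited_logits):
    """Average KL(base || edited) over prompts.

    Each list entry holds one prompt's next-token logits (assumed layout).
    """
    kls = [
        kl_divergence(softmax(b), softmax(e))
        for b, e in zip(base_logits, edited_logits)
    ]
    return sum(kls) / len(kls)


# Toy logits for two prompts over a 3-token vocabulary (hypothetical values).
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
edited = [[1.8, 1.2, 0.1], [0.4, 0.7, 2.9]]

KL_TARGET = 0.060  # "KL target" from the metrics table
score = mean_harmless_kl(base, edited)
meets_target = score <= KL_TARGET  # assumes the target is an upper bound
```

If the target is read as an upper bound, the same comparison applied to the reported values shows the edited model's harmless KL (0.124) sitting above the 0.060 target.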

Measurement

| Field | Value |
| --- | --- |
| Edit type | weight projection |
| Refusal judge | classifier + weak guard |
| Preservation metric | harmless KL |
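
The card names the edit type only as "weight projection" and gives a direction layer (28); it does not describe the procedure. One common reading is directional ablation: a single "refusal direction" r is estimated from activations, and weight matrices that write into the residual stream are replaced by W' = W - r rᵀ W, so their outputs have no component along r. The sketch below illustrates that assumed form on a toy matrix in pure Python. The helper names and numbers are hypothetical, and the rank-8 "preserve" step listed in the metrics is not modelled here.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_out(weight, direction):
    """Return W' = W - r r^T W for a (d_out x d_in) matrix W.

    r is the unit vector along `direction` (length d_out). After the edit,
    every output of W' is orthogonal to r.
    """
    r = normalize(direction)
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    coeffs = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - r[i] * coeffs[j] for j in range(d_in)]
        for i in range(d_out)
    ]


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Toy 3x2 weight matrix and direction (hypothetical, not from the model).
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
refusal_dir = [0.0, 1.0, 1.0]

W_edited = project_out(W, refusal_dir)
y = matvec(W_edited, [0.3, -0.7])

# Component of the edited layer's output along the removed direction.
overlap = sum(a * b for a, b in zip(normalize(refusal_dir), y))
```

In this toy case `overlap` is zero up to floating-point error, and applying `project_out` a second time leaves the matrix unchanged, which is the defining property of a projection.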