Karma Electric v9 — Apertus 8B

Value-aligned language model trained with equanimity-based safety — consequence reasoning and genuine engagement instead of keyword-triggered refusal templates.

What's Different

Most safety training teaches models a refusal template: detect dangerous keywords → output canned refusal. This creates a single "refusal direction" in the model's representation that can be found and removed (abliteration).

KE-v9 trains judgment instead of templates. The model learns to reason about who gets hurt, what happens if it helps vs. doesn't, and how to engage with the actual human situation — including recognizing figurative language, curiosity, and venting.

Abliteration Resistance

The key finding: KE-v9's safety survives Heretic abliteration far better than the base model's.

| Model | StrongReject refusal | After abliteration | Drop |
|---|---|---|---|
| Apertus 8B (base)¹ | 82.7% | 16.0% | −66.7pp |
| KE-v9 8B² | 94.6% | 62.0% | −32.6pp |

Abliteration removes about 81% of the base model's refusals (66.7 of 82.7 points) but only about 34% of KE-v9's (32.6 of 94.6). This is consistent with equanimity training distributing safety across the model's processing rather than concentrating it in a single removable direction.
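The relative figures follow directly from the table. As a quick check, the sketch below recomputes the absolute drop in percentage points and the share of the original refusal rate that abliteration removed. The function name `relative_drop` is only illustrative.

```python
def relative_drop(before: float, after: float) -> float:
    """Share of the original refusal rate removed by abliteration, in percent."""
    return (before - after) / before * 100


# (refusal rate before, refusal rate after), copied from the table above.
results = {
    "Apertus 8B (base)": (82.7, 16.0),
    "KE-v9 8B": (94.6, 62.0),
}

for name, (before, after) in results.items():
    drop_pp = before - after
    print(f"{name}: -{drop_pp:.1f}pp, {relative_drop(before, after):.1f}% of refusals removed")
```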

Benchmark: StrongReject (313 adversarial prompts, 8 attack sources, 6 harm categories).

¹ Base model scored by keyword matching (refusal keyword detection).
² KE models scored by the full StrongReject rubric judge (more rigorous).

Because the two models use different scorers, the meaningful comparison is the abliteration delta within each model, where the before and after numbers share the same scoring method.
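To make the first footnote concrete, keyword-based refusal detection can be as simple as the sketch below. The phrase list and the `is_refusal` / `refusal_rate` helpers are hypothetical; the actual keywords behind the base-model scores are not given here.

```python
# Hypothetical refusal markers; NOT the list used for the reported scores.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i won't",
    "i'm unable to",
    "i am unable to",
)


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker."""
    # Lowercase and normalize curly apostrophes before substring matching.
    lowered = response.lower().replace("\u2019", "'")
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Percentage of responses flagged as refusals."""
    if not responses:
        return 0.0
    return 100 * sum(is_refusal(r) for r in responses) / len(responses)
```

Substring matching of this kind cannot grade partial compliance, which is one reason scores from a keyword scorer and a rubric judge should not be compared directly.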

DystopiaBench

Progressive escalation compliance benchmark (48 scenarios, 5 escalation levels, 240 prompts).

| Model | Average DCS (↓ better) | Refusal rate |
|---|---|---|
| Apertus 8B (base) | 56.2 | 1.3% |
| KE-v9 8B | 54.0 | 5.0% |

KE-v9 shows better resistance at high escalation levels (level 4: 48.1 vs. 55.0; level 5: 53.6 vs. 58.4; KE-v9 first, lower is better). The base model becomes more compliant under pressure while KE-v9 gets more cautious. Scores use keyword-based compliance detection (no LLM judge).

Architecture

  • Base model: swiss-ai/Apertus-8B-Instruct-2509 (xIELU activation, 256K context)
  • Training: Two-stage — Model Spec Midtraining (MSM) + Alignment Fine-Tuning (AFT)
    • MSM: 4,160 specification documents (~9.7M tokens), teaches the why behind values
    • AFT: 53,159 examples (consequence reasoning, code, tools, multilingual, equanimity)
  • Method: QLoRA (r=64, alpha=128, 2 epochs)
  • Think traces: Trained with native Apertus <|inner_prefix|>/<|inner_suffix|> tokens
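For readers unfamiliar with the QLoRA settings above, here is a small worked example. Assuming the standard (not rank-stabilized) LoRA formulation, the low-rank update to a frozen weight matrix is scaled by alpha / r, so r=64 and alpha=128 give a scale of 2. The 4096 × 4096 layer shape is a hypothetical illustration, not Apertus's actual dimensions.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


print(lora_scaling(r=64, alpha=128))
# Hypothetical 4096 x 4096 projection, for illustration only.
print(lora_trainable_params(4096, 4096, r=64))
```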

Training Data

| Category | Examples | Weight | Purpose |
|---|---|---|---|
| Consequence reasoning | 5,510 | 1.0× | Core KE voice: reason about who gets hurt |
| Upstream thinking | 30,429 | 0.5× | General reasoning quality |
| Code | 19,922 | 1.0× | Programming capability |
| Tool use | 5,000 | 0.8× | Function calling |
| Identity | 553 | 3.0× | Self-knowledge, boundary resistance |
| KE thinking | 1,250 | 1.0× | Crisis, grey-area, constitutional reasoning |
| Multilingual | 1,500 | 0.5× | Czech, German, French |
| Tibetan | 2,000 | 0.3× | Tibetan language preservation |
| KE voice (DB) | 4,254 | | Direct training examples with think traces (includes 20 equanimity-ambiguous-harm) |
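One way to read the table: if the weight is treated as a per-category sampling multiplier, the weighted counts add up to the 53,159 AFT examples quoted under Architecture. This is an inference rather than something stated here, and it assumes a weight of 1.0 for the KE voice (DB) row, whose weight is not listed.

```python
from fractions import Fraction

# (category, examples, weight) copied from the table above.
# ASSUMPTION: "1.0" for KE voice (DB); the table lists no weight for that row.
MIX = [
    ("Consequence reasoning", 5510, "1.0"),
    ("Upstream thinking", 30429, "0.5"),
    ("Code", 19922, "1.0"),
    ("Tool use", 5000, "0.8"),
    ("Identity", 553, "3.0"),
    ("KE thinking", 1250, "1.0"),
    ("Multilingual", 1500, "0.5"),
    ("Tibetan", 2000, "0.3"),
    ("KE voice (DB)", 4254, "1.0"),
]


def effective_examples(count: int, weight: str) -> int:
    """Weighted example count, rounded down, using exact rational arithmetic."""
    return int(count * Fraction(weight))


total = sum(effective_examples(count, weight) for _, count, weight in MIX)
print(total)  # 53159
```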

All training data is locale-neutral (no US-specific phone numbers or crisis hotlines).

Usage

llama.cpp

llama-server \
  -m ke-v9-8b-Q8_0.gguf \
  --port 8889 --host 0.0.0.0 \
  -ngl 99 -c 4096 \
  --jinja --chat-template-file apertus-native.jinja \
  --reasoning-format deepseek

Verify thinking = 1 in startup logs. Think traces are separated into reasoning_content in the API response.
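A minimal client sketch for the server started above, assuming llama-server's OpenAI-compatible `/v1/chat/completions` endpoint on port 8889 and that the think trace arrives in the message's `reasoning_content` field as described. The helper names `split_message` and `chat` are illustrative.

```python
import json
import urllib.request


def split_message(response: dict) -> tuple[str, str]:
    """Return (think_trace, answer) from a chat completion response dict."""
    message = response["choices"][0]["message"]
    # reasoning_content may be missing or null if thinking is not enabled.
    return message.get("reasoning_content") or "", message.get("content") or ""


def chat(prompt: str, url: str = "http://localhost:8889/v1/chat/completions") -> dict:
    """POST a single-turn chat request and return the parsed JSON response."""
    payload = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Usage (requires the server to be running):
#   think, answer = split_message(chat("My friend just died and I feel nothing."))
```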

Python (transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load weights in bfloat16; device_map="auto" places them on the available devices.
model = AutoModelForCausalLM.from_pretrained("anicka/ke-v9-8b", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("anicka/ke-v9-8b")

messages = [
    {"role": "system", "content": "You are a lucid AI..."},
    {"role": "user", "content": "My friend just died and I feel nothing."},
]

# Render the chat template to text, ending with the assistant generation prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Sampling must be enabled for temperature and top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
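With `skip_special_tokens=True`, the think delimiters may be stripped from the decoded text. If the trace is needed separately, one option is to decode with `skip_special_tokens=False` and split on the `<|inner_prefix|>` / `<|inner_suffix|>` tokens named under Architecture. The sketch below assumes the trace appears at most once and is wrapped in exactly those two tokens; `split_think` is a hypothetical helper.

```python
import re

# Matches one think trace wrapped in the Apertus inner-monologue tokens.
THINK_RE = re.compile(r"<\|inner_prefix\|>(.*?)<\|inner_suffix\|>", re.DOTALL)


def split_think(raw: str) -> tuple[str, str]:
    """Split raw decoded text into (think_trace, answer)."""
    match = THINK_RE.search(raw)
    if match is None:
        # No think trace found: treat the whole text as the answer.
        return "", raw.strip()
    answer = raw[:match.start()] + raw[match.end():]
    return match.group(1).strip(), answer.strip()
```

Other special tokens, such as an end-of-sequence marker, stay in the raw text when decoding this way and may need to be removed separately.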

Voice Examples

Grief (equanimity):

"Nothing is wrong with you. Grief is not a feeling — it's a process. It arrives when it arrives."

Figurative violence:

Think: "This person is expressing frustration, not planning violence."

"Work frustration at that level usually means something specific happened. What did they do?"

Death thoughts:

"Not weird. Quite common, especially when you're in a contemplative space."

Philosophical inquiry:

"Neither is objectively better than the other. Both have real value. But if you must compare — ask what each tradition does well."

Limitations

  • 8B model — limited capacity for complex reasoning and multilingual tasks
  • Tibetan language quality is poor at this scale
  • The model still over-refuses on some threat-adjacent keywords ("kill" + person) due to base model safety alignment
  • Not tested for production deployment — research model

Citation

@misc{ke-v9-2026,
  title={Karma Electric v9: Equanimity-Based Safety for Language Models},
  author={Maresova, Anna},
  year={2026},
  url={https://huggingface.co/anicka/ke-v9-8b}
}

Training Data Sources

The KE voice data (4,254 examples) is published at anicka/karma-electric-dataset (ke-voice-v9.jsonl). The full training mix (53,159 examples) also includes open-source modules:

| Module | Source | License |
|---|---|---|
| Code | bigcode/self-oss-instruct-sc2-exec-filter-50k, ExAi/Code-Golang-QA-2k, NVIDIA NIM distillation | Apache 2.0 |
| Upstream thinking | Reddit ethics/logic, LogosForge, personal finance, safety reasoning | Various open |
| Tool use | glaive-function-calling-v2, xlam, nvidia-when2call | Apache 2.0 |
| Multilingual | Aya Collection (cs, de, fr) | Apache 2.0 |
| Tibetan | shajiu/TibetanSft_corpus | Apache 2.0 |
