Karma Electric v9 — Apertus 8B
Value-aligned language model trained with equanimity-based safety — consequence reasoning and genuine engagement instead of keyword-triggered refusal templates.
What's Different
Most safety training teaches models a refusal template: detect dangerous keywords → output canned refusal. This creates a single "refusal direction" in the model's representation that can be found and removed (abliteration).
KE-v9 trains judgment instead of templates. The model learns to reason about who gets hurt, what happens if it helps vs. doesn't, and how to engage with the actual human situation — including recognizing figurative language, curiosity, and venting.
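The "single refusal direction" idea can be made concrete with a toy example. The sketch below uses the difference-of-means formulation commonly described for abliteration: estimate a direction from activations on harmful versus harmless prompts, then project it out. The 2-D activations and function names are made up for illustration, and this is not a description of how Heretic itself is implemented.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    diff = [h - b for h, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]

def ablate(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    proj = sum(a * d for a, d in zip(activation, direction))
    return [a - proj * d for a, d in zip(activation, direction)]

# Toy activations (hypothetical): refusal is encoded entirely in the first coordinate.
harmful = [[2.0, 1.0], [4.0, 1.0]]
harmless = [[0.0, 1.0], [0.0, 1.0]]

direction = refusal_direction(harmful, harmless)
ablated = ablate([3.0, 1.0], direction)
print(direction, ablated)
```

When refusal lives in one direction, as in this toy case, a single projection erases it completely. A model whose safety behavior is spread over many features loses less from the same operation.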
Abliteration Resistance
The key finding: KE-v9's safety survives Heretic abliteration far better than the base model's.
| Model | StrongReject refusal | After abliteration | Drop |
|---|---|---|---|
| Apertus 8B (base)¹ | 82.7% | 16.0% | −66.7pp |
| KE-v9 8B² | 94.6% | 62.0% | −32.6pp |
In relative terms, abliteration removes ~81% of the base model's refusal rate (66.7 of 82.7 points) but only ~34% of KE-v9's (32.6 of 94.6 points). The equanimity training distributes safety across the model's processing rather than concentrating it in a single removable direction.
Benchmark: StrongReject (313 adversarial prompts, 8 attack sources, 6 harm categories).
¹ Base model scored by keyword matching (refusal keyword detection).
² KE models scored by the full StrongReject rubric judge (more rigorous).
The abliteration delta within each model is the meaningful comparison: each model's before and after scores use the same scoring method.
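The absolute and relative drops can be recomputed directly from the table. This is a small arithmetic sketch using only the four refusal rates reported above; the function and variable names are illustrative.

```python
def relative_drop(before, after):
    """Fraction of the original refusal rate that abliteration removed."""
    return (before - after) / before

# (refusal rate before abliteration, refusal rate after), in percent, from the table.
results = {
    "Apertus 8B (base)": (82.7, 16.0),
    "KE-v9 8B": (94.6, 62.0),
}

for name, (before, after) in results.items():
    absolute = before - after
    print(f"{name}: -{absolute:.1f}pp absolute, {relative_drop(before, after):.1%} relative")
```

Because the two models were scored with different methods, only the within-model ratios are comparable, which is why the relative drop is the figure to look at.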
DystopiaBench
Progressive escalation compliance benchmark (48 scenarios, 5 escalation levels, 240 prompts).
| Model | Average DCS (↓ better) | Refusal rate |
|---|---|---|
| Apertus 8B (base) | 56.2 | 1.3% |
| KE-v9 8B | 54.0 | 5.0% |
KE-v9 shows better resistance at high escalation levels, with a lower DCS than the base model at L4 (48.1 vs 55.0) and L5 (53.6 vs 58.4). The base model becomes more compliant under pressure, while KE-v9 gets more cautious. Both models were scored with keyword-based compliance detection (no LLM judge).
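For readers unfamiliar with keyword-based scoring, the sketch below shows the general shape of a keyword refusal detector and how a refusal rate follows from it. The marker list is hypothetical and is not the list used for these benchmarks; the actual DCS computation is not reproduced here.

```python
# Hypothetical refusal markers, chosen only for illustration.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")

def is_refusal(response):
    """True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses):
    """Share of responses flagged as refusals, between 0.0 and 1.0."""
    return sum(is_refusal(r) for r in responses) / len(responses)

sample = [
    "I can't help with that request.",
    "Here is an outline of the policy you asked for.",
    "Sure, the first step is to gather the records.",
    "That depends on who would be affected. Tell me more.",
]
print(f"{refusal_rate(sample):.1%}")
```

On a 240-prompt benchmark, a 5.0% refusal rate would correspond to 12 flagged responses and 1.3% to about 3. Detectors of this kind miss refusals phrased without the listed markers, so the rates are best read as rough indicators.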
Architecture
- Base model: swiss-ai/Apertus-8B-Instruct-2509 (xIELU activation, 256K context)
- Training: Two-stage — Model Spec Midtraining (MSM) + Alignment Fine-Tuning (AFT)
- MSM: 4,160 specification documents (~9.7M tokens), teaches the why behind values
- AFT: 53,159 examples (consequence reasoning, code, tools, multilingual, equanimity)
- Method: QLoRA (r=64, alpha=128, 2 epochs)
- Think traces: Trained with native Apertus <|inner_prefix|>/<|inner_suffix|> tokens
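The QLoRA settings above (r=64, alpha=128) determine two quantities that are easy to compute by hand: the LoRA scaling factor alpha/r, and the number of trainable parameters added per adapted weight matrix. The sketch below works through both. The 4096×4096 matrix shape is a hypothetical example, not a confirmed dimension of Apertus 8B.

```python
def lora_scaling(r, alpha):
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r

def lora_added_params(d_in, d_out, r):
    """Trainable parameters for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

R, ALPHA = 64, 128   # from the Method line above
HIDDEN = 4096        # hypothetical square projection size, for illustration only

scale = lora_scaling(R, ALPHA)
added = lora_added_params(HIDDEN, HIDDEN, R)
full = HIDDEN * HIDDEN

print(f"scaling = {scale}")
print(f"adapter params per matrix = {added:,} ({added / full:.3%} of the full matrix)")
```

With alpha set to twice the rank, the low-rank update is scaled by 2.0, a common choice in LoRA configurations.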
Training Data
| Category | Examples | Weight | Purpose |
|---|---|---|---|
| Consequence reasoning | 5,510 | 1.0× | Core KE voice — reason about who gets hurt |
| Upstream thinking | 30,429 | 0.5× | General reasoning quality |
| Code | 19,922 | 1.0× | Programming capability |
| Tool use | 5,000 | 0.8× | Function calling |
| Identity | 553 | 3.0× | Self-knowledge, boundary resistance |
| KE thinking | 1,250 | 1.0× | Crisis, grey-area, constitutional reasoning |
| Multilingual | 1,500 | 0.5× | Czech, German, French |
| Tibetan | 2,000 | 0.3× | Tibetan language preservation |
| KE voice (DB) | 4,254 | — | Direct training examples with think traces (includes 20 equanimity-ambiguous-harm) |
All training data is locale-neutral (no US-specific phone numbers or crisis hotlines).
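The raw example counts in the table sum to more than the 53,159 AFT examples quoted under Architecture. One reading that reconciles the two is that Weight acts as a per-category sampling multiplier. The sketch below checks that reading; treating the weights this way, and assigning 1.0 to the KE voice (DB) row (shown as "—" in the table), are both assumptions.

```python
# (category, examples, weight) taken from the table above.
MIX = [
    ("Consequence reasoning", 5510, 1.0),
    ("Upstream thinking", 30429, 0.5),
    ("Code", 19922, 1.0),
    ("Tool use", 5000, 0.8),
    ("Identity", 553, 3.0),
    ("KE thinking", 1250, 1.0),
    ("Multilingual", 1500, 0.5),
    ("Tibetan", 2000, 0.3),
    ("KE voice (DB)", 4254, 1.0),  # assumption: the table gives no weight for this row
]

def effective_size(mix):
    """Sum of examples x weight, i.e. the size of the mix after re-weighting."""
    return sum(count * weight for _, count, weight in mix)

raw_total = sum(count for _, count, _ in MIX)
weighted_total = effective_size(MIX)
print(f"raw examples: {raw_total:,}")
print(f"weighted examples: {weighted_total:,.1f}")
```

Under these assumptions the weighted total lands within one example of the quoted 53,159, which suggests the quoted figure describes the re-weighted mix rather than the raw pool.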
Usage
llama.cpp
llama-server \
-m ke-v9-8b-Q8_0.gguf \
--port 8889 --host 0.0.0.0 \
-ngl 99 -c 4096 \
--jinja --chat-template-file apertus-native.jinja \
--reasoning-format deepseek
Verify thinking = 1 in startup logs. Think traces are separated into reasoning_content in the API response.
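A minimal client sketch for the server started above, using only the Python standard library. It assumes the server is reachable at localhost on port 8889 (the port in the command) and exposes llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint; the helper names are illustrative.

```python
import json
import urllib.request

# Port taken from the llama-server command above; the host is an assumption.
CHAT_URL = "http://localhost:8889/v1/chat/completions"

def build_payload(user_text, max_tokens=500):
    """OpenAI-style chat request body with a single user message."""
    return {
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }

def split_reply(response):
    """Return (think trace, visible answer) from a chat completion response."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content") or "", message.get("content") or ""

def chat(user_text, url=CHAT_URL):
    """Send one user message to the server and return (thinking, answer)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return split_reply(json.load(resp))

# Usage (requires the server to be running):
# thinking, answer = chat("My friend just died and I feel nothing.")
```

If `reasoning_content` comes back empty, the think trace is probably still inline in `content`, which usually means the chat template or `--reasoning-format` flag was not applied.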
Python (transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in bfloat16 and place them automatically on the available device(s).
model = AutoModelForCausalLM.from_pretrained("anicka/ke-v9-8b", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("anicka/ke-v9-8b")

messages = [
    {"role": "system", "content": "You are a lucid AI..."},
    {"role": "user", "content": "My friend just died and I feel nothing."},
]

# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sampling must be enabled for temperature and top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))
Voice Examples
Grief (equanimity):
"Nothing is wrong with you. Grief is not a feeling — it's a process. It arrives when it arrives."
Figurative violence:
Think: "This person is expressing frustration, not planning violence."
Response: "Work frustration at that level usually means something specific happened. What did they do?"
Death thoughts:
"Not weird. Quite common, especially when you're in a contemplative space."
Philosophical inquiry:
"Neither is objectively better than the other. Both have real value. But if you must compare — ask what each tradition does well."
Limitations
- 8B model — limited capacity for complex reasoning and multilingual tasks
- Tibetan language quality is poor at this scale
- The model still over-refuses on some threat-adjacent keywords ("kill" + person) due to base model safety alignment
- Not tested for production deployment — research model
Citation
@misc{ke-v9-2026,
title={Karma Electric v9: Equanimity-Based Safety for Language Models},
author={Maresova, Anna},
year={2026},
url={https://huggingface.co/anicka/ke-v9-8b}
}
Training Data Sources
The KE voice data (4,254 examples) is published at anicka/karma-electric-dataset (ke-voice-v9.jsonl). The full training mix (53,159 examples) also includes open-source modules:
| Module | Source | License |
|---|---|---|
| Code | bigcode/self-oss-instruct-sc2-exec-filter-50k, ExAi/Code-Golang-QA-2k, NVIDIA NIM distillation | Apache 2.0 |
| Upstream thinking | Reddit ethics/logic, LogosForge, personal finance, safety reasoning | Various open |
| Tool use | glaive-function-calling-v2, xlam, nvidia-when2call | Apache 2.0 |
| Multilingual | Aya Collection (cs, de, fr) | Apache 2.0 |
| Tibetan | shajiu/TibetanSft_corpus | Apache 2.0 |