Instructions to use EphAsad/Atem-Wisdom-1.5B with libraries and local apps.

Libraries

How to use EphAsad/Atem-Wisdom-1.5B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EphAsad/Atem-Wisdom-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EphAsad/Atem-Wisdom-1.5B")
model = AutoModelForCausalLM.from_pretrained("EphAsad/Atem-Wisdom-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use EphAsad/Atem-Wisdom-1.5B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="EphAsad/Atem-Wisdom-1.5B",
	filename="Atem-Wisdom-1.5B.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use EphAsad/Atem-Wisdom-1.5B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Use Docker

docker model run hf.co/EphAsad/Atem-Wisdom-1.5B:Q4_K_M

vLLM

How to use EphAsad/Atem-Wisdom-1.5B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EphAsad/Atem-Wisdom-1.5B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-Wisdom-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/EphAsad/Atem-Wisdom-1.5B:Q4_K_M

SGLang

How to use EphAsad/Atem-Wisdom-1.5B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EphAsad/Atem-Wisdom-1.5B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-Wisdom-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EphAsad/Atem-Wisdom-1.5B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-Wisdom-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use EphAsad/Atem-Wisdom-1.5B with Ollama:
```
ollama run hf.co/EphAsad/Atem-Wisdom-1.5B:Q4_K_M
```

Unsloth Studio

How to use EphAsad/Atem-Wisdom-1.5B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EphAsad/Atem-Wisdom-1.5B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EphAsad/Atem-Wisdom-1.5B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EphAsad/Atem-Wisdom-1.5B to start chatting

Pi

How to use EphAsad/Atem-Wisdom-1.5B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "EphAsad/Atem-Wisdom-1.5B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use EphAsad/Atem-Wisdom-1.5B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use EphAsad/Atem-Wisdom-1.5B with Docker Model Runner:
```
docker model run hf.co/EphAsad/Atem-Wisdom-1.5B:Q4_K_M
```

Lemonade

How to use EphAsad/Atem-Wisdom-1.5B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Run and chat with the model

lemonade run user.Atem-Wisdom-1.5B-Q4_K_M

List all available models

lemonade list

Atem-Wisdom

Ancient logic. Modern intelligence.

The reasoning variant of Atem — a 1.5B model that thinks before it answers.

Overview

Atem-Wisdom is the second release in the Atem model series — the reasoning variant of Atem v1. Where Atem v1 provides fast, direct answers, Atem-Wisdom reasons through problems step by step before responding, making its thinking process visible and auditable.

The defining feature is the <think> block: before producing a final answer, the model works through the problem, considering approaches, catching intermediate errors, and arriving at a considered conclusion. This reasoning trace appears in full in the output rather than being hidden.

When to choose Atem-Wisdom over Atem v1:

Problems that benefit from explicit reasoning steps — mathematics, logic, analytical questions
Situations where seeing the working matters as much as the answer
Complex multi-part problems where intermediate reasoning affects the conclusion
Tasks where you want to audit the model's reasoning, not just its output

When to choose Atem v1:

Routine tasks where speed matters more than depth
Simple factual questions and direct coding tasks
Constrained environments where output length is a concern

The Atem Series

Model	Stage	Capability
Atem v1	Stage 1 — SFT	Fast, direct reasoning
Atem-Wisdom	Stage 2 — CoT	Explicit thinking traces
Atem-Pharaoh (planned)	Stage 3 — DPO/IPO	Preference-aligned reasoning

Model Details

Property	Value
Base model	EphAsad/Atem-v1-1.5B
Training method	LoRA SFT — Stage 2 (Chain-of-Thought)
LoRA config	r=32, alpha=64, dropout=0.05
Parameters	~1.54B
Training records	~38,000 (after token length filtering)
Think / no-think split	75% / 25%
Epochs	2
Final val loss	1.057
Hardware	NVIDIA A100-SXM4 80GB
Max sequence length	4,096 tokens
Precision	bfloat16
License	Apache 2.0

Output Format

Atem-Wisdom produces responses in one of two formats depending on problem complexity:

With reasoning trace (majority of responses):

<think>
[Extended reasoning — working through the problem, identifying
approaches, checking intermediate steps, considering edge cases]
</think>

[Final answer — clear, direct, informed by the reasoning above]

Direct answer (simple questions):

[Concise direct response — no reasoning trace needed]

The model learned this calibration during training: 75% of training examples included explicit think traces and 25% were formatted as direct answers. In qualitative evaluation, 25 of 30 test questions produced think traces, and the 5 direct answers were all given to appropriately simple questions.
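
Because the trace and the final answer arrive in one string, downstream code often needs to separate them. A minimal sketch, assuming the output contains at most one well-formed <think>...</think> block as described above; the helper name `split_think` is hypothetical and not part of any library:

```python
import re
from typing import Optional, Tuple

# Matches a single <think>...</think> block; DOTALL lets '.' span newlines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(response: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None for direct answers."""
    match = _THINK_RE.search(response)
    if match is None:
        # Direct-answer format: the whole response is the answer.
        return None, response.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the final answer.
    answer = response[match.end():].strip()
    return reasoning, answer


sample = "<think>\nEqual distances, so use total distance / total time.\n</think>\n\nThe average speed is 72 km/h."
reasoning, answer = split_think(sample)
print(answer)
```

An unterminated trace (for example, when `max_new_tokens` cuts generation short) will not match and is returned unchanged as the answer, so callers may want to check for a stray `<think>` in that case.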

Training Data

Stage 2 training used a corpus of approximately 38,000 chain-of-thought examples drawn from eight sources, assembled on top of Atem v1's Stage 1 foundation. All records were formatted to the <think>...</think> structure where applicable, with records exceeding 4,096 tokens removed rather than truncated.

Dataset	Focus
open-r1/OpenThoughts-114k-math	Mathematical reasoning
Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned	General reasoning (3 configs)
Modotte/CodeX-2M-Thinking	Coding with thinking traces
FreedomIntelligence/medical-o1-reasoning-SFT	Medical reasoning
WithinUsAI/MiniMax_M2.7_Distilled_5k	Mixed reasoning
nvidia/OpenCodeReasoning	Code reasoning
Private dataset	Inverted reasoning traces

Chinese-language reasoning traces from Kimi K2.5 were filtered using an ASCII character ratio threshold before inclusion.
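
The card does not state the threshold or the exact definition of the ratio. As a sketch of what such a filter can look like, assuming the ratio is the share of characters with code points below 128 and using an illustrative cut-off of 0.9 (both are assumptions, as are the function names):

```python
def ascii_ratio(text: str) -> float:
    """Fraction of characters that are ASCII; empty text counts as 0.0."""
    if not text:
        return 0.0
    return sum(ch.isascii() for ch in text) / len(text)


def keep_record(trace: str, threshold: float = 0.9) -> bool:
    """Keep a reasoning trace only if it is predominantly ASCII.

    The 0.9 default is illustrative, not the value used in training.
    """
    return ascii_ratio(trace) >= threshold


records = [
    "First, factor the quadratic: x^2 - 5x + 6 = (x - 2)(x - 3).",
    "首先，我们对二次式进行因式分解。",
]
kept = [r for r in records if keep_record(r)]
print(len(kept))  # 1
```

A ratio-based test tolerates the occasional non-ASCII symbol in an otherwise English trace, which a strict "ASCII only" check would reject.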

Loss curve:

Step	Train Loss	Val Loss
250	1.110	1.107
500	1.120	1.077
750	1.041	1.064
1000	1.045	1.058
1190 (final)	1.039	1.057

A second epoch was added after the single-epoch run showed val loss still declining at completion, indicating that further improvement was available. The final val loss of 1.057 improves on the single-epoch result of 1.085 by 0.028.

Evaluation

Benchmark Results

Evaluated using lm-evaluation-harness under the same conditions as Atem v1. ARC-Challenge and HellaSwag are zero-shot; GSM8K is 5-shot.

Task	Base (1.5B)	Atem v1	Atem-Wisdom	v1→Wisdom (pp)
ARC-Challenge	43.7%	45.5%	44.7%	-0.8
GSM8K (strict)	23.0%	53.0%	51.9%	-1.1
GSM8K (flexible)	—	—	53.6%	+0.6 (vs. v1 strict)
HellaSwag	66.8%	64.4%	65.1%	+0.7

Note on GSM8K: The strict-match parser expects answers in #### number format. Atem-Wisdom's think traces move the answer to a different position in the output, which the strict parser occasionally fails to pick up. The flexible-extract score of 53.6%, which accepts the final numeric value in the output, better reflects actual mathematical reasoning capability. It is 0.6 percentage points above Atem v1's 53.0% strict score, although the two figures come from different parsers, so the comparison is indicative only. HellaSwag improves marginally over v1 (+0.7 points). The 0.8-point ARC-Challenge regression is within normal benchmark variance.
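
The gap between the two GSM8K scores comes down to how the answer is pulled out of the generated text. The sketch below is a simplified illustration of the two extraction styles, not the harness's actual filter code; the regular expressions and function names are assumptions chosen to mirror the behaviour described in the note:

```python
import re
from typing import Optional

# Strict: only accept a number that follows the GSM8K '#### ' marker.
STRICT_RE = re.compile(r"#### (-?[0-9.,]+)")
# Flexible: accept any number and use the last one in the text.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def _normalise(number: str) -> str:
    """Drop thousands separators and a trailing period."""
    return number.replace(",", "").rstrip(".")


def strict_extract(text: str) -> Optional[str]:
    match = STRICT_RE.search(text)
    return _normalise(match.group(1)) if match else None


def flexible_extract(text: str) -> Optional[str]:
    numbers = NUMBER_RE.findall(text)
    return _normalise(numbers[-1]) if numbers else None


output = "<think>\n48 / 2 = 24, then 48 + 24 = 72.\n</think>\n\nThe total is 72."
print(strict_extract(output))    # None
print(flexible_extract(output))  # 72
```

Here the answer is correct, but only the flexible extractor finds it because the response never emits the `####` marker.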

Qualitative Evaluation

Atem-Wisdom was evaluated across 30 domain-representative questions using a matched system prompt (identical to the base model comparison), ensuring output differences reflect trained capability rather than prompt engineering.

Metric	Atem v1	Atem-Wisdom
Avg response length	349 words	654 words
Think tags present	0/30	25/30
Min response	10 words	117 words

Qualitative improvements over Atem v1:

Monty Hall problem: Atem v1 incorrectly set up the problem with 2 doors. Atem-Wisdom correctly reasons through the 3-door setup and arrives at the correct 2/3 switching probability.
Differentiation: Correctly derives f'(x) = x²(3ln(x)+1) and stationary point at x = e^(-1/3) with second-derivative confirmation, consistent across all versions from v1.1 onward.
Sky colour: Atem-Wisdom correctly explains Rayleigh scattering for both daytime blue and sunset red/orange, where previous versions produced partially incorrect explanations.
Logical fallacy identification: Correctly identifies argumentum ad populum (appeal to popularity) in a test argument. Prior versions were inconsistent on this question.
Calibrated reasoning traces: The model correctly suppresses think traces on simple questions (geometric series, basic decorator implementation, colour physics) while applying extended reasoning to complex ones.
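
The 2/3 figure in the Monty Hall item can be checked by exhaustive enumeration. A short sketch, assuming the standard rules (three doors, one car, and a host who always opens a goat door the player did not pick); the function name is illustrative:

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def win_probability(switch: bool) -> Fraction:
    """Exact win probability over all equally likely (car, pick) pairs."""
    wins = 0
    cases = 0
    for car, pick in product(DOORS, repeat=2):
        # Host opens a door that is neither the pick nor the car.
        # When pick == car there are two such doors; either choice leads
        # to the same win/lose outcome, so taking the first is enough.
        opened = next(d for d in DOORS if d != pick and d != car)
        if switch:
            final = next(d for d in DOORS if d != pick and d != opened)
        else:
            final = pick
        wins += final == car
        cases += 1
    return Fraction(wins, cases)


print(win_probability(switch=True))   # 2/3
print(win_probability(switch=False))  # 1/3
```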

Known limitations:

Specific arithmetic errors persist on a subset of mathematical problems (harmonic mean of speeds, circular permutations). These are targeted for Stage 3 preference training.
Inference is significantly slower than Atem v1 due to longer outputs including reasoning traces. This is a fundamental property of reasoning models, not a fixable defect.
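
Both problem types named in the first limitation have short closed forms, which makes model answers easy to check independently. A sketch using the 60 km/h and 90 km/h figures from the Transformers usage prompt in this card; the seating example with 5 people is an illustrative assumption:

```python
from fractions import Fraction
from math import factorial


def average_speed_round_trip(v_out: int, v_back: int) -> Fraction:
    """Average speed over equal distances is the harmonic mean of the speeds.

    total distance / total time = 2d / (d/v_out + d/v_back)
                                = 2 * v_out * v_back / (v_out + v_back)
    """
    return Fraction(2 * v_out * v_back, v_out + v_back)


def circular_permutations(n: int) -> int:
    """Distinct seatings of n >= 1 people at a round table (rotations equal)."""
    return factorial(n - 1)


print(average_speed_round_trip(60, 90))  # 72, not the arithmetic mean 75
print(circular_permutations(5))          # 24
```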

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-Wisdom-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "A train travels from A to B at 60 km/h and returns "
                   "at 90 km/h. What is the average speed for the whole journey?"
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1500,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-Wisdom-1.5B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Explain the intuition behind the Monty Hall problem."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1500,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-Wisdom-1.5B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-Wisdom-1.5B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-Wisdom-1.5B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M
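
Once the server is running, it can be queried from Python over its OpenAI-compatible API. A minimal standard-library sketch, assuming the server listens on http://localhost:8080 (llama-server's usual default) and that the `model` field can carry the repository name; the helper names are hypothetical:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed llama-server address


def build_chat_request(prompt: str, max_tokens: int = 1500) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": "EphAsad/Atem-Wisdom-1.5B:Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `ask("What is the capital of France?")` returns the reply text, including any <think> trace the model produced.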

Available Files

File	Size	Description
`model.safetensors`	~3.1 GB	Full bfloat16 weights
`Atem-Wisdom-1.5B.Q4_K_M.gguf`	~986 MB	4-bit — recommended
`Atem-Wisdom-1.5B.Q5_K_M.gguf`	~1.1 GB	5-bit
`Atem-Wisdom-1.5B.Q8_0.gguf`	~1.6 GB	8-bit — near-lossless

System Prompt

Atem-Wisdom's identity and reasoning style are baked into the chat template and activate automatically without a system message. To override them manually, pass a system message such as:

You are Atem, a precise and analytical reasoning assistant. You approach 
every problem methodically — identifying core concepts, reasoning step by 
step, and arriving at well-supported conclusions. You show your thinking 
clearly and are thorough, direct, and intellectually honest.
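
In practice the override is an ordinary system message placed first in the conversation. A small sketch, assuming the chat template honours a leading `system` role as is conventional; `build_messages` is a hypothetical helper and the override text is shortened for illustration:

```python
from typing import Dict, List, Optional


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a chat message list, prepending a system message if given.

    With system_prompt=None the list has no system entry, so the default
    identity baked into the chat template applies.
    """
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


# Illustrative override text; replace with the prompt you want to use.
custom = "You are Atem, a precise and analytical reasoning assistant."
messages = build_messages("Is 221 a prime number?", system_prompt=custom)
print([m["role"] for m in messages])  # ['system', 'user']
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the Transformers example.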

Roadmap

Stage	Status	Description
Stage 1 — SFT	✅ Complete	Atem v1 — direct reasoning foundation
Stage 1.1 — Targeted SFT	✅ Complete	Atem v1.1 — correctness improvements
Stage 2 — CoT SFT	✅ Complete	Atem-Wisdom — this model
Stage 3 — DPO/IPO	🔄 Planned	Atem-Pharaoh — preference-aligned reasoning

Stage 3 will apply Direct Preference Optimization and Identity Preference Optimization to further refine reasoning quality, specifically targeting the remaining mathematical precision errors identified in Stage 2 evaluation.

Citation

@misc{atem_wisdom_2026,
  author       = {Asad, Zain},
  title        = {Atem-Wisdom: A 1.5B Reasoning Model with
                  Explicit Chain-of-Thought Traces},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-Wisdom-1.5B}},
}

License

Released under the Apache 2.0 License, consistent with the base model chain (Qwen2.5-1.5B-Instruct → Atem v1 → Atem-Wisdom).

Built independently by EphAsad
