Atem-Wisdom

Ancient logic. Modern intelligence.

The reasoning variant of Atem — a 1.5B model that thinks before it answers.


Overview

Atem-Wisdom is the second release in the Atem model series — the reasoning variant of Atem v1. Where Atem v1 provides fast, direct answers, Atem-Wisdom reasons through problems step by step before responding, making its thinking process visible and auditable.

The defining feature is the <think> tag: before producing a final answer, the model works through the problem inside a <think>...</think> block, considering approaches, catching intermediate errors, and arriving at a considered conclusion. This reasoning trace is included in the output in full rather than hidden.

When to choose Atem-Wisdom over Atem v1:

  • Problems that benefit from explicit reasoning steps — mathematics, logic, analytical questions
  • Situations where seeing the working matters as much as the answer
  • Complex multi-part problems where intermediate reasoning affects the conclusion
  • Tasks where you want to audit the model's reasoning, not just its output

When to choose Atem v1:

  • Routine tasks where speed matters more than depth
  • Simple factual questions and direct coding tasks
  • Constrained environments where output length is a concern

The Atem Series

| Model | Stage | Capability |
|---|---|---|
| Atem v1 | Stage 1 — SFT | Fast, direct reasoning |
| Atem-Wisdom | Stage 2 — CoT | Explicit thinking traces |
| Atem-Pharaoh (planned) | Stage 3 — DPO/IPO | Preference-aligned reasoning |

Model Details

| Property | Value |
|---|---|
| Base model | EphAsad/Atem-v1-1.5B |
| Training method | LoRA SFT — Stage 2 (Chain-of-Thought) |
| LoRA config | r=32, alpha=64, dropout=0.05 |
| Parameters | ~1.54B |
| Training records | ~38,000 (after token length filtering) |
| Think / no-think split | 75% / 25% |
| Epochs | 2 |
| Final val loss | 1.057 |
| Hardware | NVIDIA A100-SXM4 80GB |
| Max sequence length | 4,096 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
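
The LoRA settings in the table correspond to keyword names used by the Hugging Face PEFT library's `LoraConfig` (`r`, `lora_alpha`, `lora_dropout`). The sketch below records them as a plain dictionary and computes the standard LoRA scaling factor, alpha / r. The `target_modules` list and `task_type` value are assumptions, since the source does not state which layers were adapted.

```python
# Hyperparameters taken from the Model Details table.
LORA_R = 32
LORA_ALPHA = 64
LORA_DROPOUT = 0.05

# Keyword arguments in the shape accepted by peft.LoraConfig.
lora_kwargs = {
    "r": LORA_R,
    "lora_alpha": LORA_ALPHA,
    "lora_dropout": LORA_DROPOUT,
    # ASSUMPTION: the adapted layers are not stated in the source.
    # These are the attention projections commonly targeted in Qwen2-style models.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    # ASSUMPTION: causal language modelling task type.
    "task_type": "CAUSAL_LM",
}


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scales the low-rank update by alpha / r."""
    return alpha / r


print(lora_scaling(LORA_ALPHA, LORA_R))  # 2.0
```

With PEFT installed, `LoraConfig(**lora_kwargs)` would build the corresponding config object. A scaling factor of 2.0 means the learned low-rank update is doubled before being added to the frozen weights.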

Output Format

Atem-Wisdom produces responses in one of two formats depending on problem complexity:

With reasoning trace (majority of responses):

<think>
[Extended reasoning — working through the problem, identifying
approaches, checking intermediate steps, considering edge cases]
</think>

[Final answer — clear, direct, informed by the reasoning above]

Direct answer (simple questions):

[Concise direct response — no reasoning trace needed]

This behaviour was calibrated during training: 75% of training examples included explicit think traces and 25% were formatted as direct answers. In qualitative evaluation, 25 of 30 test questions produced think traces, and the 5 direct answers were all given to appropriately simple questions.
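
Because a response may or may not contain a reasoning trace, downstream code usually needs to handle both formats. The following is a minimal sketch of one way to separate the trace from the final answer. It assumes the `<think>` and `</think>` tags appear as plain text in the decoded output; `split_response` and `ParsedResponse` are illustrative names, not part of any library.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    reasoning: Optional[str]  # None when the model answered directly
    answer: str


# Non-greedy match so only the first <think>...</think> block is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(text: str) -> ParsedResponse:
    """Split a decoded response into its reasoning trace and final answer."""
    match = _THINK_RE.search(text)
    if match is None:
        # Direct-answer format: the whole response is the answer.
        return ParsedResponse(None, text.strip())
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return ParsedResponse(reasoning, answer)


with_trace = split_response("<think>\n2 + 2 is 4\n</think>\n\nThe answer is 4.")
direct = split_response("Paris.")
print(with_trace.answer)   # The answer is 4.
print(direct.reasoning)    # None
```

Keeping the trace as a separate field makes it straightforward to log the reasoning for auditing while showing only the answer to end users.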


Training Data

Stage 2 training used a corpus of approximately 38,000 chain-of-thought examples drawn from eight sources, assembled on top of Atem v1's Stage 1 foundation. All records were formatted to the <think>...</think> structure where applicable, with records exceeding 4,096 tokens removed rather than truncated.

| Dataset | Focus |
|---|---|
| open-r1/OpenThoughts-114k-math | Mathematical reasoning |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | General reasoning (3 configs) |
| Modotte/CodeX-2M-Thinking | Coding with thinking traces |
| FreedomIntelligence/medical-o1-reasoning-SFT | Medical reasoning |
| WithinUsAI/MiniMax_M2.7_Distilled_5k | Mixed reasoning |
| nvidia/OpenCodeReasoning | Code reasoning |
| Private dataset | Inverted reasoning traces |

Chinese-language reasoning traces from Kimi K2.5 were filtered using an ASCII character ratio threshold before inclusion.
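
The two filters described in this section (dropping over-length records and screening by ASCII character ratio) can be sketched as follows. This is an illustration under stated assumptions rather than the actual preprocessing code: the 0.9 ratio threshold is hypothetical because the source does not give the value used, and the token counter is passed in as a callable so that any tokenizer can be substituted.

```python
from typing import Callable, Iterable, List

MAX_TOKENS = 4096        # maximum sequence length stated in the model card
MIN_ASCII_RATIO = 0.9    # ASSUMPTION: the real threshold is not given in the source


def ascii_ratio(text: str) -> float:
    """Fraction of characters in `text` that are ASCII."""
    if not text:
        return 1.0
    return sum(ch.isascii() for ch in text) / len(text)


def keep_record(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
    min_ascii_ratio: float = MIN_ASCII_RATIO,
) -> bool:
    """Return True if the record passes both filters."""
    if ascii_ratio(text) < min_ascii_ratio:
        return False  # mostly non-ASCII, e.g. a Chinese-language trace
    # Over-length records are removed outright rather than truncated.
    return count_tokens(text) <= max_tokens


def filter_records(
    records: Iterable[str], count_tokens: Callable[[str], int]
) -> List[str]:
    return [r for r in records if keep_record(r, count_tokens)]


# Whitespace splitting stands in for a real tokenizer in this demo.
def whitespace_count(text: str) -> int:
    return len(text.split())


sample = ["Solve 2x + 3 = 7 step by step.", "你好世界", "word " * 5000]
print(len(filter_records(sample, whitespace_count)))  # 1
```

In practice `count_tokens` would wrap the model's own tokenizer, for example `lambda t: len(tokenizer(t)["input_ids"])`.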

Loss curve:

| Step | Train Loss | Val Loss |
|---|---|---|
| 250 | 1.110 | 1.107 |
| 500 | 1.120 | 1.077 |
| 750 | 1.041 | 1.064 |
| 1000 | 1.045 | 1.058 |
| 1190 (final) | 1.039 | 1.057 |

A second epoch was added after the single-epoch run showed val loss still declining at completion, which suggested further improvement was available. The final val loss of 1.057 is 0.028 lower than the single-epoch result of 1.085, a relative reduction of about 2.6%.


Evaluation

Benchmark Results

Evaluated using lm-evaluation-harness under identical conditions to Atem v1. ARC-Challenge and HellaSwag use zero-shot; GSM8K uses 5-shot.

| Task | Base (1.5B) | Atem v1 | Atem-Wisdom | v1→Wisdom (percentage points) |
|---|---|---|---|---|
| ARC-Challenge | 43.7% | 45.5% | 44.7% | -0.8 |
| GSM8K (strict) | 23.0% | 53.0% | 51.9% | -1.1 |
| GSM8K (flexible) | | | 53.6% | +0.6 |
| HellaSwag | 66.8% | 64.4% | 65.1% | +0.7 |

Note on GSM8K: The strict-match parser expects the final answer in "#### number" format. Atem-Wisdom's think traces shift the answer to a different structural position, which the strict parser occasionally fails to pick up. The flexible-extract score of 53.6%, which accepts any final numeric value, better reflects actual mathematical reasoning capability and slightly exceeds Atem v1's 53.0% strict score. Because those two figures come from different parsers, the +0.6 difference is indicative rather than like-for-like. HellaSwag improves marginally over v1 (+0.7 percentage points). The ARC-Challenge regression of 0.8 percentage points is small and likely within normal benchmark variance.
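
The difference between the two scoring modes can be illustrated with a small sketch. The patterns below are simplified approximations written for this example and are not the harness's exact filters: the strict version only accepts a number following `####`, while the flexible version takes the last number that appears anywhere in the output.

```python
import re
from typing import Optional

# Simplified stand-ins for strict-match and flexible-extract scoring.
STRICT_RE = re.compile(r"#### (-?[0-9.,]+)")
NUMBER_RE = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def strict_extract(text: str) -> Optional[str]:
    """Return the number after '#### ', or None if the marker is absent."""
    match = STRICT_RE.search(text)
    return match.group(1) if match else None


def flexible_extract(text: str) -> Optional[str]:
    """Return the last number found anywhere in the text."""
    numbers = NUMBER_RE.findall(text)
    return numbers[-1] if numbers else None


traced = (
    "<think>\nAverage speed is 2*60*90/150 = 72\n</think>\n\n"
    "The average speed is 72 km/h."
)
marked = "Total distance over total time gives the result.\n#### 72"

print(strict_extract(traced), flexible_extract(traced))  # None 72
print(strict_extract(marked), flexible_extract(marked))  # 72 72
```

A correct answer phrased as a sentence after the reasoning trace scores zero under the strict pattern but is credited by the flexible one, which is the effect described in the note.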

Qualitative Evaluation

Atem-Wisdom was evaluated across 30 domain-representative questions using a matched system prompt (identical to the base model comparison), ensuring output differences reflect trained capability rather than prompt engineering.

| Metric | Atem v1 | Atem-Wisdom |
|---|---|---|
| Avg response length | 349 words | 654 words |
| Think tags present | 0/30 | 25/30 |
| Min response | 10 words | 117 words |

Qualitative improvements over Atem v1:

  • Monty Hall problem: Atem v1 incorrectly set up the problem with 2 doors. Atem-Wisdom correctly reasons through the 3-door setup and arrives at the correct 2/3 switching probability.
  • Differentiation: Correctly derives f'(x) = x²(3ln(x)+1) and stationary point at x = e^(-1/3) with second-derivative confirmation, consistent across all versions from v1.1 onward.
  • Sky colour: Atem-Wisdom correctly explains Rayleigh scattering for both daytime blue and sunset red/orange, where previous versions produced partially incorrect explanations.
  • Logical fallacy identification: Correctly identifies argumentum ad populum (appeal to popularity) in a test argument. Prior versions were inconsistent on this question.
  • Calibrated reasoning traces: The model correctly suppresses think traces on simple questions (geometric series, basic decorator implementation, colour physics) while applying extended reasoning to complex ones.

Known limitations:

  • Specific arithmetic errors persist on a subset of mathematical problems (harmonic mean of speeds, circular permutations). These are targeted for Stage 3 preference training.
  • Inference is significantly slower than Atem v1 due to longer outputs including reasoning traces. This is a fundamental property of reasoning models, not a fixable defect.

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-Wisdom-1.5B"

# Load the tokenizer and the bfloat16 weights, placing layers on available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "A train travels from A to B at 60 km/h and returns "
                   "at 90 km/h. What is the average speed for the whole journey?"
    }
]

# Apply the chat template. return_dict=True returns both input_ids and
# attention_mask, so generate() receives an explicit mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

# Allow enough new tokens for the reasoning trace as well as the final answer.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1500,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(
    output[0][prompt_length:],
    skip_special_tokens=True
)
print(response)
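
The example prompt is a harmonic-mean-of-speeds problem, a category listed under known limitations, so it is worth checking the model's answer against a reference value. A minimal sketch, assuming the outbound and return legs cover the same distance (the function name is illustrative):

```python
def round_trip_average_speed(speed_out: float, speed_back: float) -> float:
    """Average speed over a round trip with equal distance each way.

    Total distance 2d divided by total time (d/v1 + d/v2) simplifies to
    2*v1*v2 / (v1 + v2), the harmonic mean of the two speeds.
    """
    return 2 * speed_out * speed_back / (speed_out + speed_back)


print(round_trip_average_speed(60, 90))  # 72.0
```

The expected answer is therefore 72 km/h, not the arithmetic mean of 75 km/h.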

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-Wisdom-1.5B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Explain the intuition behind the Monty Hall problem."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=1500,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-Wisdom-1.5B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-Wisdom-1.5B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-Wisdom-1.5B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-Wisdom-1.5B:Q4_K_M

Available Files

| File | Size | Description |
|---|---|---|
| model.safetensors | ~3.1 GB | Full bfloat16 weights |
| Atem-Wisdom-1.5B.Q4_K_M.gguf | ~986 MB | 4-bit — recommended |
| Atem-Wisdom-1.5B.Q5_K_M.gguf | ~1.1 GB | 5-bit |
| Atem-Wisdom-1.5B.Q8_0.gguf | ~1.6 GB | 8-bit — near-lossless |

System Prompt

Atem-Wisdom's identity and reasoning style are baked into the chat template and activate automatically without a system message. To override manually:

You are Atem, a precise and analytical reasoning assistant. You approach 
every problem methodically — identifying core concepts, reasoning step by 
step, and arriving at well-supported conclusions. You show your thinking 
clearly and are thorough, direct, and intellectually honest.
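
One way to apply this override with the Transformers example above is to prepend a system-role message to the `messages` list before calling `apply_chat_template`. The sketch below assumes the chat template honours a leading system message, which the source implies but does not demonstrate; `build_messages` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List, Optional

# System prompt text as given in this section.
SYSTEM_PROMPT = (
    "You are Atem, a precise and analytical reasoning assistant. You approach "
    "every problem methodically — identifying core concepts, reasoning step by "
    "step, and arriving at well-supported conclusions. You show your thinking "
    "clearly and are thorough, direct, and intellectually honest."
)


def build_messages(
    user_content: str, system_prompt: Optional[str] = None
) -> List[Dict[str, str]]:
    """Build a chat message list, optionally with an explicit system prompt."""
    messages: List[Dict[str, str]] = []
    if system_prompt is not None:
        # An explicit system message is placed first in the conversation.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_content})
    return messages


messages = build_messages("Is 91 a prime number?", SYSTEM_PROMPT)
print([m["role"] for m in messages])  # ['system', 'user']
```

Calling `build_messages` without a system prompt leaves the list with a single user message, in which case the template's built-in identity applies.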

Roadmap

| Stage | Status | Description |
|---|---|---|
| Stage 1 — SFT | ✅ Complete | Atem v1 — direct reasoning foundation |
| Stage 1.1 — Targeted SFT | ✅ Complete | Atem v1.1 — correctness improvements |
| Stage 2 — CoT SFT | ✅ Complete | Atem-Wisdom — this model |
| Stage 3 — DPO/IPO | 🔄 Planned | Atem-Pharaoh — preference-aligned reasoning |

Stage 3 will apply Direct Preference Optimization and Identity Preference Optimization to further refine reasoning quality, specifically targeting the remaining mathematical precision errors identified in Stage 2 evaluation.


Citation

@misc{atem_wisdom_2026,
  author       = {Asad, Zain},
  title        = {Atem-Wisdom: A 1.5B Reasoning Model with
                  Explicit Chain-of-Thought Traces},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-Wisdom-1.5B}},
}

License

Released under the Apache 2.0 License, consistent with the base model chain (Qwen2.5-1.5B-Instruct → Atem v1 → Atem-Wisdom).


Built independently by EphAsad
