
Atem-SageCoder

Ancient logic. Modern intelligence. Applied to code.

A 1.5B code reasoning model that thinks before it writes — trained on verified competitive programming traces from frontier models.



Overview

Atem-SageCoder is a code-specialised variant of Atem-Wisdom-1.5B, fine-tuned on verified chain-of-thought coding traces from nvidia/OpenCodeReasoning. It inherits Atem-Wisdom's explicit reasoning capability and applies it specifically to programming tasks — working through algorithm choice, edge cases, and complexity analysis before producing an implementation.

The core behaviour: when given a coding problem, the model reasons through it fully inside a <think> block before writing any code. This makes its reasoning auditable and is intended to reduce the frequency of structurally plausible but logically incorrect solutions.

When to choose Atem-SageCoder over Atem-Wisdom:

  • Programming problems where reasoning about approach matters before implementation
  • Competitive programming and algorithmic tasks
  • Situations where you want to see the model's design decisions, not just its output
  • Code that requires edge case analysis or complexity reasoning

When to choose Atem-Wisdom instead:

  • General reasoning, mathematics, and analytical tasks outside of coding
  • Mixed-domain workloads where code is one of many task types
  • Environments where output length is a constraint

The Atem Series

| Model | Stage | Capability | Status |
|---|---|---|---|
| Atem v1 | Stage 1: SFT | Fast, direct reasoning | ✅ Complete |
| Atem-Wisdom | Stage 2: CoT | Explicit thinking traces | ✅ Complete |
| Atem-SageCoder | Specialisation: Code | Think-then-code on algorithms | ✅ Complete |
| Atem-Pharaoh (planned) | Stage 3: DPO/IPO | Preference-aligned reasoning | 🔄 Planned |

Atem-SageCoder is a domain-specialised branch off Atem-Wisdom, not a continuation of the main series progression toward Atem-Pharaoh.


Model Details

| Property | Value |
|---|---|
| Base model | EphAsad/Atem-Wisdom-1.5B |
| Root architecture | Qwen/Qwen2.5-1.5B-Instruct |
| Training method | LoRA SFT (code reasoning specialisation) |
| LoRA config | r=32, alpha=64, dropout=0.05 |
| Parameters | ~1.54B |
| Training records | 15,427 (after filtering) |
| Think / no-think split | 90% / 10% |
| Epochs | 2 |
| Total steps | 484 |
| Final train loss | 0.8477 |
| Final val loss | 0.8591 |
| Hardware | NVIDIA A100-SXM4 80GB |
| Max sequence length | 8,192 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
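
To make the LoRA settings concrete, the sketch below works out the scaling factor and the number of trainable parameters one adapter adds to a single square projection. It assumes the standard LoRA formulation (update scaled by alpha / r) and a hidden size of 1536; both are illustrative assumptions, not details confirmed by this card, and `lora_params` is a hypothetical helper.

```python
r, alpha = 32, 64
scaling = alpha / r            # standard LoRA scales the low-rank update B @ A by alpha / r


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


hidden = 1536                  # assumed hidden size, for illustration only
added = lora_params(hidden, hidden, r)
frozen = hidden * hidden       # parameters in the frozen square projection

print(scaling, added, f"{added / frozen:.1%}")   # 2.0 98304 4.2%
```

Under these assumptions each adapted projection trains only a few percent as many parameters as the frozen weight it sits beside.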

Output Format

Atem-SageCoder produces responses in one of two formats:

With reasoning trace (90% of training examples):

<think>
[Reasoning through the problem — algorithm selection, edge cases,
complexity analysis, implementation approach]
</think>

[Final implementation — clean, correct code with explanation]

Direct answer (simple queries):

[Concise code response — no reasoning trace needed]

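The 10% no-think training pool is there so the model can still answer simple queries directly instead of always producing extended reasoning. The intent is that straightforward questions get a direct response and the think trace scales with problem complexity. In the 8-question evaluation reported below, however, a think trace appeared on every question.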
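
For downstream tooling, the two formats can be separated with a small parser. The sketch below is illustrative only: `split_response` is a hypothetical helper, not part of the model's tooling, and it assumes at most one <think> block at the start of the response.

```python
import re

# Matches an optional leading <think>...</think> block; DOTALL lets "." span newlines.
_THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_response(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is an empty string for direct answers."""
    match = _THINK_RE.match(response)
    if match is None:
        return "", response.strip()
    return match.group(1).strip(), response[match.end():].strip()


with_trace = "<think>\nUse n % 2.\n</think>\n\ndef is_even(n):\n    return n % 2 == 0"
reasoning, answer = split_response(with_trace)
print(reasoning)   # the audit trail
print(answer)      # the code to run or review
```

Keeping the trace and the answer in separate fields makes it easy to log the reasoning for review while passing only the code on to a test harness.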


Training Data

Atem-SageCoder was trained on 15,427 examples drawn from nvidia/OpenCodeReasoning (split_0), after streaming 40,000 candidates and applying three sequential filters.

Filter 1 — Truncation gate: Records were rejected if </think> was absent from the output (CoT cut off mid-trace) or if fewer than 30 characters of code followed </think> (code truncated). This is the primary source of attrition — OpenCodeReasoning CoT traces are long, and 8,192 tokens captures roughly 38% of the raw stream.

Filter 2 — Bad input gate: Records with input fields under 20 characters were rejected. A known data quality issue in split_1 caused that entire split to be excluded; all training data comes from split_0.

Filter 3 — Token length: Examples exceeding 8,192 tokens after chat template application were removed rather than truncated.
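
The three gates can be sketched as plain predicates. This is a reconstruction from the description above, not the training script: the function names are hypothetical, whitespace is stripped before counting code characters as an assumption, and `count_tokens` is a stand-in for tokenising the chat-templated example.

```python
from typing import Callable

MIN_CODE_CHARS = 30      # minimum characters of code after </think>
MIN_INPUT_CHARS = 20     # minimum length of the problem statement
MAX_TOKENS = 8192        # maximum sequence length after templating


def passes_truncation_gate(output: str) -> bool:
    """Filter 1: the trace must close and be followed by enough code."""
    if "</think>" not in output:
        return False
    code = output.split("</think>", 1)[1].strip()
    return len(code) >= MIN_CODE_CHARS


def passes_input_gate(problem: str) -> bool:
    """Filter 2: reject records whose input field is too short."""
    return len(problem) >= MIN_INPUT_CHARS


def passes_length_gate(text: str, count_tokens: Callable[[str], int]) -> bool:
    """Filter 3: drop (rather than truncate) over-length examples."""
    return count_tokens(text) <= MAX_TOKENS


def keep(record: dict, count_tokens: Callable[[str], int]) -> bool:
    """Apply the gates in order; the concatenation stands in for the templated text."""
    full_text = record["input"] + record["output"]
    return (
        passes_truncation_gate(record["output"])
        and passes_input_gate(record["input"])
        and passes_length_gate(full_text, count_tokens)
    )


def word_count(text: str) -> int:
    """Crude token counter used only for this example."""
    return len(text.split())


good = {
    "input": "Given n, print the sum of 1..n.",
    "output": "<think>Use the formula.</think>\nn = int(input())\nprint(n * (n + 1) // 2)",
}
print(keep(good, word_count))
```

In practice `count_tokens` would wrap the real tokenizer applied to the fully templated conversation, since the limit applies after chat template application.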

| Property | Value |
|---|---|
| Dataset | nvidia/OpenCodeReasoning (split_0) |
| Streamed | 40,000 |
| After truncation filter | ~24,000 |
| After token length filter | 15,427 |
| Train / Val split | 14,627 / 800 |
| Domain | Competitive programming (algorithmic problems) |
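
The counts in the table can be cross-checked with a few lines of arithmetic. The ~24,000 figure is approximate in the source, so the intermediate percentages are rough.

```python
streamed = 40_000
after_truncation = 24_000     # approximate figure from the table
retained = 15_427
train, val = 14_627, 800

assert train + val == retained   # the split accounts for every retained example

print(f"truncation gate kept {after_truncation / streamed:.1%}")   # 60.0%
print(f"later filters kept   {retained / after_truncation:.1%}")   # 64.3%
print(f"overall kept         {retained / streamed:.1%}")           # 38.6%
```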

CoT extraction: The output column in OpenCodeReasoning stores each record as <think>...</think> followed by the code. CoT and code were extracted into separate fields before formatting. The <think> tags were stripped from the raw output to avoid double-tag injection during chat template application, then manually reinserted during build_text construction with enable_thinking=False.
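
A minimal sketch of the extract-then-reinsert step, including the 10% direct-answer pool, might look like the following. `extract_cot_and_code` and `build_target` are hypothetical names (the actual build_text function is not shown in this card), and the exact layout of the assistant turn is an assumption.

```python
import random


def extract_cot_and_code(output: str) -> tuple[str, str]:
    """Split '<think>...</think>code' into (cot, code) with the tags stripped."""
    cot, code = output.split("</think>", 1)
    cot = cot.replace("<think>", "", 1)
    return cot.strip(), code.strip()


def build_target(cot: str, code: str, with_think: bool) -> str:
    """Reinsert exactly one pair of tags, or none for the direct-answer pool."""
    if not with_think:
        return code
    return f"<think>\n{cot}\n</think>\n\n{code}"


NOTHINK_RATIO = 0.10              # share of examples formatted without a trace
rng = random.Random(0)            # seeded so the think / no-think split is reproducible

raw = "<think>Check parity with n % 2.</think>\nprint(int(input()) % 2 == 0)"
cot, code = extract_cot_and_code(raw)
target = build_target(cot, code, with_think=rng.random() >= NOTHINK_RATIO)
print(target)
```

Stripping the tags first and adding them back in one place means a target can never end up with two opening <think> tags, whichever way the chat template behaves.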

Loss curve:

| Step | Train Loss | Val Loss |
|---|---|---|
| 250 | 0.8564 | 0.8757 |
| 484 (final) | 0.8477 | 0.8591 |

The train/val gap at completion is about 0.011 (0.8591 - 0.8477), which gives no sign of overfitting. Loss values around 0.85 are expected for complex CoT-plus-code targets: simple instruction SFT typically reaches 0.3–0.5, but verified reasoning traces carry genuine entropy.
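
The gap can be recomputed at both logged checkpoints directly from the loss table; the snippet below is just that arithmetic.

```python
# Loss values copied from the loss table: step -> (train, val).
losses = {250: (0.8564, 0.8757), 484: (0.8477, 0.8591)}

# Validation minus train loss at each checkpoint, rounded to four decimals.
gaps = {step: round(val - train, 4) for step, (train, val) in losses.items()}

for step, gap in gaps.items():
    print(f"step {step}: val - train = {gap:.4f}")
```

A gap that narrows between the two checkpoints is consistent with the reading that the model was not overfitting when training stopped.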


Training Configuration

# Key hyperparameters
lora_r            = 32
lora_alpha        = 64
lora_dropout      = 0.05
max_seq_length    = 8192       # doubled vs Atem-Wisdom — CoT traces are long
learning_rate     = 1e-4
lr_scheduler      = 'cosine'
warmup_ratio      = 0.05
batch_size        = 4          # halved vs Atem-Wisdom to account for 2× seq length
grad_accumulation = 16         # effective batch size: 64
num_epochs        = 2
dtype             = bfloat16
load_in_4bit      = True       # during training
nothink_ratio     = 0.10       # 10% direct-answer training pool

Training used Unsloth (unsloth==2026.5.5, unsloth_zoo==2026.5.5) with train_on_responses_only masking, so loss was computed exclusively on assistant response tokens. Three pre-flight checks were run before training started: identity confirmation, double <think> tag detection, and a mask sanity check. All three passed.
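
Two of these checks are easy to illustrate in plain Python. The sketch below is not Unsloth's implementation and not the author's validation code; the helper names are hypothetical, and the identity confirmation check is omitted because its details are not described here. It relies only on the convention that label -100 is ignored by PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def has_double_think(text: str) -> bool:
    """True if a formatted example contains more than one opening <think> tag."""
    return text.count("<think>") > 1


def mask_prompt_tokens(input_ids: list[int], response_start: int) -> list[int]:
    """Copy input_ids into labels, hiding everything before the assistant response."""
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


def mask_is_sane(labels: list[int], response_start: int) -> bool:
    """All prompt positions must be masked and at least one response token unmasked."""
    prompt, response = labels[:response_start], labels[response_start:]
    return all(t == IGNORE_INDEX for t in prompt) and any(t != IGNORE_INDEX for t in response)


labels = mask_prompt_tokens([11, 12, 13, 14, 15], response_start=3)
print(labels)                                   # [-100, -100, -100, 14, 15]
print(mask_is_sane(labels, response_start=3))   # True
```

Running checks like these on a handful of formatted examples before a long run catches template mistakes that would otherwise only show up as odd loss values.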


Evaluation

Qualitative Coding Evaluation (8 / 30 questions shown)

Atem-SageCoder was evaluated against a Qwen/Qwen2.5-1.5B-Instruct baseline across 30 coding questions covering implementation tasks, concept explanations, and algorithm design. The 8 coding-domain questions from that evaluation are shown below.

| # | Question | Base | SageCoder | Notes |
|---|---|---|---|---|
| 1 | is_even(n) function | ✓ No think | ✓ Think | Both correct |
| 2 | Count vowels in string | ✓ No think | ✓ Think | SageCoder more Pythonic (generator expression) |
| 3 | List vs tuple differences | ✓ No think | ⚠ Think | SageCoder error: claims tuples cannot contain duplicates (incorrect) |
| 4 | Sum list without sum() | ✓ No think | ✓ Think | SageCoder more thorough, both correct |
| 5 | Reverse a string | ✓ No think | ✓ Think | Both correct; SageCoder more verbose |
| 6 | if / elif / else | ⚠ No think | ✓ Think | Base error: predicts wrong output for age=25 example |
| 7 | find_max() with empty list | ✓ No think | ✓ Think | SageCoder provides two implementations |
| 8 | for vs while loop | ✓ No think | ✓ Think | SageCoder more structured |

Summary across 8 questions:

| Metric | Baseline | Atem-SageCoder |
|---|---|---|
| Think traces | 0 / 8 | 8 / 8 |
| Avg response (words) | ~177 | ~470 |
| Factual errors observed | 1 (Q6 output prediction) | 1 (Q3 tuple claim) |
| Code correctness | 7 / 8 correct | 7 / 8 correct |

The think-then-code pattern activates consistently on all coding questions. Response depth increases significantly — SageCoder examines edge cases, considers multiple approaches, and explains implementation choices that the baseline omits. Overall correctness is comparable across these 8 questions; the error types differ (baseline: incorrect output prediction; SageCoder: incorrect concept claim about tuples).


Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-SageCoder-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

response = tokenizer.decode(
    output[0][inputs.shape[1]:],
    skip_special_tokens=True
)
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-SageCoder-1.5B",
    max_seq_length=8192,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Given an array of integers, find the two numbers that sum to a target value. Return their indices."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-SageCoder-1.5B:Q4_K_M
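
Once llama-server is running, it can be queried over HTTP. The sketch below assumes the server's defaults (listening on localhost port 8080 with the OpenAI-compatible /v1/chat/completions endpoint) and uses only the standard library; `build_request` and `ask` are hypothetical helper names, and the sampling values simply mirror the Transformers example.

```python
import json
import urllib.request


def build_request(prompt: str, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Prepare a POST to the server's OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 2048,
        "temperature": 0.7,
        "top_p": 0.9,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the assistant message text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

If the server was started on a different host or port, pass the matching `base_url` to `build_request`.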

System Prompt

Atem-SageCoder's identity and coding focus are baked into the chat template. To override manually:

You are Atem-SageCoder, a thoughtful programming assistant built on the
Atem foundation. You reason carefully through problems before writing code
— considering edge cases, algorithm choice, complexity, and implementation
details — then provide clean, correct, and well-structured implementations.
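
One way to supply an override is to put an explicit system turn first in the message list. This sketch assumes the chat template honours a leading system message, as Qwen-style templates usually do; that behaviour is not confirmed here, and both `with_system_prompt` and the prompt text in the example are hypothetical.

```python
def with_system_prompt(system_prompt: str, user_content: str) -> list[dict[str, str]]:
    """Build a chat message list with an explicit system turn first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


messages = with_system_prompt(
    # Hypothetical override text, shown only to illustrate the shape of the call.
    "You are Atem-SageCoder. Answer in Python and state the time complexity.",
    "Check whether a string is a palindrome.",
)
print(messages)
```

The resulting `messages` list can be passed to tokenizer.apply_chat_template in place of the single-turn list used in the Transformers example.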

Available Files

| File | Size | Description |
|---|---|---|
| model.safetensors | ~3.1 GB | Full bfloat16 merged weights |
| Atem-SageCoder-1.5B.Q4_K_M.gguf | ~986 MB | 4-bit quantised (recommended) |
| Atem-SageCoder-1.5B.Q5_K_M.gguf | ~1.1 GB | 5-bit quantised |
| Atem-SageCoder-1.5B.Q8_0.gguf | ~1.6 GB | 8-bit quantised (near-lossless) |

Known Limitations

Training data scope. All 15,427 training examples come from competitive programming problems in nvidia/OpenCodeReasoning. The model is strongest on algorithmic and data structure problems; general software engineering tasks (web APIs, OOP design, framework-specific code) were not represented in training and may produce lower quality output.

Factual concept errors. The qualitative evaluation identified an incorrect claim about tuples (Q3: stated tuples cannot contain duplicates — they can). Concept explanation accuracy should be independently verified for correctness-critical applications.
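
The correct behaviour is quick to confirm in an interpreter. Tuples are immutable but keep repeated elements; it is sets that discard them.

```python
pair = (1, 1, 2)                # a tuple literal with a repeated element

assert len(pair) == 3           # duplicates are kept
assert pair.count(1) == 2       # the value 1 appears twice
assert len(set(pair)) == 2      # only converting to a set drops the repeat
```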

Response length. Think traces substantially increase output length. This is a fundamental property of the think-then-code design, not a fixable defect. For latency-constrained environments, Atem-Wisdom-1.5B with direct prompting may be preferable.

Single language bias. OpenCodeReasoning solutions are predominantly Python. Performance on other languages has not been formally evaluated.

Small training set. 15,427 examples is a focused dataset. Coverage of less common algorithmic patterns may be shallow. The high filter attrition rate (40k streamed → 15.4k retained) reflects the strict quality bar applied, not a shortage of data — the full split_0 contains substantially more examples at lower sequence lengths.


Roadmap

| Stage | Status | Description |
|---|---|---|
| Stage 1: SFT | ✅ Complete | Atem v1: direct reasoning foundation |
| Stage 2: CoT SFT | ✅ Complete | Atem-Wisdom: thinking traces |
| Specialisation: Code | ✅ Complete | Atem-SageCoder: this model |
| Stage 3: DPO/IPO | 🔄 Planned | Atem-Pharaoh: preference-aligned reasoning |

Citation

@misc{atem_sagecoder_2026,
  author       = {Asad, Zain},
  title        = {Atem-SageCoder: A 1.5B Think-Then-Code Model
                  via Competitive Programming Trace Distillation},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-SageCoder-1.5B}},
}

License

Released under the Apache 2.0 License, consistent with the base model chain (Qwen2.5-1.5B-Instruct → Atem v1 → Atem-Wisdom → Atem-SageCoder).


Built independently by EphAsad
