Atem-SageCoder

Ancient logic. Modern intelligence. Applied to code.

A 1.5B code reasoning model that thinks before it writes — trained on verified competitive programming traces from frontier models.

Overview

Atem-SageCoder is a code-specialised variant of Atem-Wisdom-1.5B, fine-tuned on verified chain-of-thought coding traces from nvidia/OpenCodeReasoning. It inherits Atem-Wisdom's explicit reasoning capability and applies it specifically to programming tasks — working through algorithm choice, edge cases, and complexity analysis before producing an implementation.

The core behaviour: when given a coding problem, the model reasons through it fully inside a <think> block before writing any code. This makes its reasoning auditable and is intended to reduce the frequency of structurally plausible but logically incorrect solutions.

When to choose Atem-SageCoder over Atem-Wisdom:

Programming problems where reasoning about approach matters before implementation
Competitive programming and algorithmic tasks
Situations where you want to see the model's design decisions, not just its output
Code that requires edge case analysis or complexity reasoning

When to choose Atem-Wisdom instead:

General reasoning, mathematics, and analytical tasks outside of coding
Mixed-domain workloads where code is one of many task types
Environments where output length is a constraint

The Atem Series

Model	Stage	Capability	Status
Atem v1	Stage 1 — SFT	Fast, direct reasoning	✅
Atem-Wisdom	Stage 2 — CoT	Explicit thinking traces	✅
Atem-SageCoder	Specialisation — Code	Think-then-code on algorithms	✅
Atem-Pharaoh (planned)	Stage 3 — DPO/IPO	Preference-aligned reasoning	🔄

Atem-SageCoder is a domain-specialised branch off Atem-Wisdom, not a continuation of the main series progression toward Atem-Pharaoh.

Model Details

Property	Value
Base model	EphAsad/Atem-Wisdom-1.5B
Root architecture	Qwen/Qwen2.5-1.5B-Instruct
Training method	LoRA SFT — Code Reasoning Specialisation
LoRA config	r=32, alpha=64, dropout=0.05
Parameters	~1.54B
Training records	15,427 (after filtering)
Think / no-think split	90% / 10%
Epochs	2
Total steps	484
Final train loss	0.8477
Final val loss	0.8591
Hardware	NVIDIA A100-SXM4 80GB
Max sequence length	8,192 tokens
Precision	bfloat16
License	Apache 2.0

Output Format

Atem-SageCoder produces responses in one of two formats:

With reasoning trace (90% of training examples):

<think>
[Reasoning through the problem — algorithm selection, edge cases,
complexity analysis, implementation approach]
</think>

[Final implementation — clean, correct code with explanation]

Direct answer (simple queries):

[Concise code response — no reasoning trace needed]

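The 10% no-think training pool keeps the model able to answer simple queries directly instead of always producing an extended reasoning trace. The intent is that straightforward questions get a direct response and the think trace scales with problem complexity; in the qualitative evaluation below, however, a think trace appeared on all 8 questions shown.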
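
Because a response may or may not open with a reasoning trace, downstream code usually needs to handle both formats. The following is a minimal sketch of one way to do that; `ParsedResponse` and `split_response` are hypothetical names introduced here for illustration, not part of the model's tooling.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    reasoning: Optional[str]  # text inside <think>...</think>, or None for a direct answer
    answer: str               # everything after the closing tag, or the whole response


# Matches one leading <think>...</think> block; DOTALL lets "." span newlines.
_THINK_RE = re.compile(r"\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_response(text: str) -> ParsedResponse:
    """Split a model response into its reasoning trace and final answer."""
    match = _THINK_RE.match(text)
    if match is None:
        # Direct-answer format: no reasoning trace at the start.
        return ParsedResponse(reasoning=None, answer=text.strip())
    return ParsedResponse(
        reasoning=match.group(1).strip(),
        answer=text[match.end():].strip(),
    )


with_trace = "<think>\nCheck n % 2.\n</think>\n\ndef is_even(n):\n    return n % 2 == 0"
direct = "def is_even(n):\n    return n % 2 == 0"

print(split_response(with_trace).reasoning)  # Check n % 2.
print(split_response(direct).reasoning)      # None
```

Keeping the trace separate makes it easy to log the reasoning for auditing while showing only the final implementation to the user.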

Training Data

Atem-SageCoder was trained on 15,427 examples drawn from nvidia/OpenCodeReasoning (split_0), after streaming 40,000 candidates and applying three sequential filters.

Filter 1, truncation gate: Records were rejected if </think> was absent from the output (CoT cut off mid-trace) or if fewer than 30 characters of code followed </think> (code truncated). This is the largest single source of attrition, because OpenCodeReasoning CoT traces are long: about 24,000 of the 40,000 streamed records pass this gate, and roughly 38% (15,427) survive all three filters at the 8,192-token limit.

Filter 2 — Bad input gate: Records with input fields under 20 characters were rejected. A known data quality issue in split_1 caused that entire split to be excluded; all training data comes from split_0.

Filter 3 — Token length: Examples exceeding 8,192 tokens after chat template application were removed rather than truncated.
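
The three gates can be sketched as plain predicates. This is an illustrative reconstruction from the description above, not the author's filtering script: the function names are hypothetical, the `input` and `output` field names are taken from the prose, and `count_tokens` stands in for tokenizing the fully templated example (here it is applied to the concatenated fields for simplicity).

```python
from typing import Callable

THINK_CLOSE = "</think>"
MIN_CODE_CHARS = 30   # Filter 1: minimum characters of code after </think>
MIN_INPUT_CHARS = 20  # Filter 2: minimum length of the input field
MAX_TOKENS = 8192     # Filter 3: token budget


def passes_truncation_gate(output: str) -> bool:
    """Filter 1: the trace must be closed and followed by enough code."""
    if THINK_CLOSE not in output:
        return False  # CoT was cut off before the closing tag
    code = output.split(THINK_CLOSE, 1)[1].strip()
    return len(code) >= MIN_CODE_CHARS


def passes_input_gate(input_text: str) -> bool:
    """Filter 2: reject records whose problem statement is too short."""
    return len(input_text) >= MIN_INPUT_CHARS


def passes_token_gate(text: str, count_tokens: Callable[[str], int]) -> bool:
    """Filter 3: drop (rather than truncate) examples over the token budget."""
    return count_tokens(text) <= MAX_TOKENS


def keep_record(record: dict, count_tokens: Callable[[str], int]) -> bool:
    """Apply the three gates in order; all must pass."""
    return (
        passes_truncation_gate(record["output"])
        and passes_input_gate(record["input"])
        and passes_token_gate(record["input"] + record["output"], count_tokens)
    )


def whitespace_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer, used only for this demo."""
    return len(text.split())


sample = {
    "input": "Given an array of integers, return the maximum.",
    "output": "<think>scan once</think>\ndef solve(a):\n    return max(a)  # linear scan",
}
print(keep_record(sample, whitespace_tokens))  # True
```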

Property	Value
Dataset	nvidia/OpenCodeReasoning (split_0)
Streamed	40,000
After truncation filter	~24,000
After token length filter	15,427
Train / Val split	14,627 / 800
Domain	Competitive programming (algorithmic problems)

CoT extraction: The output column in OpenCodeReasoning contains <think>...</think>code format. CoT and code were extracted into separate fields before formatting. The <think> tags were removed from the raw output to avoid double-tag injection during chat template application, then manually reinserted during build_text construction with enable_thinking=False.
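
The extract-then-reinsert step can be illustrated with a short sketch. The helper names below are hypothetical (the source only names `build_text`), the exact whitespace layout is an assumption, and chat template application is not shown.

```python
from typing import Tuple


def split_cot_and_code(output: str) -> Tuple[str, str]:
    """Separate a raw '<think>...</think>code' string into (reasoning, code)."""
    head, _, tail = output.partition("</think>")
    cot = head.replace("<think>", "", 1).strip()  # drop the opening tag
    return cot, tail.strip()


def build_target(cot: str, code: str, use_think: bool = True) -> str:
    """Assemble the assistant target, inserting the tags exactly once."""
    if use_think:
        return f"<think>\n{cot}\n</think>\n\n{code}"
    # Direct-answer variant: code only, no reasoning trace.
    return code


raw = "<think>Use a set.</think>\nprint(1)"
cot, code = split_cot_and_code(raw)
target = build_target(cot, code)
print(target.count("<think>"), target.count("</think>"))  # 1 1
```

Stripping the tags first and adding them back in one place means the number of tags in the final text does not depend on what the raw record or the template contributes.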

Loss curve:

Step	Train Loss	Val Loss
250	0.8564	0.8757
484 (final)	0.8477	0.8591

Train/val gap of about 0.011 at completion (0.8591 vs 0.8477), with no overfitting signal. Loss values in the 0.85 range are expected for complex CoT+code targets; simple instruction SFT typically reaches 0.3–0.5, but verified reasoning traces carry genuine entropy.

Training Configuration

# Key hyperparameters
lora_r            = 32
lora_alpha        = 64
lora_dropout      = 0.05
max_seq_length    = 8192       # doubled vs Atem-Wisdom — CoT traces are long
learning_rate     = 1e-4
lr_scheduler      = 'cosine'
warmup_ratio      = 0.05
batch_size        = 4          # halved vs Atem-Wisdom to account for 2× seq length
grad_accumulation = 16         # effective batch size: 64
num_epochs        = 2
dtype             = 'bfloat16'
load_in_4bit      = True       # during training
nothink_ratio     = 0.10       # 10% direct-answer training pool
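
As a quick check on how these values fit together, the sketch below works through the effective batch size and the resulting number of optimizer steps. It assumes each epoch rounds a final partial batch up to a full step, which may differ from how the trainer actually counts; the record counts are the ones reported under Training Data.

```python
import math

batch_size = 4
grad_accumulation = 16
num_epochs = 2

# Examples consumed per optimizer step.
effective_batch = batch_size * grad_accumulation


def optimizer_steps(num_examples: int) -> int:
    """Total steps over all epochs, counting a final partial batch as one step."""
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs


print(effective_batch)          # 64
print(optimizer_steps(15_427))  # 484
print(optimizer_steps(14_627))  # 458
```

Under this assumption, the reported total of 484 steps corresponds to 15,427 records per epoch, while the 14,627-example train split alone would give 458.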

Training used Unsloth (unsloth==2026.5.5, unsloth_zoo==2026.5.5) with train_on_responses_only masking, so loss was computed exclusively on assistant response tokens. Before training started, a three-part validation was run: identity confirmation, double <think> tag detection, and a mask sanity check. All three checks passed.
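
Two of these checks are simple enough to sketch. The functions below are hypothetical illustrations of what a double-tag check and a mask sanity check might look like, not the author's validation code; the identity check is omitted because the source does not describe it. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from typing import List

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def has_double_think_tag(text: str) -> bool:
    """True if either tag appears more than once in a formatted example."""
    return text.count("<think>") > 1 or text.count("</think>") > 1


def mask_is_sane(labels: List[int], response_start: int) -> bool:
    """Prompt positions must be masked and the response must contribute to the loss."""
    prompt_masked = all(label == IGNORE_INDEX for label in labels[:response_start])
    response_trained = any(label != IGNORE_INDEX for label in labels[response_start:])
    return prompt_masked and response_trained


# Three prompt tokens (masked) followed by three response tokens (trained on).
labels = [-100, -100, -100, 1012, 374, 151645]
print(mask_is_sane(labels, response_start=3))                          # True
print(has_double_think_tag("<think><think>plan</think>\nprint(1)"))   # True
```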

Evaluation

Qualitative Coding Evaluation (8 / 30 questions shown)

Atem-SageCoder was evaluated against a Qwen/Qwen2.5-1.5B-Instruct baseline on 30 coding questions covering implementation tasks, concept explanations, and algorithm design. The 8 coding-domain questions from that evaluation are shown below.

#	Question	Base	SageCoder	Notes
1	`is_even(n)` function	✓ No think	✓ Think	Both correct
2	Count vowels in string	✓ No think	✓ Think	SageCoder more Pythonic (generator expression)
3	List vs tuple differences	✓ No think	⚠ Think	SageCoder error: claims tuples cannot contain duplicates (incorrect)
4	Sum list without `sum()`	✓ No think	✓ Think	SageCoder more thorough, both correct
5	Reverse a string	✓ No think	✓ Think	Both correct; SageCoder more verbose
6	`if` / `elif` / `else`	⚠ No think	✓ Think	Base error: predicts wrong output for age=25 example
7	`find_max()` with empty list	✓ No think	✓ Think	SageCoder provides two implementations
8	`for` vs `while` loop	✓ No think	✓ Think	SageCoder more structured

Summary across 8 questions:

Metric	Baseline	Atem-SageCoder
Think traces	0 / 8	8 / 8
Avg response (words)	~177	~470
Factual errors observed	1 (Q6 output prediction)	1 (Q3 tuple claim)
Code correctness	7 / 8 correct	7 / 8 correct

The think-then-code pattern activates consistently on all coding questions. Response depth increases significantly — SageCoder examines edge cases, considers multiple approaches, and explains implementation choices that the baseline omits. Overall correctness is comparable across these 8 questions; the error types differ (baseline: incorrect output prediction; SageCoder: incorrect concept claim about tuples).

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-SageCoder-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
    }
]

# Apply the chat template; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(
    output[0][prompt_length:],
    skip_special_tokens=True
)
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-SageCoder-1.5B",
    max_seq_length=8192,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Given an array of integers, find the two numbers that sum to a target value. Return their indices."
    }
]

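# Apply the chat template; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the generated continuation, not the prompt
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))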

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-SageCoder-1.5B:Q4_K_M
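
Once `llama-server` is running, it exposes an OpenAI-compatible chat endpoint that can be called from Python. The sketch below uses only the standard library and assumes the server is listening on its default address, `http://localhost:8080`; adjust `BASE_URL` if it was started with a different host or port. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: llama-server default host and port


def build_chat_request(prompt: str, max_tokens: int = 2048) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the local server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Write a function that reverses a linked list."))` should print the full response, including any reasoning trace the model emits.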

System Prompt

Atem-SageCoder's identity and coding focus are baked into the chat template, so no system prompt is required. To set one manually, pass the following as the system message:

You are Atem-SageCoder, a thoughtful programming assistant built on the
Atem foundation. You reason carefully through problems before writing code
— considering edge cases, algorithm choice, complexity, and implementation
details — then provide clean, correct, and well-structured implementations.
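
In a chat-style API, a manual override usually means placing the text in a leading message with the `system` role. The helper below is a hypothetical sketch of that pattern; it assumes the model's chat template honours a user-supplied system message, and it uses only the first sentence of the prompt above for brevity.

```python
from typing import Dict, List, Optional


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Return a chat message list, with an optional leading system message."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages(
    "Implement binary search over a sorted list.",
    system_prompt=(
        "You are Atem-SageCoder, a thoughtful programming assistant "
        "built on the Atem foundation."
    ),
)
print([m["role"] for m in messages])  # ['system', 'user']
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the `messages` variable used in the Transformers and Unsloth examples.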

Available Files

File	Size	Description
`model.safetensors`	~3.1 GB	Full bfloat16 merged weights
`Atem-SageCoder-1.5B.Q4_K_M.gguf`	~986 MB	4-bit quantised — recommended
`Atem-SageCoder-1.5B.Q5_K_M.gguf`	~1.1 GB	5-bit quantised
`Atem-SageCoder-1.5B.Q8_0.gguf`	~1.6 GB	8-bit quantised — near-lossless
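
The listed sizes can be sanity-checked against the parameter count. This sketch assumes the ~1.54B figure from Model Details and decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes); the bits-per-weight values are file-level averages and need not equal the nominal bit width of each quantisation.

```python
PARAMS = 1.54e9  # approximate parameter count from Model Details


def bf16_size_gb(params: float) -> float:
    """bfloat16 stores each weight in 2 bytes."""
    return params * 2 / 1e9


def implied_bits_per_weight(file_bytes: float, params: float) -> float:
    """Average bits per weight implied by a file size."""
    return file_bytes * 8 / params


print(round(bf16_size_gb(PARAMS), 2))  # 3.08

gguf_sizes = {"Q4_K_M": 986e6, "Q5_K_M": 1.1e9, "Q8_0": 1.6e9}
for name, size in gguf_sizes.items():
    print(name, round(implied_bits_per_weight(size, PARAMS), 1))
# Q4_K_M 5.1
# Q5_K_M 5.7
# Q8_0 8.3
```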

Known Limitations

Training data scope. All 15,427 training examples come from competitive programming problems in nvidia/OpenCodeReasoning. The model is strongest on algorithmic and data structure problems; general software engineering tasks (web APIs, OOP design, framework-specific code) were not represented in training, so output quality on them may be lower.

Factual concept errors. The qualitative evaluation identified an incorrect claim about tuples (Q3: stated tuples cannot contain duplicates — they can). Concept explanation accuracy should be independently verified for correctness-critical applications.
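
For reference, the Q3 claim is easy to check directly. The snippet below shows a tuple holding a repeated element, in contrast to a set, which collapses duplicates.

```python
pair = (1, 1, 2)       # a tuple with a repeated element

print(len(pair))       # 3
print(pair.count(1))   # 2
print(len(set(pair)))  # 2
```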

Response length. Think traces substantially increase output length. This is a fundamental property of the think-then-code design, not a fixable defect. For latency-constrained environments, Atem-Wisdom-1.5B with direct prompting may be preferable.

Single language bias. OpenCodeReasoning solutions are predominantly Python. Performance on other languages has not been formally evaluated.

Small training set. 15,427 examples is a focused dataset. Coverage of less common algorithmic patterns may be shallow. The high filter attrition rate (40k streamed → 15.4k retained) reflects the strict quality bar applied, not a shortage of data — the full split_0 contains substantially more examples at lower sequence lengths.

Roadmap

Stage	Status	Description
Stage 1 — SFT	✅ Complete	Atem v1 — direct reasoning foundation
Stage 2 — CoT SFT	✅ Complete	Atem-Wisdom — thinking traces
Specialisation — Code	✅ Complete	Atem-SageCoder — this model
Stage 3 — DPO/IPO	🔄 Planned	Atem-Pharaoh — preference-aligned reasoning

Citation

@misc{atem_sagecoder_2026,
  author       = {Asad, Zain},
  title        = {Atem-SageCoder: A 1.5B Think-Then-Code Model
                  via Competitive Programming Trace Distillation},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-SageCoder-1.5B}},
}

License

Released under the Apache 2.0 License, consistent with the base model chain (Qwen2.5-1.5B-Instruct → Atem v1 → Atem-Wisdom → Atem-SageCoder).

Built independently by EphAsad
