Atem-SageCoder
Ancient logic. Modern intelligence. Applied to code.
A 1.5B code reasoning model that thinks before it writes — trained on verified competitive programming traces from frontier models.
Overview
Atem-SageCoder is a code-specialised variant of Atem-Wisdom-1.5B, fine-tuned on verified chain-of-thought coding traces from nvidia/OpenCodeReasoning. It inherits Atem-Wisdom's explicit reasoning capability and applies it specifically to programming tasks — working through algorithm choice, edge cases, and complexity analysis before producing an implementation.
The core behaviour: when given a coding problem, the model reasons through it fully inside a <think> block before writing any code. This makes its reasoning auditable and reduces the frequency of structurally plausible but logically incorrect solutions.
When to choose Atem-SageCoder over Atem-Wisdom:
- Programming problems where reasoning about approach matters before implementation
- Competitive programming and algorithmic tasks
- Situations where you want to see the model's design decisions, not just its output
- Code that requires edge case analysis or complexity reasoning
When to choose Atem-Wisdom instead:
- General reasoning, mathematics, and analytical tasks outside of coding
- Mixed-domain workloads where code is one of many task types
- Environments where output length is a constraint
The Atem Series
| Model | Stage | Capability | Status |
|---|---|---|---|
| Atem v1 | Stage 1 — SFT | Fast, direct reasoning | ✅ |
| Atem-Wisdom | Stage 2 — CoT | Explicit thinking traces | ✅ |
| Atem-SageCoder | Specialisation — Code | Think-then-code on algorithms | ✅ |
| Atem-Pharaoh (planned) | Stage 3 — DPO/IPO | Preference-aligned reasoning | 🔄 |
Atem-SageCoder is a domain-specialised branch off Atem-Wisdom, not a continuation of the main series progression toward Atem-Pharaoh.
Model Details
| Property | Value |
|---|---|
| Base model | EphAsad/Atem-Wisdom-1.5B |
| Root architecture | Qwen/Qwen2.5-1.5B-Instruct |
| Training method | LoRA SFT — Code Reasoning Specialisation |
| LoRA config | r=32, alpha=64, dropout=0.05 |
| Parameters | ~1.54B |
| Training records | 15,427 (after filtering) |
| Think / no-think split | 90% / 10% |
| Epochs | 2 |
| Total steps | 484 |
| Final train loss | 0.8477 |
| Final val loss | 0.8591 |
| Hardware | NVIDIA A100-SXM4 80GB |
| Max sequence length | 8,192 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
Output Format
Atem-SageCoder produces responses in one of two formats:
With reasoning trace (90% of training examples):
<think>
[Reasoning through the problem — algorithm selection, edge cases,
complexity analysis, implementation approach]
</think>
[Final implementation — clean, correct code with explanation]
Direct answer (simple queries):
[Concise code response — no reasoning trace needed]
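The 10% no-think training pool is there so the model does not force an extended reasoning trace onto every query. On straightforward questions it is intended to respond directly, with the think trace scaling with problem complexity; in the 8-question evaluation reported under Evaluation, however, a trace appeared on all 8 questions, including the simplest.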
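Because a response may or may not contain a reasoning trace, downstream code usually needs to handle both shapes. The following is a minimal sketch of one way to separate the two parts; `split_response` is a hypothetical helper, and it assumes the `<think>` tags survive decoding as plain text.

```python
import re

# Matches one reasoning block; DOTALL lets "." span newlines inside the trace.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" for direct answers."""
    match = THINK_RE.search(text)
    if match is None:
        # Direct-answer format: the whole response is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


with_trace = "<think>\nUse n % 2.\n</think>\ndef is_even(n):\n    return n % 2 == 0"
reasoning, answer = split_response(with_trace)
print(reasoning)  # Use n % 2.
print(answer)
```

Keeping the reasoning separate makes it easy to log the trace for auditing while passing only the answer to a code runner.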
Training Data
Atem-SageCoder was trained on 15,427 examples drawn from nvidia/OpenCodeReasoning (split_0), after streaming 40,000 candidates and applying three sequential filters.
Filter 1 (truncation gate): Records were rejected if </think> was absent from the output (CoT cut off mid-trace) or if fewer than 30 characters of code followed </think> (code truncated). This is the primary source of attrition, reducing the 40,000 streamed records to roughly 24,000, because OpenCodeReasoning CoT traces are long.
Filter 2 (bad input gate): Records with input fields under 20 characters were rejected. A known data quality issue in split_1 caused that entire split to be excluded; all training data comes from split_0.
Filter 3 (token length): Examples exceeding 8,192 tokens after chat template application were removed rather than truncated. After all filters, 15,427 records remain, roughly 38% of the raw stream.
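The filters described above can be sketched as a single function that reports which gate rejected a record. This is an illustration rather than the training script: `rejection_reason`, the reason labels, and the `count_tokens` callable are hypothetical, whitespace is stripped before measuring lengths as an assumption, and the real pipeline counted tokens after applying the chat template.

```python
from typing import Callable, Optional

MAX_TOKENS = 8192      # token budget stated in the card
MIN_CODE_CHARS = 30    # minimum code length after </think>
MIN_INPUT_CHARS = 20   # minimum problem statement length


def rejection_reason(
    problem: str,
    output: str,
    count_tokens: Callable[[str], int],
) -> Optional[str]:
    """Return the first failed filter's label, or None if the record passes."""
    # Filter 1: truncation gate (missing </think> or too little code after it).
    _, closing_tag, code = output.partition("</think>")
    if not closing_tag:
        return "truncated_cot"
    if len(code.strip()) < MIN_CODE_CHARS:
        return "truncated_code"
    # Filter 2: bad input gate.
    if len(problem.strip()) < MIN_INPUT_CHARS:
        return "bad_input"
    # Filter 3: token length (stand-in for counting the templated text).
    if count_tokens(problem + output) > MAX_TOKENS:
        return "too_long"
    return None


def whitespace_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer, used only for this example."""
    return len(text.split())


problem = "Read ten integers and print their sum."
good = "<think>plan</think>def solve():\n    return sum(range(10))\n"
print(rejection_reason(problem, good, whitespace_tokens))              # None
print(rejection_reason(problem, "<think>cut off", whitespace_tokens))  # truncated_cot
```

Returning a label instead of a boolean makes it straightforward to tally attrition per filter.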
| Property | Value |
|---|---|
| Dataset | nvidia/OpenCodeReasoning (split_0) |
| Streamed | 40,000 |
| After truncation filter | ~24,000 |
| After token length filter | 15,427 |
| Train / Val split | 14,627 / 800 |
| Domain | Competitive programming (algorithmic problems) |
CoT extraction: The output column in OpenCodeReasoning contains <think>...</think>code format. CoT and code were extracted into separate fields before formatting. The <think> tags were removed from the raw output to avoid double-tag injection during chat template application, then manually reinserted during build_text construction with enable_thinking=False.
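The extract-then-reinsert step can be illustrated with two small helpers. The card does not show `build_text`, so the functions below are hypothetical stand-ins, and the exact whitespace around the tags is an assumption.

```python
def extract_cot_and_code(output: str) -> tuple[str, str]:
    """Split a raw '<think>...</think>code' string into (cot, code)."""
    head, closing_tag, code = output.partition("</think>")
    if not closing_tag:
        raise ValueError("output has no closing </think> tag")
    # Drop the opening tag so it cannot be injected a second time later.
    cot = head.replace("<think>", "", 1)
    return cot.strip(), code.strip()


def build_assistant_text(cot: str, code: str, think: bool = True) -> str:
    """Reassemble the assistant target, adding exactly one pair of tags."""
    if think:
        return f"<think>\n{cot}\n</think>\n{code}"
    # No-think pool: the target is the code alone.
    return code


raw = "<think>\nplan\n</think>\nprint(1)"
cot, code = extract_cot_and_code(raw)
target = build_assistant_text(cot, code)
print(target.count("<think>"))  # 1
```

Storing the CoT and code without tags and adding the tags in one place is what keeps a double `<think>` from appearing in the final text.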
Loss curve:
| Step | Train Loss | Val Loss |
|---|---|---|
| 250 | 0.8564 | 0.8757 |
| 484 (final) | 0.8477 | 0.8591 |
Train/val gap of 0.011 at completion (0.8591 vs 0.8477), with no overfitting signal. Loss values in the 0.85 range are expected for complex CoT+code targets; simple instruction SFT typically reaches 0.3–0.5, but verified reasoning traces carry genuine entropy.
Training Configuration
# Key hyperparameters
lora_r = 32
lora_alpha = 64
lora_dropout = 0.05
max_seq_length = 8192 # doubled vs Atem-Wisdom — CoT traces are long
learning_rate = 1e-4
lr_scheduler = 'cosine'
warmup_ratio = 0.05
batch_size = 4 # halved vs Atem-Wisdom to account for 2× seq length
grad_accumulation = 16 # effective batch size: 64
num_epochs = 2
dtype = 'bfloat16'
load_in_4bit = True # during training
nothink_ratio = 0.10 # 10% direct-answer training pool
Training used Unsloth (unsloth==2026.5.5, unsloth_zoo==2026.5.5) with train_on_responses_only masking, so loss was computed exclusively on assistant response tokens. Three validation checks were run before training started: identity confirmation, double <think> tag detection, and a mask sanity check. All three passed.
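The effective batch size and step count follow from the hyperparameters listed above. The sketch below is a rough arithmetic check, assuming one step per optimizer update and that a final partial batch still counts as a step; `optimizer_steps` is a hypothetical helper, not part of any training library.

```python
import math


def optimizer_steps(
    num_records: int,
    batch_size: int,
    grad_accumulation: int,
    num_epochs: int,
) -> int:
    """Optimizer updates over all epochs, rounding partial batches up."""
    effective_batch = batch_size * grad_accumulation
    steps_per_epoch = math.ceil(num_records / effective_batch)
    return steps_per_epoch * num_epochs


print(4 * 16)                             # 64 (effective batch size)
print(optimizer_steps(15_427, 4, 16, 2))  # 484
```

Under these assumptions the 15,427 filtered records give the reported 484 steps. The same formula gives 458 for a 14,627-record train split, so the reported total lines up with the full filtered set; the card does not say how the validation split was held out.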
Evaluation
Qualitative Coding Evaluation (8 / 30 questions shown)
Atem-SageCoder was evaluated against a Qwen/Qwen2.5-1.5B-Instruct baseline across 30 coding questions covering implementation tasks, concept explanations, and algorithm design. The 8 coding-domain questions from that evaluation are shown below.
| # | Question | Base | SageCoder | Notes |
|---|---|---|---|---|
| 1 | is_even(n) function | ✓ No think | ✓ Think | Both correct |
| 2 | Count vowels in string | ✓ No think | ✓ Think | SageCoder more Pythonic (generator expression) |
| 3 | List vs tuple differences | ✓ No think | ⚠ Think | SageCoder error: claims tuples cannot contain duplicates (incorrect) |
| 4 | Sum list without sum() | ✓ No think | ✓ Think | SageCoder more thorough, both correct |
| 5 | Reverse a string | ✓ No think | ✓ Think | Both correct; SageCoder more verbose |
| 6 | if / elif / else | ⚠ No think | ✓ Think | Base error: predicts wrong output for age=25 example |
| 7 | find_max() with empty list | ✓ No think | ✓ Think | SageCoder provides two implementations |
| 8 | for vs while loop | ✓ No think | ✓ Think | SageCoder more structured |
Summary across 8 questions:
| Metric | Baseline | Atem-SageCoder |
|---|---|---|
| Think traces | 0 / 8 | 8 / 8 |
| Avg response (words) | ~177 | ~470 |
| Factual errors observed | 1 (Q6 output prediction) | 1 (Q3 tuple claim) |
| Code correctness | 7 / 8 correct | 7 / 8 correct |
The think-then-code pattern activated on all 8 questions shown. Responses are also substantially longer (~470 vs ~177 words on average): SageCoder examines edge cases, considers multiple approaches, and explains implementation choices that the baseline omits. Overall correctness is comparable across these 8 questions; the error types differ (baseline: incorrect output prediction; SageCoder: incorrect concept claim about tuples).
Usage
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-SageCoder-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

response = tokenizer.decode(
    output[0][inputs.shape[1]:],
    skip_special_tokens=True
)
print(response)
Unsloth (faster inference)
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-SageCoder-1.5B",
    max_seq_length=8192,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Given an array of integers, find the two numbers that sum to a target value. Return their indices."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
Ollama
# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q4_K_M
# Higher quality
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q5_K_M
# Near-lossless
ollama run hf.co/EphAsad/Atem-SageCoder-1.5B:Q8_0
llama.cpp
llama-server -hf EphAsad/Atem-SageCoder-1.5B:Q4_K_M
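Once `llama-server` is running it exposes an OpenAI-compatible HTTP API, so the model can be queried from Python with the standard library alone. This is a minimal sketch that assumes the server is on its default address, `http://localhost:8080`; `build_chat_request` and `chat` are hypothetical helper names, and the sampling values simply mirror the Transformers example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: llama-server default host and port


def build_chat_request(prompt: str, max_tokens: int = 2048, temperature: float = 0.7) -> dict:
    """Build the JSON body for a chat completion request."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(prompt: str) -> str:
    """Send one user message to the local server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


# Requires a running server, so the call is left commented out:
# print(chat("Write a function that reverses a linked list."))
print(json.dumps(build_chat_request("Reverse a linked list."), indent=2))
```

Leave `max_tokens` generous, because the reasoning trace counts toward the limit before any code is written.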
System Prompt
Atem-SageCoder's identity and coding focus are baked into the chat template. To override manually:
You are Atem-SageCoder, a thoughtful programming assistant built on the
Atem foundation. You reason carefully through problems before writing code
— considering edge cases, algorithm choice, complexity, and implementation
details — then provide clean, correct, and well-structured implementations.
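One way to apply an override is to put it in an explicit system message ahead of the user turn. The sketch below assumes the chat template uses a supplied system message in place of its built-in default, as Qwen2.5-style templates generally do; rendering the template with `tokenize=False` is a quick way to confirm this. `build_messages` and the example prompt are hypothetical.

```python
from typing import Optional

# Hypothetical custom prompt, used only to illustrate the override.
CUSTOM_SYSTEM_PROMPT = "You are a concise programming assistant. Answer in Python."


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> list[dict]:
    """Return a chat message list, with an optional system override first."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages("Implement binary search.", CUSTOM_SYSTEM_PROMPT)
print([message["role"] for message in messages])  # ['system', 'user']
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the `messages` list used in the Transformers example.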
Available Files
| File | Size | Description |
|---|---|---|
| model.safetensors | ~3.1 GB | Full bfloat16 merged weights |
| Atem-SageCoder-1.5B.Q4_K_M.gguf | ~986 MB | 4-bit quantised — recommended |
| Atem-SageCoder-1.5B.Q5_K_M.gguf | ~1.1 GB | 5-bit quantised |
| Atem-SageCoder-1.5B.Q8_0.gguf | ~1.6 GB | 8-bit quantised — near-lossless |
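To fetch a single quantised file rather than the whole repository, `hf_hub_download` from the `huggingface_hub` package can be used. The sketch below is an assumption-laden convenience wrapper: `gguf_filename` and `download_gguf` are hypothetical helpers, and the filenames are copied from the table above.

```python
REPO_ID = "EphAsad/Atem-SageCoder-1.5B"

# Quantisation tag -> filename, as listed in the Available Files table.
GGUF_FILES = {
    "Q4_K_M": "Atem-SageCoder-1.5B.Q4_K_M.gguf",
    "Q5_K_M": "Atem-SageCoder-1.5B.Q5_K_M.gguf",
    "Q8_0": "Atem-SageCoder-1.5B.Q8_0.gguf",
}


def gguf_filename(quant: str = "Q4_K_M") -> str:
    """Map a quantisation tag to its GGUF filename."""
    if quant not in GGUF_FILES:
        raise ValueError(f"unknown quantisation {quant!r}; choose from {sorted(GGUF_FILES)}")
    return GGUF_FILES[quant]


def download_gguf(quant: str = "Q4_K_M") -> str:
    """Download one GGUF file and return its local cache path."""
    # Imported lazily so the filename helper works without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))


print(gguf_filename())  # Atem-SageCoder-1.5B.Q4_K_M.gguf
```

Calling `download_gguf("Q8_0")` needs network access and returns a path that can be handed to any GGUF-capable runtime.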
Known Limitations
Training data scope. All 15,427 training examples come from competitive programming problems in nvidia/OpenCodeReasoning. The model is strongest on algorithmic and data structure problems; general software engineering tasks (web APIs, OOP design, framework-specific code) were not represented in training, and the model may produce lower-quality output on them.
Factual concept errors. The qualitative evaluation identified an incorrect claim about tuples (Q3: stated tuples cannot contain duplicates — they can). Concept explanation accuracy should be independently verified for correctness-critical applications.
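For reference, the correct behaviour is easy to confirm: tuples are ordered and immutable, but they place no uniqueness constraint on their elements. A short check:

```python
pair = (1, 1, 2)

print(len(pair))       # 3 (the duplicate is kept)
print(pair.count(1))   # 2
print(len(set(pair)))  # 2 (only a set removes duplicates)
```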
Response length. Think traces substantially increase output length. This is a fundamental property of the think-then-code design, not a fixable defect. For latency-constrained environments, Atem-Wisdom-1.5B with direct prompting may be preferable.
Single language bias. OpenCodeReasoning solutions are predominantly Python. Performance on other languages has not been formally evaluated.
Small training set. 15,427 examples is a focused dataset. Coverage of less common algorithmic patterns may be shallow. The high filter attrition rate (40k streamed → 15.4k retained) reflects the strict quality bar applied, not a shortage of data — the full split_0 contains substantially more examples at lower sequence lengths.
Roadmap
| Stage | Status | Description |
|---|---|---|
| Stage 1 — SFT | ✅ Complete | Atem v1 — direct reasoning foundation |
| Stage 2 — CoT SFT | ✅ Complete | Atem-Wisdom — thinking traces |
| Specialisation — Code | ✅ Complete | Atem-SageCoder — this model |
| Stage 3 — DPO/IPO | 🔄 Planned | Atem-Pharaoh — preference-aligned reasoning |
Citation
@misc{atem_sagecoder_2026,
author = {Asad, Zain},
title = {Atem-SageCoder: A 1.5B Think-Then-Code Model
via Competitive Programming Trace Distillation},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/EphAsad/Atem-SageCoder-1.5B}},
}
License
Released under the Apache 2.0 License, consistent with the base model chain (Qwen2.5-1.5B-Instruct → Atem v1 → Atem-Wisdom → Atem-SageCoder).
Built independently by EphAsad