Atem-1.7B
Ancient logic. Modern intelligence.
A 1.7B reasoning model trained via a single CoT-preserving SFT pass directly on Qwen3-1.7B, distilling multi-domain reasoning capability from frontier teacher models while keeping the base model's native thinking capability intact.
Overview
Atem-1.7B is a 1.7B parameter reasoning model built via a single supervised fine-tuning pass on raw Qwen3-1.7B, using the same CoT-preserving single-pass design as Atem-4B and Atem-8B. It is the most compute-efficient model in the Atem series, completing training in under 2.5 hours on an A100-SXM4 80GB while maintaining 2.95% proportional LoRA capacity — close to the series-wide 3% target.
This model includes GSM8K-format training examples (5K no-think records) to partially restore the #### answer convention that the reasoning corpus otherwise overwrites — an improvement over Atem-4B and Atem-8B, which did not include these.
Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Training method | Single-pass CoT-Preserving LoRA SFT |
| LoRA config | r=48, alpha=96, dropout=0.05 |
| Target modules | q, k, v, o, gate, up, down projections |
| Parameters | ~1.77B |
| Trainable (LoRA) params | 52,297,728 (2.95% of base) |
| Training records | 62,301 (after token-length filtering) |
| Think / No-think split | 85% / 15% |
| Epochs | 2 (ceiling; early stopping patience=3, never triggered) |
| Effective batch size | 64 (batch 16 × grad accum 4) |
| Learning rate | 1e-4, cosine schedule, 5% warmup |
| Max sequence length | 6,144 tokens |
| Precision | bfloat16 (full 16-bit LoRA, not QLoRA) |
| Hardware | NVIDIA A100-SXM4 80GB |
| Runtime | 2h28m |
| License | Apache 2.0 |
Design Notes
Single combined pass. The same single CoT-preserving pass design used across Atem-4B and Atem-8B — no erase-then-rebuild pipeline. Reasoning capability is built directly on the base model's intact native foundation.
r=48 for proportional capacity. r=32 on a 1.7B model covers only about 2% of the model's parameters, the same shrinking-fraction problem observed across the series as model size grows. r=48 recovers 2.95% proportional capacity (52,297,728 trainable parameters), close to the series-wide ~3% target.
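The trainable-parameter figure can be sanity-checked by hand. The sketch below counts LoRA parameters for the seven target modules. The layer dimensions are assumptions taken from the published Qwen3-1.7B configuration rather than from this card, and the base size uses the approximate ~1.77B figure from the Model Details table.

```python
# Assumed Qwen3-1.7B dimensions (from its published config, not from this card).
HIDDEN = 2048
INTERMEDIATE = 6144
NUM_LAYERS = 28
Q_OUT = 16 * 128   # 16 query heads x head_dim 128
KV_OUT = 8 * 128   # 8 key/value heads x head_dim 128

# (in_features, out_features) of each LoRA target module in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

BASE_PARAMS = 1.77e9  # approximate, from the Model Details table


def lora_trainable_params(rank: int) -> int:
    """Each adapted matrix adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
    return per_layer * NUM_LAYERS


for rank in (32, 48):
    n = lora_trainable_params(rank)
    print(f"r={rank}: {n:,} trainable params, {n / BASE_PARAMS:.2%} of base")
```

Under these assumptions, r=48 reproduces the 52,297,728 trainable parameters listed in the Model Details table.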
GSM8K format restoration. The standard Atem training corpus uses \boxed{} notation throughout. Atem-4B and Atem-8B both showed a systematic GSM8K strict-match regression as a result of this format shift. Atem-1.7B is the first in the series to include 5,000 GSM8K-format training examples (from openai/gsm8k) in the no-think pool, partially re-establishing the #### answer convention alongside \boxed{}.
Full 16-bit LoRA. At 1.7B the model weights occupy only ~3.4GB, leaving over 75GB of A100 headroom. Full 16-bit LoRA is used throughout — faster and marginally more accurate than QLoRA without any VRAM constraint.
Intended Use
Atem-1.7B is suited for reasoning tasks on resource-constrained hardware — edge devices, local deployment, and applications where a 4B+ model is impractical:
- Multi-step mathematical reasoning
- Code explanation, implementation, and debugging
- Analytical reasoning across diverse domains
- Commonsense reasoning and physical intuition
- Logic and argument evaluation
For higher capability at the cost of resource requirements, Atem-4B and Atem-8B provide progressively stronger results on the same reasoning tasks.
Training Data
Atem-1.7B was trained on the same eight-source reasoning corpus as Atem-4B and Atem-8B, with the addition of 5,000 GSM8K-format records to partially restore the #### answer convention. All sources include explicit chain-of-thought reasoning traces; 85% of training records were formatted with full think traces and 15% as direct answers.
| Dataset | Records | Source / Teacher |
|---|---|---|
| mitroitskii/OpenR1-Math-220k-formatted | ~10,938 | DeepSeek-R1 — Mathematics (correctness-filtered) |
| Jackrong/Claude-opus-4.6-TraceInversion-9000x | 7,000 | Claude Opus 4.6 — Trace Inversion |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned (General-Math) | 8,000 | Kimi K2.5 — Mathematical Reasoning |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned (General-Distillation) | 8,000 | Kimi K2.5 — General Reasoning |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned (PHD-Science) | 8,000 | Kimi K2.5 — Scientific Reasoning |
| WithinUsAI/MiniMax_M2.7_Distilled_5k | 5,000 | MiniMax M2.7 |
| FreedomIntelligence/medical-o1-reasoning-SFT | 7,500 | Medical reasoning (English config) |
| Modotte/CodeX-2M-Thinking | 15,000 | Mixed — Coding with CoT |
| trjxter/DeepSeek-V4-Pro-Reasoning-8000x | ~8,014 | DeepSeek-V4-Pro |
| nvidia/OpenCodeReasoning | 15,000 | Mixed — Competitive coding |
| openai/gsm8k (no-think) | 5,000 | GSM8K #### answer format restoration |
| Total (pre-filter pool) | 96,017 | |
| Total (post-filter, trained on) | 62,301 | |
Non-English reasoning traces (primarily CJK) were filtered at the trace level using an ASCII-ratio threshold and retained as no-think records. The 34.3% filter rate is consistent with Atem-4B (32.7%) and Atem-8B (34.3%) at the same 6,144-token ceiling.
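A minimal sketch of an ASCII-ratio trace filter of the kind described above. The threshold value, the record field names (`think`, `mode`), and the helper names are all hypothetical; the card does not publish the actual filtering code.

```python
ASCII_THRESHOLD = 0.9  # hypothetical value; the real threshold is not stated


def ascii_ratio(text: str) -> float:
    """Fraction of characters in `text` that are ASCII (1.0 for empty text)."""
    if not text:
        return 1.0
    return sum(ch.isascii() for ch in text) / len(text)


def route_record(record: dict) -> dict:
    """Keep the record, but drop a mostly non-ASCII trace and mark it no-think."""
    trace = record.get("think", "")
    if ascii_ratio(trace) < ASCII_THRESHOLD:
        return {**record, "think": "", "mode": "no-think"}
    return {**record, "mode": "think"}


if __name__ == "__main__":
    sample = {"question": "2 + 2?", "think": "首先，我们计算二加二。", "answer": "4"}
    print(route_record(sample)["mode"])
```

The record is retained either way; only the reasoning trace is discarded when it falls below the threshold, which matches the "retained as no-think records" behaviour described above.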
Training Configuration
# Key hyperparameters
lora_r = 48
lora_alpha = 96
lora_dropout = 0.05
max_seq_length = 6144
learning_rate = 1e-4
lr_scheduler = 'cosine'
warmup_ratio = 0.05
batch_size = 16
grad_accumulation = 4 # effective batch size: 64
num_epochs = 2 # ceiling — early stopping patience=3
eval_steps = 150
early_stopping_patience = 3
early_stopping_threshold = 0.001
nothink_ratio = 0.15
load_in_4bit = False # full 16-bit LoRA
dtype = "bfloat16"
Loss Curve
| Step | Train Loss | Val Loss |
|---|---|---|
| 150 | 1.0706 | 1.0833 |
| 300 | 1.0385 | 1.0520 |
| 450 | 1.0566 | 1.0372 |
| 600 | 0.9990 | 1.0255 |
| 750 | 1.0082 | 1.0158 |
| 900 | 0.9887 | 1.0091 |
| 1050 | 0.9294 | 1.0051 |
| 1200 | 0.8906 | 1.0020 |
| 1350 | 0.9331 | 0.9993 |
| 1500 | 0.9780 | 0.9973 |
| 1650 | 0.9467 | 0.9963 |
| 1800 | 0.9341 | 0.9957 |
| Final (1948) | 0.9902 (avg) | 0.9956 |
Train loss is noisier than in larger Atem models — characteristic of smaller models with a diverse multi-domain corpus. Validation loss improved monotonically across all 13 checkpoints without exception. Early stopping was configured but never triggered.
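The step count and the number of checkpoints follow from the hyperparameters. The sketch below assumes that all 62,301 post-filter records were used for training and that a partial final accumulation group still counts as one optimizer step; neither detail is stated in the card.

```python
import math

train_records = 62_301   # post-filter record count (assumed to be the training set)
per_device_batch = 16
grad_accumulation = 4    # effective batch size: 64
num_epochs = 2
eval_every = 150

batches_per_epoch = math.ceil(train_records / per_device_batch)
steps_per_epoch = math.ceil(batches_per_epoch / grad_accumulation)
total_steps = steps_per_epoch * num_epochs

# Periodic evaluations, plus one final evaluation at the end of training.
eval_steps = list(range(eval_every, total_steps + 1, eval_every))
num_checkpoints = len(eval_steps) + 1

print(total_steps, eval_steps[-1], num_checkpoints)
```

Under these assumptions the arithmetic gives 1,948 optimizer steps and 13 evaluation points, matching the loss table.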
Evaluation
Benchmark Results
Evaluated against base Qwen3-1.7B (Qwen/Qwen3-1.7B) using lm-evaluation-harness. Both models were loaded in 4-bit for evaluation. Statistical significance (σ) is provided as context for interpreting each result — at 1.7B scale, several deltas that appear directionally positive are within sampling noise due to test set size.
| Task | Base (Qwen3-1.7B) | Atem-1.7B | Delta | σ |
|---|---|---|---|---|
| ARC-Challenge (0-shot, acc_norm) | 40.7% | 42.2% | +1.5pp ✓ | 0.7σ |
| GSM8K strict (5-shot, exact_match) | 62.0% | 58.7% | −3.3pp ⚠ | 1.7σ |
| HellaSwag (0-shot, acc_norm) | 59.4% | 61.3% | +1.9pp ✓ | 2.8σ |
| MMLU (0-shot, acc) | 55.4% | 56.2% | +0.8pp ✓ | 1.3σ |
| Winogrande (0-shot, acc) | 61.8% | 61.1% | −0.7pp ⚠ | 0.4σ |
| PIQA (0-shot, acc) | 71.4% | 71.4% | +0.0pp — | 0.0σ |
| OpenBookQA (0-shot, acc_norm) | 36.0% | 39.0% | +3.0pp ✓ | 1.0σ |
| BoolQ (0-shot, acc) | 76.5% | 76.0% | −0.5pp — | 0.5σ |
HellaSwag (+1.9pp, 2.8σ) is the only clearly statistically significant positive result. It uses normalised log-likelihood scoring over multiple-choice options — format-independent and not influenced by generation style. This is also the most consistent signal across the full Atem series (1.7B: +1.9pp, 4B: +2.9pp, 8B: +1.7pp), confirming genuine commonsense reasoning transfer from the CoT training corpus.
OpenBookQA (+3.0pp) is directionally strong but the test set is only 500 questions, giving 1.0σ — treat this as encouraging rather than conclusive.
Winogrande (−0.7pp, ⚠) is flagged, but at 0.4σ it is statistically indistinguishable from noise and is not a meaningful regression.
MMLU (+0.8pp, 1.3σ) is borderline. Consistent with the series pattern — neither model has a knowledge breadth advantage after CoT training.
Results at 1.7B are generally less pronounced than at 4B and 8B, as expected: smaller models with proportionally larger parameter changes per training step exhibit noisier benchmark behaviour, and the absolute capability headroom above random baselines is narrower.
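The card does not state how the σ column was computed. The reported values appear consistent with an unpaired two-proportion z-score, so the sketch below uses that as an assumption. The OpenBookQA test size of 500 comes from the text above; the HellaSwag size of 10,042 is an assumption based on the standard validation split.

```python
import math


def z_score(acc_base: float, acc_new: float, n: int) -> float:
    """Unpaired two-proportion z-score for two accuracies measured on n items each."""
    var = acc_base * (1 - acc_base) / n + acc_new * (1 - acc_new) / n
    return abs(acc_new - acc_base) / math.sqrt(var)


if __name__ == "__main__":
    print(f"OpenBookQA: {z_score(0.360, 0.390, 500):.1f} sigma")
    print(f"HellaSwag:  {z_score(0.594, 0.613, 10_042):.1f} sigma")
```

Because both models answer the same questions, a paired test would usually be tighter than this unpaired estimate, so the figures are best read as conservative.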
GSM8K — Formatting Shift
The strict-match regression (−3.3pp) follows the same pattern established at 4B and 8B: the training corpus uses \boxed{} notation, systematically shifting away from the #### format that lm_eval's strict-match extraction expects. At 1.7B the base model scores 62.0% — above the threshold where formatting effects dominate over raw capability gains (the 0.6B base at 26.7% was below this threshold and actually improved on strict-match).
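The formatting effect can be illustrated with two answer extractors. This is a simplified sketch, not the lm-evaluation-harness code: the patterns and function names are illustrative assumptions, modelled on a strict `####` pattern and a more lenient fallback that also accepts `\boxed{}`.

```python
import re

# Strict: only accepts the GSM8K "#### <number>" convention.
STRICT = re.compile(r"#### (-?[0-9.,]+)")
# Lenient fallback (hypothetical): also accepts a \boxed{<number>} answer.
BOXED = re.compile(r"\\boxed\{\s*(-?[0-9.,]+)\s*\}")


def strict_extract(text: str):
    """Return the answer only if it follows the '#### ' convention."""
    match = STRICT.search(text)
    return match.group(1) if match else None


def lenient_extract(text: str):
    """Return the answer from either '#### ' or \\boxed{} notation."""
    match = STRICT.search(text) or BOXED.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    boxed_answer = r"She has 6 * 7 = 42 apples, so the answer is \boxed{42}."
    print(strict_extract(boxed_answer), lenient_extract(boxed_answer))
```

A correct answer written as `\boxed{42}` scores zero under the strict extractor even though the arithmetic is right, which is the mechanism behind the regression described above.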
Atem-1.7B is the first model in the series to include GSM8K-format (#### answer) training examples. At 5,000 records out of 62,301 total (8%), this partially offsets the shift but does not eliminate it — larger proportions would be needed for full recovery. Based on the flexible-extraction recovery rate confirmed at 8B (68% of regression recovered), the estimated true capability gap is approximately −1.1pp rather than −3.3pp.
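The two figures in the paragraph above can be reproduced directly. The sketch assumes, as the text does, that the 68% recovery rate measured at 8B transfers to 1.7B; that transfer is an extrapolation and not a measurement on this model.

```python
gsm8k_format_records = 5_000
total_records = 62_301
strict_match_delta_pp = -3.3   # measured strict-match regression
recovery_rate_8b = 0.68        # flexible-extraction recovery observed at 8B

# Share of the training set that uses the '####' answer convention.
format_share = gsm8k_format_records / total_records

# Portion of the regression that is assumed NOT to be a formatting artefact.
estimated_gap_pp = strict_match_delta_pp * (1 - recovery_rate_8b)

print(f"GSM8K-format share: {format_share:.1%}")
print(f"Estimated capability gap: {estimated_gap_pp:.1f}pp")
```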
Usage
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Explain why the harmonic mean is used for average speeds rather than the arithmetic mean.",
    }
]

# return_dict=True returns input_ids together with the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=2000,
        temperature=0.6,
        top_p=0.95,
        top_k=20,
        do_sample=True,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
Unsloth (faster inference)
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-1.7B",
    max_seq_length=6144,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "What is the time complexity of merge sort and why?",
    }
]

# return_dict=True returns input_ids together with the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=2000,
        temperature=0.6,
        top_p=0.95,
        top_k=20,
        do_sample=True,
    )

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
Ollama
# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-1.7B:Q4_K_M
# Higher quality
ollama run hf.co/EphAsad/Atem-1.7B:Q5_K_M
# Near-lossless
ollama run hf.co/EphAsad/Atem-1.7B:Q8_0
llama.cpp
llama-server -hf EphAsad/Atem-1.7B:Q4_K_M
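Once `llama-server` is running it exposes an OpenAI-compatible HTTP API. The sketch below sends one chat request using only the standard library. It assumes the server's default address of `http://localhost:8080`; the helper names are illustrative, and `top_k` is left out because it is not part of the OpenAI request schema.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:8080/v1"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": "EphAsad/Atem-1.7B:Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the local server and return the assistant's reply."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires a running llama-server instance.
    print(chat("What is the capital of France?"))
```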
Sampling Parameters
Use temperature=0.6, top_p=0.95, top_k=20 — Qwen3's published recommendation for thinking mode. Do not use greedy decoding with thinking mode enabled.
System Prompt
Atem-1.7B's identity is baked into the chat template and activates automatically without an explicit system message. For manual override:
You are Atem, a precise and analytical reasoning assistant. You approach
every problem methodically — identifying core concepts, reasoning step by
step, and arriving at well-supported conclusions. You show your thinking
clearly and are thorough, direct, and intellectually honest.
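A manual override would typically be passed as a `system` message ahead of the user turn. The sketch below only builds the message list; the helper name is hypothetical, the prompt is abbreviated to its first sentence, and whether the chat template gives an explicit system message priority over the built-in identity is an assumption that should be checked against the template itself.

```python
# Abbreviated; substitute the full system prompt text shown above.
ATEM_SYSTEM_PROMPT = "You are Atem, a precise and analytical reasoning assistant."


def build_messages(user_prompt: str, system_prompt: str = ATEM_SYSTEM_PROMPT) -> list:
    """Return a chat message list with an explicit system message first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


if __name__ == "__main__":
    messages = build_messages("Is 221 a prime number?")
    # Pass `messages` to tokenizer.apply_chat_template(...) as in the Usage section.
    print(messages)
```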
Available Files
| File | Size | Description |
|---|---|---|
| model.safetensors | 3.44 GB | Full bfloat16 merged weights (single shard) |
| Atem-1.7b.Q4_K_M.gguf | 1.11 GB | 4-bit quantised (recommended) |
| Atem-1.7b.Q5_K_M.gguf | 1.26 GB | 5-bit quantised |
| Atem-1.7b.Q8_0.gguf | 1.83 GB | 8-bit quantised (near-lossless) |
Known Limitations
GSM8K formatting shift. As documented in the evaluation section, the training corpus uses \boxed{} for mathematical answers. Despite the inclusion of 5,000 GSM8K-format examples, the strict-match regression persists at −3.3pp. The estimated true capability gap under flexible extraction is approximately −1.1pp. Future runs with a higher proportion of GSM8K-format examples would reduce this further.
Statistical modesty at 1.7B. Most benchmark deltas at this scale are within sampling noise — HellaSwag is the exception (2.8σ). This is expected: 1.7B models have narrower performance headroom and proportionally larger variance per benchmark question. The reasoning improvements are real but harder to detect reliably at smaller scale.
6,144 token sequence ceiling. The longest reasoning traces (advanced mathematics, competitive programming) were dropped during formatting. The model has not been trained on very long chain-of-thought traces.
No RLHF or DPO. Atem-1.7B has not undergone preference optimisation.
Roadmap
- Atem-14B: Single CoT-preserving pass on Qwen3-14B, r=128 (3.10% proportional capacity), with expanded GSM8K-format and camel-ai/chemistry additions to the corpus
Citation
@misc{atem_1b7_2026,
author = {Asad, Zain},
title = {Atem-1.7B: A 1.7B CoT-Preserving Reasoning Model via
Single-Pass SFT on Qwen3},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/EphAsad/Atem-1.7B}},
}
License
Released under the Apache 2.0 License, consistent with the base model Qwen/Qwen3-1.7B.
Built independently by Zain Asad — EphAsad