Instructions for using EphAsad/Atem-1.7B with libraries and local apps.

Libraries

How to use EphAsad/Atem-1.7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EphAsad/Atem-1.7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EphAsad/Atem-1.7B")
model = AutoModelForCausalLM.from_pretrained("EphAsad/Atem-1.7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use EphAsad/Atem-1.7B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="EphAsad/Atem-1.7B",
	filename="Atem-1.7B.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)


llama.cpp

How to use EphAsad/Atem-1.7B with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf EphAsad/Atem-1.7B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf EphAsad/Atem-1.7B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf EphAsad/Atem-1.7B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf EphAsad/Atem-1.7B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf EphAsad/Atem-1.7B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf EphAsad/Atem-1.7B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf EphAsad/Atem-1.7B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf EphAsad/Atem-1.7B:Q4_K_M

Use Docker

docker model run hf.co/EphAsad/Atem-1.7B:Q4_K_M


vLLM

How to use EphAsad/Atem-1.7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EphAsad/Atem-1.7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-1.7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/EphAsad/Atem-1.7B:Q4_K_M

SGLang

How to use EphAsad/Atem-1.7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EphAsad/Atem-1.7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-1.7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EphAsad/Atem-1.7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-1.7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use EphAsad/Atem-1.7B with Ollama:
```
ollama run hf.co/EphAsad/Atem-1.7B:Q4_K_M
```

Unsloth Studio

How to use EphAsad/Atem-1.7B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EphAsad/Atem-1.7B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EphAsad/Atem-1.7B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EphAsad/Atem-1.7B to start chatting

Pi

How to use EphAsad/Atem-1.7B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf EphAsad/Atem-1.7B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "EphAsad/Atem-1.7B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use EphAsad/Atem-1.7B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf EphAsad/Atem-1.7B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default EphAsad/Atem-1.7B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use EphAsad/Atem-1.7B with Docker Model Runner:
```
docker model run hf.co/EphAsad/Atem-1.7B:Q4_K_M
```

Lemonade

How to use EphAsad/Atem-1.7B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull EphAsad/Atem-1.7B:Q4_K_M

Run and chat with the model

lemonade run user.Atem-1.7B-Q4_K_M

List all available models

lemonade list

Atem-1.7B

Ancient logic. Modern intelligence.

A 1.7B reasoning model trained via a single CoT-preserving SFT pass directly on Qwen3-1.7B, distilling multi-domain reasoning capability from frontier teacher models while keeping the base model's native thinking capability intact.

Overview

Atem-1.7B is a 1.7B parameter reasoning model built via a single supervised fine-tuning pass on raw Qwen3-1.7B, using the same CoT-preserving single-pass design as Atem-4B and Atem-8B. It is the most compute-efficient model in the Atem series, completing training in under 2.5 hours on an A100-SXM4 80GB while maintaining 2.95% proportional LoRA capacity (trainable LoRA parameters as a share of the model's parameters), close to the series-wide 3% target.

This model includes GSM8K-format training examples (5K no-think records) to partially restore the #### answer convention that the reasoning corpus otherwise overwrites — an improvement over Atem-4B and Atem-8B, which did not include these.

Model Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Training method | Single-pass CoT-Preserving LoRA SFT |
| LoRA config | r=48, alpha=96, dropout=0.05 |
| Target modules | q, k, v, o, gate, up, down projections |
| Parameters | ~1.77B |
| Trainable (LoRA) params | 52,297,728 (2.95% of base) |
| Training records | 62,301 (after token-length filtering) |
| Think / No-think split | 85% / 15% |
| Epochs | 2 (ceiling; early stopping patience=3, never triggered) |
| Effective batch size | 64 (batch 16 × grad accum 4) |
| Learning rate | 1e-4, cosine schedule, 5% warmup |
| Max sequence length | 6,144 tokens |
| Precision | bfloat16 (full 16-bit LoRA, not QLoRA) |
| Hardware | NVIDIA A100-SXM4 80GB |
| Runtime | 2h28m |
| License | Apache 2.0 |

Design Notes

Single combined pass. The same single CoT-preserving pass design used across Atem-4B and Atem-8B — no erase-then-rebuild pipeline. Reasoning capability is built directly on the base model's intact native foundation.

r=48 for proportional capacity. r=32 on a 1.7B model represents only 2.05% of the model's parameters — the same shrinking-fraction problem observed across the series as model size grows. r=48 recovers 2.95% proportional capacity, close to the series-wide ~3% target and significantly better than r=32 would have provided.
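
The trainable-parameter figure can be checked by hand. The sketch below counts LoRA parameters for a rank-r adapter on all seven projections; the layer dimensions in `DecoderDims` are assumptions about the Qwen3-1.7B architecture and do not come from this card, so verify them against the base model's `config.json` before relying on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderDims:
    """Decoder shapes. These defaults are ASSUMED for Qwen3-1.7B, not taken from the card."""
    hidden: int = 2048
    intermediate: int = 6144
    n_layers: int = 28
    n_heads: int = 16
    n_kv_heads: int = 8
    head_dim: int = 128


def lora_param_count(r: int, dims: DecoderDims = DecoderDims()) -> int:
    """Count LoRA parameters when every q/k/v/o/gate/up/down projection is adapted.

    A rank-r adapter on a (d_in x d_out) linear layer adds r * (d_in + d_out) parameters.
    """
    q_out = dims.n_heads * dims.head_dim
    kv_out = dims.n_kv_heads * dims.head_dim
    shapes = [
        (dims.hidden, q_out),              # q_proj
        (dims.hidden, kv_out),             # k_proj
        (dims.hidden, kv_out),             # v_proj
        (q_out, dims.hidden),              # o_proj
        (dims.hidden, dims.intermediate),  # gate_proj
        (dims.hidden, dims.intermediate),  # up_proj
        (dims.intermediate, dims.hidden),  # down_proj
    ]
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * dims.n_layers


def share_pct(n_lora: int, n_total: float) -> float:
    """LoRA parameters as a percentage of a chosen parameter total."""
    return 100 * n_lora / n_total


for rank in (32, 48):
    print(f"r={rank}: {lora_param_count(rank):,} LoRA parameters")
```

Under these assumed dimensions, r=48 gives the 52,297,728 trainable parameters listed in Model Details. The percentage depends on the denominator: `share_pct(lora_param_count(48), 1.77e9)` rounds to 2.95, while the 2.05% quoted for r=32 corresponds to a nominal 1.7B denominator.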

GSM8K format restoration. The standard Atem training corpus uses \boxed{} notation throughout. Atem-4B and Atem-8B both showed a systematic GSM8K strict-match regression as a result of this format shift. Atem-1.7B is the first in the series to include 5,000 GSM8K-format training examples (from openai/gsm8k) in the no-think pool, partially re-establishing the #### answer convention alongside \boxed{}.

Full 16-bit LoRA. At 1.7B the model weights occupy only ~3.4GB, leaving over 75GB of A100 headroom. Full 16-bit LoRA is used throughout — faster and marginally more accurate than QLoRA without any VRAM constraint.
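
As a rough check of the headroom claim, the sketch below estimates the weight footprint at 2 bytes per parameter. The base parameter count of about 1.72B is an assumption (it is not stated in this card), and the estimate covers weights only, not activations, gradients, or optimizer state.

```python
ASSUMED_BASE_PARAMS = 1.72e9  # assumption: approximate Qwen3-1.7B parameter count
A100_VRAM_GB = 80             # from the card: A100-SXM4 80GB


def weight_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in decimal gigabytes (bfloat16 = 2 bytes)."""
    return n_params * bytes_per_param / 1e9


weights = weight_gb(ASSUMED_BASE_PARAMS)
print(f"weights: {weights:.2f} GB, remaining: {A100_VRAM_GB - weights:.2f} GB")
```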

Intended Use

Atem-1.7B is suited for reasoning tasks on resource-constrained hardware — edge devices, local deployment, and applications where a 4B+ model is impractical:

Multi-step mathematical reasoning
Code explanation, implementation, and debugging
Analytical reasoning across diverse domains
Commonsense reasoning and physical intuition
Logic and argument evaluation

For higher capability at the cost of resource requirements, Atem-4B and Atem-8B provide progressively stronger results on the same reasoning tasks.

Training Data

Atem-1.7B was trained on the same eight-source reasoning corpus as Atem-4B and Atem-8B, with the addition of 5,000 GSM8K-format records to partially restore the #### answer convention. All sources include explicit chain-of-thought reasoning traces; 85% of training records were formatted with full think traces and 15% as direct answers.

| Dataset | Records | Source / Teacher |
|---|---|---|
| mitroitskii/OpenR1-Math-220k-formatted | ~10,938 | DeepSeek-R1: Mathematics (correctness-filtered) |
| Jackrong/Claude-opus-4.6-TraceInversion-9000x | 7,000 | Claude Opus 4.6: Trace Inversion |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned (General-Math) | 8,000 | Kimi K2.5: Mathematical Reasoning |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned (General-Distillation) | 8,000 | Kimi K2.5: General Reasoning |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned (PHD-Science) | 8,000 | Kimi K2.5: Scientific Reasoning |
| WithinUsAI/MiniMax_M2.7_Distilled_5k | 5,000 | MiniMax M2.7 |
| FreedomIntelligence/medical-o1-reasoning-SFT | 7,500 | Medical reasoning (English config) |
| Modotte/CodeX-2M-Thinking | 15,000 | Mixed: Coding with CoT |
| trjxter/DeepSeek-V4-Pro-Reasoning-8000x | ~8,014 | DeepSeek-V4-Pro |
| nvidia/OpenCodeReasoning | 15,000 | Mixed: Competitive coding |
| openai/gsm8k (no-think) | 5,000 | GSM8K `#### answer` format restoration |
| Total (pre-filter pool) | 96,017 | |
| Total (post-filter, trained on) | 62,301 | |

Records whose reasoning traces were non-English (primarily CJK) were detected at the trace level using an ASCII-ratio threshold; the trace was removed and the record was retained as a no-think example. The 34.3% filter rate is consistent with Atem-4B (32.7%) and Atem-8B (34.3%) at the same 6,144-token ceiling.
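
A trace-level ASCII-ratio check of this kind could look like the sketch below. The card gives neither the threshold nor the record schema, so `min_ascii_ratio=0.9`, the function names, and the dictionary keys are all hypothetical.

```python
from typing import Optional


def ascii_ratio(text: str) -> float:
    """Fraction of characters that are ASCII (defined as 1.0 for empty text)."""
    if not text:
        return 1.0
    return sum(ch.isascii() for ch in text) / len(text)


def format_record(question: str, trace: str, answer: str,
                  min_ascii_ratio: float = 0.9) -> dict:
    """Keep the reasoning trace only when it is mostly ASCII.

    Records that fail the check are not discarded: the trace is dropped and
    the record is kept as a direct-answer (no-think) example.
    The 0.9 threshold is a hypothetical value, not the one used in training.
    """
    keep_trace = ascii_ratio(trace) >= min_ascii_ratio
    kept: Optional[str] = trace if keep_trace else None
    return {"question": question, "trace": kept, "answer": answer, "think": keep_trace}


english = format_record("What is 6 * 7?", "Six groups of seven make 42.", "42")
cjk = format_record("What is 6 * 7?", "六乘以七等于四十二。", "42")
print(english["think"], cjk["think"])
```

An ASCII ratio is a cheap proxy for "English" and will also penalise traces heavy in Unicode math symbols, which is one reason the threshold would need tuning on real data.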

Training Configuration

# Key hyperparameters
lora_r             = 48
lora_alpha         = 96
lora_dropout       = 0.05
max_seq_length     = 6144
learning_rate      = 1e-4
lr_scheduler       = 'cosine'
warmup_ratio       = 0.05
batch_size         = 16
grad_accumulation  = 4           # effective batch size: 64
num_epochs         = 2           # ceiling — early stopping patience=3
eval_steps         = 150
early_stopping_patience   = 3
early_stopping_threshold  = 0.001
nothink_ratio      = 0.15
load_in_4bit       = False       # full 16-bit LoRA
dtype              = bfloat16

Loss Curve

| Step | Train Loss | Val Loss |
|---|---|---|
| 150 | 1.0706 | 1.0833 |
| 300 | 1.0385 | 1.0520 |
| 450 | 1.0566 | 1.0372 |
| 600 | 0.9990 | 1.0255 |
| 750 | 1.0082 | 1.0158 |
| 900 | 0.9887 | 1.0091 |
| 1050 | 0.9294 | 1.0051 |
| 1200 | 0.8906 | 1.0020 |
| 1350 | 0.9331 | 0.9993 |
| 1500 | 0.9780 | 0.9973 |
| 1650 | 0.9467 | 0.9963 |
| 1800 | 0.9341 | 0.9957 |
| Final (1948) | 0.9902 (avg) | 0.9956 |

Train loss is noisier than in larger Atem models — characteristic of smaller models with a diverse multi-domain corpus. Validation loss improved monotonically across all 13 checkpoints without exception. Early stopping was configured but never triggered.
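
Two figures in this section can be cross-checked from numbers already in the card. The sketch below derives the final step count from the record count and effective batch size, assuming the last partial batch of each epoch counts as one optimizer step, and confirms that the validation losses in the table are strictly decreasing.

```python
import math

# Values taken from Model Details and Training Configuration.
train_records = 62_301
effective_batch = 16 * 4  # batch_size * grad_accumulation
epochs = 2

# Assumption: a trailing partial batch still produces one optimizer step.
steps_per_epoch = math.ceil(train_records / effective_batch)
total_steps = steps_per_epoch * epochs

# Validation losses copied from the Loss Curve table (13 checkpoints).
val_loss = [1.0833, 1.0520, 1.0372, 1.0255, 1.0158, 1.0091, 1.0051,
            1.0020, 0.9993, 0.9973, 0.9963, 0.9957, 0.9956]
strictly_decreasing = all(b < a for a, b in zip(val_loss, val_loss[1:]))

print(total_steps, len(val_loss), strictly_decreasing)
```

The computed total matches the final step shown in the table, which is consistent with both epochs having run to completion.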

Evaluation

Benchmark Results

Evaluated against the base model (Qwen/Qwen3-1.7B) using lm-evaluation-harness, with both models loaded in 4-bit. The σ column expresses each delta relative to its sampling error and is provided as context for interpreting the result: at 1.7B scale, several deltas that appear directionally positive are within sampling noise because of test set size.

| Task | Base (Qwen3-1.7B) | Atem-1.7B | Delta | Flag | σ |
|---|---|---|---|---|---|
| ARC-Challenge (0-shot, acc_norm) | 40.7% | 42.2% | +1.5pp | ✓ | 0.7σ |
| GSM8K strict (5-shot, exact_match) | 62.0% | 58.7% | −3.3pp | ⚠ | 1.7σ |
| HellaSwag (0-shot, acc_norm) | 59.4% | 61.3% | +1.9pp | ✓ | 2.8σ |
| MMLU (0-shot, acc) | 55.4% | 56.2% | +0.8pp | ✓ | 1.3σ |
| Winogrande (0-shot, acc) | 61.8% | 61.1% | −0.7pp | ⚠ | 0.4σ |
| PIQA (0-shot, acc) | 71.4% | 71.4% | +0.0pp | | 0.0σ |
| OpenBookQA (0-shot, acc_norm) | 36.0% | 39.0% | +3.0pp | ✓ | 1.0σ |
| BoolQ (0-shot, acc) | 76.5% | 76.0% | −0.5pp | | 0.5σ |

HellaSwag (+1.9pp, 2.8σ) is the only clearly statistically significant positive result. It uses normalised log-likelihood scoring over multiple-choice options, so it is independent of answer format and not influenced by generation style. It is also the most consistent signal across the full Atem series (1.7B: +1.9pp, 4B: +2.9pp, 8B: +1.7pp), which supports genuine commonsense reasoning transfer from the CoT training corpus.

OpenBookQA (+3.0pp) is directionally strong but the test set is only 500 questions, giving 1.0σ — treat this as encouraging rather than conclusive.

Winogrande (−0.7pp, ⚠) is flagged, but at 0.4σ it is statistically indistinguishable from noise and is not a meaningful regression.

MMLU (+0.8pp, 1.3σ) is borderline. This is consistent with the series pattern: neither the base nor the fine-tuned model has a knowledge-breadth advantage after CoT training.

Results at 1.7B are generally less pronounced than at 4B and 8B, as expected: smaller models with proportionally larger parameter changes per training step exhibit noisier benchmark behaviour, and the absolute capability headroom above random baselines is narrower.
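
The card does not state how the σ values were computed. One reading that comes close to the reported figures is an unpooled two-proportion z-score that treats the base and fine-tuned runs as independent samples of the same size. The sketch below follows that assumption and is not the author's script; the OpenBookQA size (500) comes from the text above, while the HellaSwag size (10,042) is assumed.

```python
import math


def delta_sigma(base_acc: float, tuned_acc: float, n: int) -> float:
    """Accuracy delta divided by the standard error of the difference.

    Uses binomial variance p * (1 - p) / n for each run and adds the two,
    which ignores the fact that both models answer the same questions.
    """
    var_base = base_acc * (1 - base_acc) / n
    var_tuned = tuned_acc * (1 - tuned_acc) / n
    return (tuned_acc - base_acc) / math.sqrt(var_base + var_tuned)


openbookqa = delta_sigma(0.360, 0.390, n=500)     # n stated in the card
hellaswag = delta_sigma(0.594, 0.613, n=10_042)   # n is an assumption

print(f"OpenBookQA: {openbookqa:.2f} sigma")
print(f"HellaSwag:  {hellaswag:.2f} sigma")
```

Because both models are scored on identical questions, a paired test such as McNemar's would be a tighter analysis; the independent-sample form is used here only because it needs nothing beyond the accuracies in the table.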

GSM8K — Formatting Shift

The strict-match regression (−3.3pp) follows the same pattern established at 4B and 8B: the training corpus uses \boxed{} notation, systematically shifting away from the #### format that lm_eval's strict-match extraction expects. At 1.7B the base model scores 62.0% — above the threshold where formatting effects dominate over raw capability gains (the 0.6B base at 26.7% was below this threshold and actually improved on strict-match).

Atem-1.7B is the first model in the series to include GSM8K-format (#### answer) training examples. At 5,000 records out of 62,301 total (8%), this partially offsets the shift but does not eliminate it — larger proportions would be needed for full recovery. Based on the flexible-extraction recovery rate confirmed at 8B (68% of regression recovered), the estimated true capability gap is approximately −1.1pp rather than −3.3pp.
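
The arithmetic behind the two figures in this paragraph is short enough to spell out. It carries over the card's own assumption that the 68% recovery rate measured at 8B also applies at 1.7B.

```python
strict_delta_pp = -3.3   # GSM8K strict-match delta from the benchmark table
recovery_rate = 0.68     # share of the regression recovered by flexible extraction at 8B

# Portion of the strict-match regression not explained by answer formatting.
estimated_gap_pp = strict_delta_pp * (1 - recovery_rate)

# Share of the training set made up of GSM8K-format records.
gsm8k_share = 5_000 / 62_301

print(f"estimated gap: {estimated_gap_pp:.1f}pp")
print(f"GSM8K-format share: {gsm8k_share:.1%}")
```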

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Explain why the harmonic mean is used for average speeds rather than the arithmetic mean."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # explicit dict output (input_ids + attention_mask) across library versions
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=2000,
        temperature=0.6,
        top_p=0.95,
        top_k=20,
        do_sample=True,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(
    output[0][prompt_len:],
    skip_special_tokens=True
)
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-1.7B",
    max_seq_length=6144,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "What is the time complexity of merge sort and why?"
    }
]

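inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # explicit dict output (input_ids + attention_mask) across library versions
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=2000,
        temperature=0.6,
        top_p=0.95,
        top_k=20,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(
    output[0][prompt_len:],
    skip_special_tokens=True
))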

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-1.7B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-1.7B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-1.7B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-1.7B:Q4_K_M

Sampling Parameters

Use temperature=0.6, top_p=0.95, top_k=20 — Qwen3's published recommendation for thinking mode. Do not use greedy decoding with thinking mode enabled.
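
When the model is served through an OpenAI-compatible endpoint rather than called from Python, the same settings go into the request body. The sketch below only builds the JSON payload; `build_chat_request` is a hypothetical helper, and `top_k` is not part of the standard OpenAI schema, so treat its acceptance by a given server as an assumption to verify.

```python
import json

# Sampling settings recommended in this section.
SAMPLING = {"temperature": 0.6, "top_p": 0.95, "top_k": 20}


def build_chat_request(prompt: str,
                       model: str = "EphAsad/Atem-1.7B",
                       max_tokens: int = 2000) -> dict:
    """Build a chat-completions request body with the recommended sampling settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        **SAMPLING,  # top_k is a server-side extension, not standard OpenAI
    }


payload = build_chat_request("What is the time complexity of merge sort and why?")
print(json.dumps(payload, indent=2))
```

The printed JSON can be sent with any HTTP client to the server's `/v1/chat/completions` route.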

System Prompt

Atem-1.7B's identity is baked into the chat template and activates automatically without an explicit system message. To override it manually, pass the following as the system message:

You are Atem, a precise and analytical reasoning assistant. You approach
every problem methodically — identifying core concepts, reasoning step by
step, and arriving at well-supported conclusions. You show your thinking
clearly and are thorough, direct, and intellectually honest.
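
A manual override is passed as the first message in the conversation. The sketch below shows one way to assemble the message list; `build_messages` is a hypothetical helper, and the placeholder string stands in for the prompt text above.

```python
from typing import Dict, List, Optional


def build_messages(user_prompt: str,
                   system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Return a chat message list, prepending a system message only when one is given.

    With system_prompt=None the chat template's built-in identity applies.
    """
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


override = "<paste the system prompt text here>"  # placeholder, not the real prompt
default_msgs = build_messages("Who are you?")
custom_msgs = build_messages("Who are you?", system_prompt=override)
print(len(default_msgs), len(custom_msgs))
```

The resulting list can be passed to `tokenizer.apply_chat_template` or to the `messages` field of a chat-completions request.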

Available Files

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 3.44 GB | Full bfloat16 merged weights (single shard) |
| `Atem-1.7b.Q4_K_M.gguf` | 1.11 GB | 4-bit quantised (recommended) |
| `Atem-1.7b.Q5_K_M.gguf` | 1.26 GB | 5-bit quantised |
| `Atem-1.7b.Q8_0.gguf` | 1.83 GB | 8-bit quantised (near-lossless) |

Known Limitations

GSM8K formatting shift. As documented in the evaluation section, the training corpus uses \boxed{} for mathematical answers. Despite the inclusion of 5,000 GSM8K-format examples, the strict-match regression persists at −3.3pp. The estimated true capability gap under flexible extraction is approximately −1.1pp. Future runs with a higher proportion of GSM8K-format examples would reduce this further.

Statistical modesty at 1.7B. Most benchmark deltas at this scale are within sampling noise — HellaSwag is the exception (2.8σ). This is expected: 1.7B models have narrower performance headroom and proportionally larger variance per benchmark question. The reasoning improvements are real but harder to detect reliably at smaller scale.

6,144 token sequence ceiling. Records above the ceiling, which include the longest reasoning traces (advanced mathematics, competitive programming), were dropped during formatting, so the model has not been trained on very long chain-of-thought traces.

No RLHF or DPO. Atem-1.7B has not undergone preference optimisation.

Roadmap

Atem-14B: Single CoT-preserving pass on Qwen3-14B, r=128 (3.10% proportional capacity), with expanded GSM8K-format and camel-ai/chemistry additions to the corpus

Citation

@misc{atem_1b7_2026,
  author       = {Asad, Zain},
  title        = {Atem-1.7B: A 1.7B CoT-Preserving Reasoning Model via
                  Single-Pass SFT on Qwen3},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-1.7B}},
}

License

Released under the Apache 2.0 License, consistent with the base model Qwen/Qwen3-1.7B.

Built independently by Zain Asad — EphAsad
