Atem-Pharaoh-3B

Ancient logic. Modern intelligence.

The 3B chain-of-thought model — explicit reasoning traces at scale.

Overview

Atem-Pharaoh-3B is the Stage 2 release of the 3B Atem series — a chain-of-thought fine-tune built on top of Atem-3B, trained to produce explicit <think>...</think> reasoning traces before arriving at a final answer. Where Atem-3B was trained to answer directly, Pharaoh is trained to think out loud.

Training used 38,157 examples (after token-length filtering) drawn from a pool of ~63,500 CoT-annotated records across mathematics, code, science, and general reasoning. A deliberate 75%/25% think/no-think split was applied: the model was trained on structured reasoning traces for most examples and on direct answers for the remainder, so it can operate in either mode depending on how it is prompted.

Design note: Atem-Pharaoh-3B has a confirmed tendency toward verbose outputs and, on open-ended questions with many valid answers, occasional think trace runaways. Custom system prompts are strongly recommended to control verbosity, chain-of-thought depth, and output length. See the Prompting Guidance section below.

The Atem Series

1.5B Series

Model	Stage	Capability
Atem v1	Stage 1 — SFT	Fast, direct reasoning
Atem-Wisdom	Stage 2 — CoT	Explicit thinking traces
Atem-Pharaoh-1.5B (planned)	Stage 3 — DPO/IPO	Preference-aligned reasoning

3B Series

Model	Stage	Capability
Atem-3B	Stage 1 — SFT	Direct reasoning at 3B scale
Atem-Pharaoh-3B	Stage 2 — CoT	Explicit reasoning traces at 3B scale
Atem-Pharaoh-3B-DPO (planned)	Stage 3 — DPO/IPO	Preference-aligned reasoning

Model Details

Property	Value
Base model	EphAsad/Atem-3B
Training method	LoRA SFT — Stage 2 (CoT think traces)
LoRA config	r=32, alpha=64, dropout=0.05
Parameters	~3.09B
Trainable parameters	59,867,136 (1.90%)
Training records	38,157 (after token length filtering)
Think / no-think split	75% / 25%
Epochs	2
Final val loss	0.9494
Hardware	NVIDIA A100-SXM4-80GB
Max sequence length	4,096 tokens
Precision	bfloat16
License	Apache 2.0
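
The trainable-parameter figure in the table can be cross-checked with a short calculation. The sketch below assumes LoRA adapters on all seven linear projections of every transformer block (q, k, v, o, gate, up, down), which the card does not state, and uses the published Qwen2.5-3B dimensions (hidden size 2048, MLP size 11008, 36 layers, 2 key/value heads of dimension 128). Under those assumptions the count matches the table exactly.

```python
# Cross-check of the trainable-parameter count for LoRA with r=32.
# Assumption (not stated in the card): adapters sit on all seven linear
# projections of every transformer block.
RANK = 32
ALPHA = 64
HIDDEN = 2048          # Qwen2.5-3B hidden size
INTERMEDIATE = 11008   # Qwen2.5-3B MLP size
KV_DIM = 2 * 128       # 2 key/value heads x head dimension 128
LAYERS = 36

# (in_features, out_features) of each adapted projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank=RANK):
    # A is (rank x in_features) and B is (out_features x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in PROJECTIONS.values())
total = per_layer * LAYERS
scaling = ALPHA / RANK  # LoRA update is scaled by alpha / r

print(f"{total:,}")   # → 59,867,136
print(scaling)        # → 2.0
```

The exact match suggests, but does not prove, that all seven projections were adapted.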

Output Format

Atem-Pharaoh-3B produces responses in one of two formats, depending on the prompt:

Think mode (75% of training):

<think>
{step-by-step reasoning trace}
</think>

{final answer}

Direct mode (25% of training):

{direct answer — no think tags}

The model defaults to think mode for most queries. To reliably suppress or encourage CoT, use a custom system prompt (see below).
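
Downstream code usually needs the final answer separated from the reasoning trace. The following is a minimal sketch of one way to split the two formats described above, assuming the tags appear as literal text in the decoded output. The function name `split_response` is illustrative and not part of any library.

```python
import re

# Matches the first <think>...</think> block, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(text):
    """Return (reasoning, answer); reasoning is None in direct mode."""
    match = THINK_RE.search(text)
    if match is None:
        # Direct mode: no think tags, the whole text is the answer.
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


reasoning, answer = split_response("<think>\n2 + 2 = 4\n</think>\n\n4")
print(reasoning)  # → 2 + 2 = 4
print(answer)     # → 4
```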

Prompting Guidance

Atem-Pharaoh-3B follows system-prompt instructions. The default identity is baked into the chat template and produces think traces on most inputs. For deployments where verbosity, output length, or CoT depth needs to be controlled, the following prompt patterns are recommended.

Suppress CoT — direct answers only

You are Atem, a precise and analytical assistant. Respond directly and concisely.
Do not show internal reasoning. Answer the question and stop.

Calibrate length to question complexity

You are Atem, a precise and analytical assistant. Match your response length to
the complexity of the question — a single sentence for simple questions, full
reasoning for complex ones. Do not over-explain.

Full CoT — maximise reasoning depth

You are Atem, a precise and analytical assistant. Think through every problem
step by step before answering. Show your full reasoning inside <think> tags,
then give your final answer.

Cap think trace length

You are Atem, a precise and analytical assistant. When you reason through a
problem, keep your thinking concise — aim for no more than 150 words inside
<think> tags. Then give a clear, direct final answer.

Without a custom prompt, the model will use the default identity and tend toward longer, more structured outputs. On open-ended questions with many valid answers, this can result in extended reasoning traces. Prompting with an explicit length or format constraint reliably corrects this.
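
As a sketch of how these patterns might be wired into a request, the snippet below builds a chat message list with one of the prompts above as the system message. It assumes the chat template accepts a `system` role message in place of the default identity. `SYSTEM_PROMPTS` and `build_messages` are illustrative names.

```python
# Two of the prompt patterns listed above, keyed by an illustrative mode name.
SYSTEM_PROMPTS = {
    "direct": (
        "You are Atem, a precise and analytical assistant. "
        "Respond directly and concisely. "
        "Do not show internal reasoning. Answer the question and stop."
    ),
    "full_cot": (
        "You are Atem, a precise and analytical assistant. "
        "Think through every problem step by step before answering. "
        "Show your full reasoning inside <think> tags, "
        "then give your final answer."
    ),
}


def build_messages(user_prompt, mode="direct"):
    """Return a chat message list with the chosen system prompt first."""
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[mode]},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("What is 17 * 6?", mode="direct")
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the Usage section.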

Training Data

Stage 2 training used approximately 38,000 examples after token-length filtering, drawn from a pool of ~63,500 CoT-annotated records. Chinese-language reasoning traces from Kimi K2.5 were filtered using an ASCII character ratio threshold before inclusion; non-English traces were downgraded to the no-think pool rather than discarded entirely. OpenR1-Math examples were filtered to correctness_llama == True only.
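
The ASCII-ratio filter described above can be sketched as follows. The threshold value and the `reasoning` field name are hypothetical, since the card gives neither. Only the routing rule (non-English traces go to the no-think pool) comes from the description.

```python
ASCII_THRESHOLD = 0.9  # hypothetical value; the card does not state the threshold


def ascii_ratio(text):
    """Fraction of characters in `text` that are ASCII."""
    if not text:
        return 0.0  # an empty trace carries no usable reasoning
    return sum(ch.isascii() for ch in text) / len(text)


def route_record(record):
    """Return 'think' for mostly-ASCII traces, otherwise 'no_think'."""
    if ascii_ratio(record["reasoning"]) >= ASCII_THRESHOLD:
        return "think"
    return "no_think"


print(route_record({"reasoning": "Let x = 2, then x + 2 = 4."}))  # → think
print(route_record({"reasoning": "首先我们计算答案"}))              # → no_think
```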

The think/no-think split was enforced programmatically: after all datasets were loaded into a think pool and a no-think pool, records were flipped from think→no-think until the no-think pool reached 25% of the total corpus.
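
The rebalancing step can be sketched as below. This is an illustration rather than the training script: it assumes that records are chosen at random and that a flip means dropping the reasoning trace, neither of which the card specifies. The `reasoning` field name is hypothetical.

```python
import random


def rebalance(think_pool, no_think_pool, no_think_share=0.25, seed=0):
    """Flip think records to no-think until the no-think share is reached."""
    think = list(think_pool)
    no_think = list(no_think_pool)
    total = len(think) + len(no_think)  # flipping does not change the total

    rng = random.Random(seed)
    rng.shuffle(think)  # assumption: flipped records are picked at random

    while think and len(no_think) / total < no_think_share:
        record = dict(think.pop())
        record.pop("reasoning", None)  # assumption: a flip drops the trace
        no_think.append(record)
    return think, no_think


think = [{"id": i, "reasoning": "r", "answer": "a"} for i in range(90)]
direct = [{"id": 100 + i, "answer": "a"} for i in range(10)]
think, direct = rebalance(think, direct)
print(len(think), len(direct))  # → 75 25
```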

Dataset	Count	Type
Modotte/CodeX-2M-Thinking	10,000	Code CoT
nvidia/OpenCodeReasoning	10,000	Code reasoning
Jackrong/Kimi-K2.5 (×3 configs)	15,000	General / Math / PhD reasoning
mitroitskii/OpenR1-Math-220k-formatted	7,000	Mathematics (correctness filter)
Jackrong/Claude-opus-4.6-TraceInversion-9000x	7,000	Inverted reasoning traces
trjxter/DeepSeek-V4-Pro-Reasoning-8000x	8,014	Reasoning distillation
WithinUsAI/MiniMax_M2.7_Distilled_5k	5,000	Mixed reasoning
FreedomIntelligence/medical-o1-reasoning-SFT	3,000	Medical reasoning

Loss curve:

Step	Train Loss	Val Loss
250	1.0215	0.9931
500	0.9615	0.9663
750	0.9516	0.9556
1000	0.9425	0.9502
1194 (final)	0.9897	0.9494

Training loss falls steadily through step 1000, and validation loss falls at every checkpoint. The uptick in training loss at the final step is normal end-of-epoch behaviour on a cosine schedule.
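
The card does not state the batch size, but the final step count constrains it. The sketch below searches for effective batch sizes consistent with 38,157 records, 2 epochs, and 1,194 steps. It assumes all 38,157 records were used for training, the last partial batch of each epoch was kept, and one optimizer step was taken per effective batch. The result is an inference, not a reported setting.

```python
import math

RECORDS = 38_157
EPOCHS = 2
FINAL_STEP = 1_194


def matching_batch_sizes(records, epochs, final_step, max_batch=512):
    """Effective batch sizes whose step count equals `final_step`."""
    return [
        batch
        for batch in range(1, max_batch + 1)
        if epochs * math.ceil(records / batch) == final_step
    ]


print(matching_batch_sizes(RECORDS, EPOCHS, FINAL_STEP))  # → [64]
```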

Evaluation

A/B Comparison — Atem-Pharaoh-3B vs Qwen2.5-3B-Instruct

Evaluated on 30 questions calibrated to 3B model capability across coding, mathematics, analytical reasoning, and language tasks. Both models ran on identical prompts with no system prompt override.

Metric	Qwen2.5-3B-Instruct (base)	Atem-Pharaoh-3B
Think traces	0 / 30	30 / 30
Avg response length	152 words	427 words

Qualitative findings:

Coding tasks (is_even, count_vowels, list vs tuple, find_max, for vs while): Atem-Pharaoh-3B is consistently correct, with additional edge-case handling and alternative approaches in the trace. The base model's answers are correct but minimal.

Mathematical tasks: Both models correct. Pharaoh's traces show full working.

Analytical tasks (student score, shop visitors, correlation/causation, hiring/queuing): Pharaoh produces richer, more structured responses with clearer explanations. The queuing theory response (Q16) demonstrates genuine reasoning depth with well-constructed analogies.

Language tasks: Both models perform comparably. Pharaoh tends toward over-structuring simple tasks.

Known limitations observed in evaluation:

Think trace runaways: On open-ended questions where valid answers are unbounded, the think trace can degenerate into extended enumeration rather than converging on an answer. This was observed on Q27 (sentence ambiguity) in this evaluation and is consistent with behaviour observed in separate testing. The final answer typically recovers correctly, but the trace itself becomes incoherent. Custom system prompts with explicit trace length constraints are the recommended mitigation (see Prompting Guidance).

Verbosity mismatch: Response length does not scale with question complexity. Simple questions receive the same structural treatment as complex ones. A system prompt instructing the model to match length to complexity resolves this reliably.

Occasional tag artifacts: A small number of responses produced nested <think><think> opening tags. This is a minor formatting artifact with no effect on answer quality.
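
Two of the limitations above, trace runaways and nested opening tags, can be detected in post-processing. The following is a rough sketch under stated assumptions: the 600-word budget is an arbitrary illustrative value, and the helper names are hypothetical. A trace that never closes (for example, one cut off by `max_new_tokens`) is counted up to the end of the text.

```python
import re

# Two or more consecutive <think> openings, with optional whitespace between.
OPEN_RUN = re.compile(r"(?:<think>\s*){2,}")


def collapse_nested_open_tags(text):
    """Replace runs of repeated <think> openings with a single one."""
    return OPEN_RUN.sub("<think>\n", text)


def trace_word_count(text):
    """Number of whitespace-separated words inside the think trace."""
    text = collapse_nested_open_tags(text)
    start = text.find("<think>")
    if start == -1:
        return 0  # direct mode, no trace
    start += len("<think>")
    end = text.find("</think>", start)
    trace = text[start:] if end == -1 else text[start:end]
    return len(trace.split())


def is_runaway(text, word_budget=600):
    """Flag responses whose trace exceeds the (illustrative) word budget."""
    return trace_word_count(text) > word_budget


print(trace_word_count("<think><think>one two three</think>four"))  # → 3
```

Flagged responses could be regenerated with one of the length-capping system prompts from Prompting Guidance.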

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-Pharaoh-3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Explain why a binary search is faster than a linear search."
    }
]

# return_dict=True returns input_ids together with the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(
    output[0][prompt_length:],
    skip_special_tokens=True
)
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-Pharaoh-3B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Write a Python function to check if a number is prime."
    }
]

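# return_dict=True returns input_ids together with the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))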

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-Pharaoh-3B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-Pharaoh-3B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-Pharaoh-3B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-Pharaoh-3B:Q4_K_M
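
Once `llama-server` is running, it can be queried over its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library. It assumes the server is listening on its default address, `http://localhost:8080`, and the `model` value is passed for completeness. `build_chat_request` and `chat` are illustrative helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed default llama-server address


def build_chat_request(prompt, system_prompt=None, max_tokens=512):
    """Build a POST request for the /chat/completions endpoint."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    body = {
        "model": "EphAsad/Atem-Pharaoh-3B:Q4_K_M",
        "messages": messages,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the reply as a string, including any think trace the model produces.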

Available Files

File	Size	Description
`model-00001-of-00002.safetensors` + `model-00002-of-00002.safetensors`	~6.2 GB	Full bfloat16 weights
`Atem-Pharaoh-3B.Q4_K_M.gguf`	~1.93 GB	4-bit — recommended
`Atem-Pharaoh-3B.Q5_K_M.gguf`	~2.22 GB	5-bit
`Atem-Pharaoh-3B.Q8_0.gguf`	~3.29 GB	8-bit — near-lossless

System Prompt

Atem-Pharaoh-3B's identity is baked into the chat template. For production use, override with a custom system prompt tailored to your use case (see Prompting Guidance above). The default identity:

You are Atem, a precise and analytical reasoning assistant. You approach
every problem methodically — identifying core concepts, reasoning step by
step, and arriving at well-supported conclusions. You show your thinking
clearly and are thorough, direct, and intellectually honest.

Roadmap

Stage	Status	Description
Stage 1 — SFT	✅ Complete	Atem-3B — direct reasoning
Stage 2 — CoT SFT	✅ Complete	Atem-Pharaoh-3B — this model
Stage 3 — DPO/IPO	🔄 Planned	Preference-aligned reasoning

Citation

@misc{atem_pharaoh_3b_2026,
  author       = {Asad, Zain},
  title        = {Atem-Pharaoh-3B: Chain-of-Thought Reasoning via Stage 2 CoT SFT},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-Pharaoh-3B}},
}

License

Released under the Apache 2.0 License, consistent with the base model lineage (Qwen2.5-3B-Instruct → Atem-3B → Atem-Pharaoh-3B).

Built independently by EphAsad

