Use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="carlosmm26/Atanor-4B",
	filename="atanor-4b-full-Q4_K_M.gguf",
)
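response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# Print only the assistant's reply text
print(response["choices"][0]["message"]["content"])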

🜂 Atanor-4B

A fine-tune of Qwen3.5-4B specialized for agentic tool-use inside Hermes Agent.

There were 9B and 27B versions of Qwen fine-tuned to run as agents in Hermes (the Carnice models). That made me wonder: could a smaller model do the same job?

Atanor-4B is my answer — and my first fine-tune ever. It was trained entirely locally, on a single RTX 3090.

The name Atanor is the alchemist's furnace: the 4B is the lead that goes in, the agent is what comes out. 🜂


Results — agentic evaluation

Measured on a 60-task Hermes-native agent benchmark (real tool execution inside Hermes Agent, deterministic decoding at temperature 0), comparing the base model with the fine-tune:

| Metric | Qwen3.5-4B (base) | Atanor-4B |
| --- | --- | --- |
| Agent score | 0.81 | 0.84 |
| Picking the right tool | 30% | 60% ⬆️ doubled |
| Task success | 67% | 73% |

The core agent skill — choosing the correct tool for a task — doubled (30% → 60%).
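
The arithmetic behind these comparisons can be sketched in a few lines. This is only an illustration over the numbers reported in the table; the `results` dictionary and the `compare` helper are names made up for this example, not part of the benchmark.

```python
# Reported numbers from the results table: (base, fine-tune).
results = {
    "Agent score": (0.81, 0.84),
    "Picking the right tool (%)": (30, 60),
    "Task success (%)": (67, 73),
}


def compare(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute change, ratio of fine-tune to base), rounded to 2 places."""
    return round(tuned - base, 2), round(tuned / base, 2)


for metric, (base, tuned) in results.items():
    delta, ratio = compare(base, tuned)
    print(f"{metric}: {base} -> {tuned} (change {delta:+}, ratio {ratio}x)")
```

Tool selection is the only metric with a ratio of 2.0; task success moves by 6 percentage points and the agent score by 0.03.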


How it was made

Following the Carnice recipe, in two LoRA stages (BF16, on one RTX 3090):

  • Stage A — reasoning repair: Bespoke-Stratos + NuminaMath-CoT
  • Stage B — Hermes tool-use: the kai-os/carnice-glm5-hermes-traces traces (the full set, seq len 16384)

~33 hours of training, zero crashes.

Files in this repo

| File | What it is |
| --- | --- |
| atanor-4b-full-Q4_K_M.gguf | Quantized (~2.6 GB): run it directly in llama.cpp / Hermes / Ollama |
| atanor-4b-full-Q4_K_M-f16.gguf | Full-precision GGUF (~8 GB): for re-quantizing or lossless inference |
| *.safetensors (merged) | Full merged model for transformers / further fine-tuning |
| adapter/ | The LoRA adapter alone, to apply on the base model |

Usage (llama.cpp / Hermes)

# llama.cpp server
llama-server -m atanor-4b-full-Q4_K_M.gguf --jinja -ngl 99 -c 32768 --alias atanor --port 8081

# point a Hermes profile at it (provider base_url: http://localhost:8081/v1)
hermes chat --profile atanor -q "read data.csv and total the 'south' region using the terminal"

Thinking is on by default; it helps the model reason about which tool to use. To disable it, pass chat_template_kwargs: {"enable_thinking": false} in the chat completion request.
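
To make the thinking switch concrete, here is a minimal sketch of a client that talks to the llama.cpp server directly over its OpenAI-compatible endpoint, using only the Python standard library. It assumes the server is listening at the base_url from the Hermes profile above (http://localhost:8081/v1) and was started with the `atanor` alias; `build_chat_request` and `chat` are hypothetical helper names for this example.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at the base_url used by the Hermes profile.
BASE_URL = "http://localhost:8081/v1"


def build_chat_request(prompt: str, enable_thinking: bool = True) -> dict:
    """Build an OpenAI-style chat completion body for llama-server."""
    return {
        "model": "atanor",  # matches the --alias passed to llama-server
        "messages": [{"role": "user", "content": prompt}],
        # Forwarded to the chat template; False turns thinking off.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def chat(prompt: str, enable_thinking: bool = True) -> str:
    """POST the request to the server and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt, enable_thinking)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?", enable_thinking=False)` would return the reply without a thinking phase; leaving `enable_thinking` at its default keeps the model's normal behavior.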


This is my first fine-tune, and version one. More to come. I learned a ton — and the best part is it was all done at home, on my own GPU.

Built with the Hermes Agent ecosystem. Base model © Qwen, Apache 2.0.
