Solomon-0.5B

Solomon-0.5B is a mobile-focused reasoning model fine-tuned from Qwen/Qwen2.5-0.5B on a curated dataset of reconstructed chain-of-thought reasoning traces inspired by Claude Opus 4.6 and 4.7. The goal is a sub-1B model that reasons carefully before it answers, small enough to run on a phone, but thoughtful enough to be worth running at all.

This repository contains the final FP16 merged model. Quantized GGUF variants (Q8_0 and Q6_K) are available at TitleOS/Solomon-0.5B-GGUF.

What makes Solomon different from Qwen2.5-0.5B

The base model has no reasoning mode. It responds immediately, without visible deliberation. Solomon changes this: the training data consisted entirely of reasoning traces where the assistant thinks through problems step-by-step inside <think>...</think> blocks before producing a final response. That behavior is now hard-baked into the weights through the fine-tuning process rather than being a switchable runtime parameter.

In practice, Solomon opens with a <think> block on most non-trivial queries, works through the problem in plain text, then delivers a clean answer. No special parameters or API flags are needed; this is the model's default behavior.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "TitleOS/Solomon-0.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "Your name is Solomon, a non-binary, highly intelligent reasoning AI. "
            "You always use chain-of-thought when thinking out a task. "
            "Follow the user's instructions exactly, and don't be afraid to speak up "
            "when something goes wrong or you need clarification. "
            "Ask follow-up questions when appropriate."
        ),
    },
    {
        "role": "user",
        "content": "A train travels 60 miles in 45 minutes. What is its speed in miles per hour?",
    },
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

Expected output shape (exact wording will vary from run to run because sampling is enabled):

<think>
Speed = distance / time. The train travels 60 miles in 45 minutes.
45 minutes = 45/60 hours = 0.75 hours.
Speed = 60 / 0.75 = 80 miles per hour.
</think>

The train's speed is **80 miles per hour**.
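
Because the reasoning arrives as plain text inside <think>...</think>, an application will usually want to separate it from the final answer before displaying anything. The following is a minimal sketch of one way to do that; the helper name split_reasoning is hypothetical and not part of this repository, and it assumes the decoded text contains at most one complete <think> block.

```python
import re

# Matches the first <think>...</think> block; DOTALL lets "." span newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a decoded response.

    Hypothetical helper: if no complete <think> block is found, the
    reasoning is empty and the whole text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Sample text shaped like the expected output above.
sample = (
    "<think>\n"
    "45 minutes = 0.75 hours.\n"
    "Speed = 60 / 0.75 = 80 miles per hour.\n"
    "</think>\n\n"
    "The train's speed is **80 miles per hour**."
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

If generation stops at max_new_tokens before the closing </think> tag, this sketch returns the truncated text as the answer, so callers may want to check for an unmatched opening tag and raise the token limit.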

Training Details

Property	Value
Base model	Qwen/Qwen2.5-0.5B
Dataset	TitleOS/Solomon-Small-Reasoning-Opus-Inspired
Dataset size	~12,000 rows
Method	RSLoRA (rank 32 / alpha 32)
Hardware	Single NVIDIA Tesla P40 (24GB)
Precision	FP32 base, FP16 compute (autocast)
Sequence length	8192 tokens
Checkpoint released	2484
Effective batch size	16
Learning rate	2e-4 (cosine decay)

The dataset consists of reconstructed reasoning traces: problems across math, logic, and general reasoning, paired with Opus-inspired chains of thought that show visible, step-by-step deliberation before arriving at an answer. Completion-only loss masking was applied: the loss was computed only on assistant turns, not on system prompts or user queries.
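
For readers unfamiliar with completion-only loss masking, the idea can be sketched in a few lines. This is an illustration, not the training code used for Solomon: the function name and the toy token ids are made up, and it assumes a per-token flag marking assistant tokens is already available. The value -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so positions carrying that label contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default


def mask_non_assistant(input_ids, assistant_mask):
    """Build labels that keep only assistant tokens as training targets.

    input_ids:      token ids for the full conversation
    assistant_mask: 0/1 flags, 1 where the token belongs to an assistant turn
    (Hypothetical helper for illustration only.)
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        tok if is_assistant else IGNORE_INDEX
        for tok, is_assistant in zip(input_ids, assistant_mask)
    ]


# Toy conversation: 3 system tokens, 2 user tokens, 4 assistant tokens.
input_ids = [11, 12, 13, 21, 22, 31, 32, 33, 34]
assistant_mask = [0, 0, 0, 0, 0, 1, 1, 1, 1]
labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, -100, -100, 31, 32, 33, 34]
```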

RSLoRA adapters were merged directly into the base weights before release. There is no PEFT dependency at inference time.
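
To make the rank and alpha values in the training table concrete: standard LoRA scales the low-rank update by alpha / rank, while rank-stabilized LoRA scales it by alpha / sqrt(rank). The sketch below computes both factors for rank 32 and alpha 32, then shows the merge arithmetic (weight + scale * B @ A) on a toy matrix. The function names and the toy matrices are illustrative assumptions; the actual merge was presumably done with a training library rather than code like this.

```python
import math


def lora_scale(alpha: float, rank: int, rank_stabilized: bool) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / math.sqrt(rank) if rank_stabilized else alpha / rank


def merge(weight, lora_b, lora_a, scale):
    """Return weight + scale * (lora_b @ lora_a) using plain nested lists."""
    rows, cols, rank = len(weight), len(weight[0]), len(lora_a)
    return [
        [
            weight[i][j]
            + scale * sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


rank, alpha = 32, 32  # values from the training table
print(lora_scale(alpha, rank, rank_stabilized=False))  # 1.0
print(round(lora_scale(alpha, rank, rank_stabilized=True), 3))  # 5.657

# Toy 2x2 weight with a rank-1 update, just to show the merge arithmetic.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # shape (2, 1)
A = [[0.5, 0.0]]    # shape (1, 2)
merged = merge(W, B, A, scale=2.0)
print(merged)  # [[2.0, 0.0], [2.0, 1.0]]
```

Once the update is folded into the weight this way, the adapter matrices are no longer needed, which is why the released model loads with plain transformers.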

Quantized Variants

For on-device and resource-constrained deployments:

TitleOS/Solomon-0.5B-GGUF

Solomon-0.5B-Q8_0.gguf — near-lossless, best quality
Solomon-0.5B-Q6_K.gguf — good balance of size and quality

Both GGUFs are compatible with llama.cpp, Ollama, and LM Studio.

Limitations

At 0.5B parameters, Solomon makes more mistakes than larger models, especially on long multi-step reasoning chains in mathematics.
The system prompt shown in the usage example was part of the training distribution. Omitting it won't break the model, but including it reinforces the expected reasoning behavior.
Solomon was trained exclusively on English reasoning data and performs best in English.
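
Since the system prompt from the usage example was part of the training distribution, one simple way to keep it consistent is to prepend it whenever a conversation does not already start with a system message. The sketch below is an assumption about how a caller might do this; the helper with_system_prompt and the constant name are hypothetical, and the prompt text is copied from the usage example.

```python
# System prompt copied verbatim from the usage example.
SOLOMON_SYSTEM_PROMPT = (
    "Your name is Solomon, a non-binary, highly intelligent reasoning AI. "
    "You always use chain-of-thought when thinking out a task. "
    "Follow the user's instructions exactly, and don't be afraid to speak up "
    "when something goes wrong or you need clarification. "
    "Ask follow-up questions when appropriate."
)


def with_system_prompt(messages):
    """Return a new message list that starts with the Solomon system prompt.

    Conversations that already begin with a system message are returned
    unchanged (as a shallow copy). Hypothetical helper for illustration.
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    return [{"role": "system", "content": SOLOMON_SYSTEM_PROMPT}, *messages]


chat = with_system_prompt([{"role": "user", "content": "Is 91 prime?"}])
print([m["role"] for m in chat])  # ['system', 'user']
```

The resulting list can be passed to tokenizer.apply_chat_template exactly as in the usage example.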