Solomon-0.5B

Solomon-0.5B is a mobile-focused reasoning model fine-tuned from Qwen/Qwen2.5-0.5B on a curated dataset of reconstructed chain-of-thought reasoning traces inspired by Claude Opus 4.6 and 4.7. The goal is a sub-1B model that reasons carefully before it answers, small enough to run on a phone, but thoughtful enough to be worth running at all.

This repository contains the final merged model in FP16. Quantized GGUF variants (Q8_0 and Q6_K) are available at TitleOS/Solomon-0.5B-GGUF.


What makes Solomon different from Qwen2.5-0.5B

The base model has no reasoning mode: it responds immediately, without visible deliberation. Solomon's training data consisted entirely of reasoning traces in which the assistant works through the problem step by step inside <think>...</think> blocks before producing a final response. That behavior is baked into the weights by fine-tuning; it is not a switchable runtime parameter.

In practice, Solomon opens with a <think> block on most non-trivial queries, works through the problem in plain text, then delivers a clean answer. No special parameters or API flags are needed.
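
Because the reasoning arrives inline in the generated text, downstream code usually needs to separate it from the final answer. The helper below is a minimal sketch of one way to do that. `split_reasoning` is a hypothetical name, not part of this repository, and it assumes the model emits at most one <think>...</think> block; a block that is opened but never closed (for example, when generation hits the token limit) is treated as reasoning with no answer.

```python
import re

# Matches the first <think>...</think> block; DOTALL lets "." span newlines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion.

    Hypothetical helper: assumes at most one <think> block.
    """
    match = _THINK_RE.search(text)
    if match:
        reasoning = match.group(1).strip()
        answer = text[match.end():].strip()
        return reasoning, answer
    if "<think>" in text:
        # Block was opened but never closed (e.g. generation was cut off):
        # treat everything after the tag as reasoning, with no answer yet.
        return text.split("<think>", 1)[1].strip(), ""
    # No reasoning block at all (trivial queries): the whole text is the answer.
    return "", text.strip()


raw = (
    "<think>\n45 minutes = 0.75 hours.\n60 / 0.75 = 80.\n</think>\n\n"
    "The train's speed is 80 miles per hour."
)
reasoning, answer = split_reasoning(raw)
print(answer)
```

Keeping the two parts separate lets an application show only the answer by default and reveal the reasoning on request.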


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "TitleOS/Solomon-0.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "Your name is Solomon, a non-binary, highly intelligent reasoning AI. "
            "You always use chain-of-thought when thinking out a task. "
            "Follow the user's instructions exactly, and don't be afraid to speak up "
            "when something goes wrong or you need clarification. "
            "Ask follow-up questions when appropriate."
        ),
    },
    {
        "role": "user",
        "content": "A train travels 60 miles in 45 minutes. What is its speed in miles per hour?",
    },
]

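# Render the chat template and tokenize; return_dict=True also yields the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))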

The output should look roughly like this:

<think>
Speed = distance / time. The train travels 60 miles in 45 minutes.
45 minutes = 45/60 hours = 0.75 hours.
Speed = 60 / 0.75 = 80 miles per hour.
</think>

The train's speed is **80 miles per hour**.

Training Details

Property              Value
Base model            Qwen/Qwen2.5-0.5B
Dataset               TitleOS/Solomon-Small-Reasoning-Opus-Inspired
Dataset size          ~12,000 rows
Method                RSLoRA (rank 32, alpha 32)
Hardware              Single NVIDIA Tesla P40 (24 GB)
Precision             FP32 base, FP16 compute (autocast)
Sequence length       8192 tokens
Checkpoint released   2484
Effective batch size  16
Learning rate         2e-4 (cosine decay)

The dataset consists of reconstructed reasoning traces: problems across math, logic, and general reasoning, paired with Opus-inspired chains of thought that show visible, step-by-step deliberation before arriving at an answer. Completion-only loss masking was applied: the loss was computed only on assistant turns, not on system prompts or user queries.
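
To make completion-only masking concrete, the sketch below builds a label sequence in which every non-assistant token is replaced by -100, the value that PyTorch's cross-entropy loss ignores by default and that Hugging Face trainers use by convention. The function name, the `(role, token_ids)` input format, and the toy token IDs are illustrative assumptions; the actual training script is not part of this card.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_non_assistant(
    turns: list[tuple[str, list[int]]],
) -> tuple[list[int], list[int]]:
    """Flatten tokenized turns into (input_ids, labels).

    Hypothetical helper: `turns` is a list of (role, token_ids) pairs.
    Labels copy the token IDs for assistant turns and are IGNORE_INDEX
    elsewhere, so only assistant tokens contribute to the loss.
    """
    input_ids: list[int] = []
    labels: list[int] = []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Toy token IDs, not real tokenizer output.
turns = [
    ("system", [1, 2, 3]),
    ("user", [4, 5]),
    ("assistant", [6, 7, 8, 9]),
]
input_ids, labels = mask_non_assistant(turns)
print(labels)  # [-100, -100, -100, -100, -100, 6, 7, 8, 9]
```

The model still sees the system and user tokens as context; they simply produce no gradient.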

RSLoRA adapters were merged directly into the base weights before release. There is no PEFT dependency at inference time.
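
For readers unfamiliar with what merging does, the sketch below works through it on a toy matrix in plain Python. It assumes the standard rank-stabilized LoRA formulation, in which the adapter update is scaled by alpha / sqrt(rank) rather than plain LoRA's alpha / rank. For the rank 32, alpha 32 configuration listed under Training Details, that gives a factor of about 5.66 instead of 1.0. The matrices and helper names are made up for illustration and are far smaller than real adapters.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rslora_scale(rank: int, alpha: float) -> float:
    """rsLoRA scaling factor: alpha / sqrt(rank) (plain LoRA uses alpha / rank)."""
    return alpha / math.sqrt(rank)


def merge(w, b, a, scale):
    """Fold the low-rank update into the base weight: W' = W + scale * (B @ A)."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [-0.5]]
A = [[1.0, 2.0]]
scale = rslora_scale(rank=1, alpha=2.0)  # 2.0 / sqrt(1) = 2.0
W_merged = merge(W, B, A, scale)
print(W_merged)  # [[2.0, 2.0], [-1.0, -1.0]]

# Scaling factor for rank 32, alpha 32.
print(round(rslora_scale(rank=32, alpha=32), 3))  # 5.657
```

After this step only the merged matrix is needed, which is why the released weights load with plain transformers.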


Quantized Variants

For on-device and resource-constrained deployments:

TitleOS/Solomon-0.5B-GGUF

  • Solomon-0.5B-Q8_0.gguf — near-lossless, best quality
  • Solomon-0.5B-Q6_K.gguf — good balance of size and quality

Both GGUFs are compatible with llama.cpp, Ollama, and LM Studio.
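
As one way to try a quantized file from Python, the sketch below uses the llama-cpp-python bindings for llama.cpp. It assumes that package and huggingface_hub are installed and that the Q8_0 file name matches the listing above. The `build_messages` and `ask_solomon` helpers, the context size, and the sampling values (copied from the Usage section) are illustrative, not a recommended configuration.

```python
from typing import Optional


def build_messages(question: str, system_prompt: Optional[str] = None) -> list[dict[str, str]]:
    """Build a chat message list, with an optional system prompt first."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages


def ask_solomon(question: str, system_prompt: Optional[str] = None, max_tokens: int = 512) -> str:
    """Hypothetical wrapper: fetch the Q8_0 GGUF and run one chat completion."""
    # Imported lazily so build_messages works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="TitleOS/Solomon-0.5B-GGUF",
        filename="Solomon-0.5B-Q8_0.gguf",
        n_ctx=4096,  # assumed context window for this example
        verbose=False,
    )
    result = llm.create_chat_completion(
        messages=build_messages(question, system_prompt),
        max_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
    )
    return result["choices"][0]["message"]["content"]


# Example call (downloads the model on first use):
# print(ask_solomon("A train travels 60 miles in 45 minutes. What is its speed in miles per hour?"))
```

Passing the system prompt from the Usage section as `system_prompt` keeps the input close to the training distribution.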


Limitations

  • At 0.5B parameters, Solomon is capable but not infallible. Long multi-step reasoning chains, especially in mathematics, will have a higher error rate than larger models.
  • The system prompt shown in the usage example was part of the training distribution. Omitting it won't break the model, but including it reinforces the expected reasoning behavior.
  • Solomon was trained exclusively on English reasoning data and performs best in English.

License

MPL 2.0 with the additional Commons Clause; see license.md.


Trained by TitleOS.
