Rodan-10M-Chat
A 10.41M-parameter chat model trained on one Apple M2 with MLX. It starts from the leaner Rodan-10M v9 base (PLE-free) and folds conversational data into a short continued-pretrain stage, which adds chat ability without throwing away what the base already knew.
Part of the Rodan-10M series. The base model (Rodan-10M-Base, v6) lives in its own repo; this is the chat stage built on the v9 checkpoint.
What it is
A ~10M chat model. With the right decoding settings it answers simple instructions coherently and stays on topic, which is more than I expected from something this small. It is not a production assistant. Its world knowledge is thin, its arithmetic is shaky, and it will make things up. Treat it as a demo of how much chat ability you can fit in 10M params, not as something to rely on.
USER: What is the capital of France?
RODAN: The capital of France is Paris. It's a city known as the country, which has its own unique
and cultural significance in Europe.
USER: Give me a tip for staying healthy.
RODAN: You can use a regular exercise routine to manage your body and improve overall well-being.
It's important ... maintaining healthy weight management by burning calories.
USER: Why do we sleep?
RODAN: Sleep can be used to improve cognitive function and memory.
How it was built
At 10M params, ordinary supervised finetuning costs you base capability. We saw it in earlier Rodan experiments: a masked-SFT stage dropped board avg from about 34 to 32, the SFT tax. So instead of a separate masked-SFT stage, Rodan-Chat folds the instruction data into a continued-pretrain run mixed with 45% replay of the base's own domains (the approach Falcon used). The replay is what keeps the model from forgetting. Chat ability gets added while commonsense, science, and arithmetic stay roughly where they were.
- Warm-start: Rodan-10M v9 (PLE-free, 10.41M). The tied embedding grows 8192→8194 for 2 ChatML tokens.
- Data (73M tokens): 40M smol-smoltalk conversations in ChatML, plus 33M curated replay, full-sequence LM loss.
- Optimizer: Muon on the 2D weights, AdamW elsewhere, low LR (1.2e-3, Muon 7e-3, below the base run), cosine, 6000 steps.
- Result: perplexity dropped 24.9 → 14.6, and the base board avg held at 35.04.
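The learning-rate setup in the optimizer bullet can be sketched as a plain cosine decay over 6000 steps. This is a minimal sketch under assumptions: reading 1.2e-3 as the AdamW peak, decaying to zero, and using no warmup are guesses, since the card does not spell them out. The names `PEAK_LR` and `cosine_lr` are hypothetical.

```python
import math

TOTAL_STEPS = 6000  # from the card

# Peak learning rates from the card. Assigning 1.2e-3 to AdamW is an
# assumption; the card only lists "1.2e-3, Muon 7e-3".
PEAK_LR = {"adamw": 1.2e-3, "muon": 7e-3}


def cosine_lr(step: int, peak: float, total: int = TOTAL_STEPS, floor: float = 0.0) -> float:
    """Cosine decay from `peak` at step 0 to `floor` at step `total`.

    No warmup and a zero floor are assumptions, not documented settings.
    """
    progress = min(max(step, 0), total) / total
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1500, 3000, 4500, 6000):
        rates = {name: cosine_lr(step, peak) for name, peak in PEAK_LR.items()}
        print(step, rates)
```

Each optimizer group would call `cosine_lr` with its own peak, so both rates follow the same curve shape.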
| Source | Share | Role |
|---|---|---|
| smol-smoltalk (ChatML) | 55% | instruction / multi-turn chat |
| Cosmopedia (replay) | 9% | commonsense anchor |
| dolmino pes2o + StackExchange (replay) | 9% | knowledge anchor |
| synthetic arithmetic (replay) | 9% | computation anchor |
| FineMath (replay) | 9% | math anchor |
| science-QA (replay) | 9% | science-MC anchor |
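The mixture in the table can be sketched as weighted sampling over sources. This is only an illustration: whether the shares apply per token, per sequence, or per batch is not stated, so the sketch assumes per-draw sampling, and `MIX`, `sample_source`, and `token_budget` are hypothetical names.

```python
import random

# Shares from the table above, in percent.
MIX = {
    "smol-smoltalk (ChatML)": 55,
    "Cosmopedia": 9,
    "dolmino pes2o + StackExchange": 9,
    "synthetic arithmetic": 9,
    "FineMath": 9,
    "science-QA": 9,
}


def sample_source(rng: random.Random) -> str:
    """Pick the source of the next training sequence according to MIX."""
    names = list(MIX)
    weights = [MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def token_budget(total_tokens: int = 73_000_000) -> dict:
    """Split the total token count by share (assumes shares are per token)."""
    return {name: total_tokens * share // 100 for name, share in MIX.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_source(rng) for _ in range(5)])
    for name, tokens in token_budget().items():
        print(f"{name}: {tokens / 1e6:.2f}M tokens")
```

Under the per-token assumption the budget comes out to about 40.2M chat tokens and 32.9M replay tokens, which lines up with the 40M + 33M split in the data bullet.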
Architecture
Same as the base: decoder-only, dim 320, 8 layers, 8 heads, MQA with 1 KV head, SwiGLU 768, RMSNorm, RoPE
base 200k, QK-norm, tied embeddings, value-residual, LRM. No PLE, since the probe on the base showed it was
dead. Vocab is 8194 (the 8k byte-BPE set plus <|im_start|> and <|im_end|>).
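As a sanity check on the size, the listed dimensions can be turned into a rough parameter count. This is an estimate under assumptions: no bias terms, two RMSNorm gains per block plus a final one, and nothing counted for QK-norm, value-residual, or LRM, whose parameter shapes the card does not give. `estimate_params` is a hypothetical helper.

```python
def estimate_params(vocab: int = 8194, dim: int = 320, layers: int = 8,
                    heads: int = 8, kv_heads: int = 1, ffn: int = 768) -> int:
    """Rough parameter count for the architecture described above."""
    head_dim = dim // heads                 # 40
    embed = vocab * dim                     # tied embeddings, counted once
    kv_dim = kv_heads * head_dim            # MQA: a single shared KV head
    attn = dim * dim + 2 * dim * kv_dim + dim * dim   # Q, K, V, O projections
    mlp = 3 * dim * ffn                     # SwiGLU: gate, up, down
    norms = 2 * dim                         # assumed: two RMSNorm gains per block
    final_norm = dim                        # assumed: one final RMSNorm
    return embed + layers * (attn + mlp + norms) + final_norm


if __name__ == "__main__":
    total = estimate_params()
    print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

The estimate lands near 10.37M, a little under the reported 10.41M. The remaining gap of roughly 40k is presumably the components left out here, though that is a guess.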
Evaluation
The base capability held; there was no SFT-tax collapse. Zero-shot lm-eval, limit 1000, ChatML-wrapped:
| Task | Metric | Rodan-Chat | v9 base | Δ |
|---|---|---|---|---|
| HellaSwag | acc_norm | 31.7 | 30.1 | +1.6 |
| ARC-Easy | acc_norm | 35.3 | 35.4 | ≈ |
| ARC-Challenge | acc_norm | 22.4 | 22.2 | ≈ |
| PIQA | acc | 53.8 | 55.5 | −1.7 |
| ArithMark-2 | acc | 25.8 | 28.4 | −2.6 |
| Board avg (÷4) | | 35.04 | 35.70 | −0.66 |
The 0.66 dip is partly just the ChatML wrapper hurting multiple-choice loglikelihood, and it's nowhere near the 34→32 drop a naive finetune would have caused. The replay did its job.
For instruction following itself, IFEval is close to useless at 10M: it grades strict constraint compliance, which really needs a model two or three orders of magnitude larger. So we measured the thing we actually care about instead. On 24 instruction prompts, an LLM judge compared Rodan-Chat against the v9 base, both decoded with the same repetition penalty. Chat won 14, tied 9, and lost 1, for a 93% win-rate excluding ties. The base tended to lose by sliding into code or rambling, while Chat gave coherent on-topic answers, several of them correct (Paris, photosynthesis producing glucose, the opposite of hot being cold, sleep helping memory).
We skipped a full IFEval score on purpose. A 10M model fails its strict format checks near-uniformly, so the number carries no signal and isn't worth the long generative eval. The win-rate above is the instruction-following metric we trust at this scale.
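The 93% figure follows from the judged counts once ties are dropped. A small sketch of that arithmetic, with hypothetical function names, plus an alternative that counts ties as half a win for comparison (the card does not report that variant):

```python
def win_rate_excluding_ties(wins: int, ties: int, losses: int) -> float:
    """Wins divided by decided comparisons; ties are ignored."""
    decided = wins + losses
    if decided == 0:
        raise ValueError("no decided comparisons")
    return wins / decided


def win_rate_ties_as_half(wins: int, ties: int, losses: int) -> float:
    """Alternative convention: each tie counts as half a win."""
    total = wins + ties + losses
    if total == 0:
        raise ValueError("no comparisons")
    return (wins + 0.5 * ties) / total


if __name__ == "__main__":
    wins, ties, losses = 14, 9, 1   # counts from the 24-prompt comparison
    print(f"excluding ties: {win_rate_excluding_ties(wins, ties, losses):.1%}")
    print(f"ties as half:   {win_rate_ties_as_half(wins, ties, losses):.1%}")
```

With only 15 decided comparisons, a single flipped verdict moves the ties-excluded rate by several points, so the number is best read as directional.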
Usage
Wrap prompts in ChatML and decode with a repetition penalty. Tiny models loop badly under pure greedy decoding, and the penalty is the difference between gibberish and readable answers.
ctx = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
# greedy + repetition_penalty 1.3 + no-repeat-3gram ; stop on <|im_end|> (8193) or <|endoftext|> (0)
The settings I'd recommend: greedy, repetition_penalty=1.3, no_repeat_ngram=3, max_new≈70, low or zero temperature.
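The decoding recipe can be sketched without tying it to a particular runtime. This is an illustration, not the author's decoding code: `next_logits` is a hypothetical callable standing in for the model (token ids in, one logit per vocab entry out), the penalty uses the common divide-positive / multiply-negative form, and applying it to prompt tokens as well as generated ones is an assumption.

```python
from typing import Callable, List, Sequence, Set

IM_END_ID = 8193   # <|im_end|>, as noted above
EOT_ID = 0         # <|endoftext|>, as noted above


def build_prompt(question: str) -> str:
    """Wrap one user turn in ChatML and leave the assistant turn open."""
    return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"


def penalize_repeats(logits: Sequence[float], seen: Sequence[int], penalty: float) -> List[float]:
    """Make already-seen tokens less likely (assumed penalty form)."""
    out = list(logits)
    for tok in set(seen):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_by_ngram(ids: Sequence[int], n: int) -> Set[int]:
    """Tokens that would complete an n-gram already present in `ids` (n >= 2)."""
    prefix = tuple(ids[-(n - 1):])
    banned = set()
    for i in range(len(ids) - n + 1):
        if tuple(ids[i:i + n - 1]) == prefix:
            banned.add(ids[i + n - 1])
    return banned


def greedy_decode(next_logits: Callable[[List[int]], Sequence[float]],
                  prompt_ids: Sequence[int], max_new: int = 70,
                  penalty: float = 1.3, ngram: int = 3,
                  stop_ids: Sequence[int] = (IM_END_ID, EOT_ID)) -> List[int]:
    """Greedy decoding with a repetition penalty and a no-repeat n-gram ban."""
    ids = list(prompt_ids)
    new_tokens: List[int] = []
    for _ in range(max_new):
        logits = penalize_repeats(next_logits(ids), ids, penalty)
        for tok in banned_by_ngram(ids, ngram):
            logits[tok] = float("-inf")
        tok = max(range(len(logits)), key=logits.__getitem__)
        if tok in stop_ids:
            break
        ids.append(tok)
        new_tokens.append(tok)
    return new_tokens
```

In practice `next_logits` would wrap a forward pass of the loaded model and `prompt_ids` would be the tokenized output of `build_prompt`. The n-gram ban is a hard mask, while the penalty only reweights, which is why the two are applied together.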
Limitations
- ~10M params, English only, for research and teaching. Don't use it in production, for factual queries, or for advice.
- Thin world knowledge, weak arithmetic, prone to making things up, near chance on abstract reasoning.
- It needs a repetition penalty to stay coherent; pure greedy decoding loops.
- No safety alignment. It imitates the shape of a chat reply without being a reliable assistant.
License
Weights are open. Data falls under the respective dataset licenses (smol-smoltalk, Cosmopedia, dolmino-mix ODC-By, AllenAI QA sets, FineMath).

