Coral-v1.6-0.6B — NotHereNorThere

A small model that actually thinks. 0.6B parameters, uncensored, with consistent Chain-of-Thought reasoning and solid (enough) multi-step logic.

Coral-v1.6 is a pure fine-tune experiment on top of Coral-v1.5-0.6B. No new merge, no architectural changes, just 2,000 rows of multi-domain reasoning data to see how much a standalone FT pass could move the needle.

The honest result: meaningful but not dramatic. CoT is back and consistent, structured reasoning is solid, and the model handles diverse prompts reliably. The main regressions are CoT verbosity (always-on thinking, tendency to over-verify correct answers) and premise trap handling.

This is more of a test, maybe even call it a stepping stone to Coral 2.


The Coral Family

Every Coral model starts from a TIES merge of Qwen3 fine-tunes (except Coral 1, which used Llama 3.2), followed by a QLoRA fine-tune pass. Each release usually builds on what the previous one got right.

| Model | Base | Donors | FT Rows | Highlights |
|---|---|---|---|---|
| CoralLM-1B (retired) | Llama 3.2 1B | 3 | 200 | First experiment. Functional but rough. |
| Coral-v1.5-0.6B | Qwen3-0.6B | 5 | 1,000 | Adaptive CoT emerged as an accident. Crossed a real qualitative threshold at this size. |
| Coral-v1.5-4B | Qwen3-4B | 7 | 2,500 | Stronger reasoning, 13+ turn coherence, better code. |
| Coral-v1.6-0.6B | Coral-v1.5-0.6B | - | 2,000 | You are here. Pure FT experiment. CoT reinforced, reasoning consistent. |
| Coral-2-4B (in progress) | Qwen3-4B TIES merge | 5 | ~2,000 | Fresh merge, Dolphin-R1. |

What v1.6 Is Testing

v1.5's fine-tune was a coherence heal more than anything. 1k rows just to stabilize the post-merge model and get it talking cleanly. The adaptive CoT behavior that made v1.5 interesting emerged as an accidental byproduct of mixing reasoning and non-reasoning data.

v1.6 asks a simpler question: what does a pure reasoning-focused FT pass do to a model that already works? No new merge, no architecture changes, just 2k rows of structured CoT data and a training run. The targets were:

  • Reasoning consistency: CoT that shows up reliably and does structured work (Achieved)
  • Formatting discipline: cleaner responses, less noise (Kind of)
  • Personality stability: consistent tone across wildly different prompt types (Somewhat)
  • CoT reinforcement: deliberate rather than emergent (Achieved)

The 2,000 rows are not trying to teach the model new facts. A 600M parameter model has a fixed knowledge ceiling regardless of what you train it on. What changes is how it uses that knowledge: whether the reasoning is structured, and whether the think blocks do real work.


Why Dolphin-R1 and Not a Frontier Model?

The training data comes from QuixiAI/dolphin-r1, which contains reasoning traces from DeepSeek-R1 and Gemini 2.0 Flash Thinking, rather than from GPT-5.5, Claude Opus 4.7, or similar. This is intentional.

Frontier model distillation at 0.6B scale is mostly noise. A model this small can't hold frontier-level knowledge or capability, so training on frontier outputs mostly produces a model that pattern-matches frontier-style responses without the underlying competence to back them up. What DeepSeek-R1 and Gemini 2.0 Flash Thinking traces do well is demonstrate structured, multi-domain reasoning patterns across thousands of diverse problems. v1.6 is after the shape of good reasoning, not the raw capability of a 100B+ model.

The v1.5 foundation datasets (OpenHermes 2.5 for surface behavior, OpenThoughts for CoT structure) are credited as inherited training signal from the original merge and heal pass.


Training

Fine-tuned directly on top of Coral-v1.5-0.6B. No re-merge, just continued training.

Dataset (2,000 rows total, randomly sampled and shuffled):

  • 1,000 rows — QuixiAI/dolphin-r1 (reasoning-deepseek subset)
  • 1,000 rows — QuixiAI/dolphin-r1 (reasoning-flash subset)
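The sampling step above can be sketched roughly as follows. This is a minimal illustration and not the actual training script: `build_mix`, the `source` field, and the seed are hypothetical, and the placeholder rows stand in for the real subsets (which one would presumably load with the `datasets` library before sampling).

```python
import random


def build_mix(subsets, rows_per_subset, seed=0):
    """Sample a fixed number of rows from each subset, then shuffle the union.

    `subsets` maps a subset name to a list of row dicts. The seed is an
    assumption for reproducibility; the model card does not state one.
    """
    rng = random.Random(seed)
    mixed = []
    for name, rows in subsets.items():
        # Sample without replacement so no row appears twice.
        picked = rng.sample(rows, rows_per_subset)
        # Tag each row with its subset so the mix can be audited later.
        mixed.extend({**row, "source": name} for row in picked)
    rng.shuffle(mixed)
    return mixed


# Placeholder rows standing in for the two dolphin-r1 reasoning subsets.
subsets = {
    "reasoning-deepseek": [{"id": i} for i in range(5000)],
    "reasoning-flash": [{"id": i} for i in range(5000)],
}
train_rows = build_mix(subsets, rows_per_subset=1000)
print(len(train_rows))
```

Shuffling after sampling keeps the two trace styles interleaved, so no stretch of training sees only one teacher's reasoning format.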

Inherited from Coral-v1.5-0.6B:

  • OpenHermes 2.5 (surface behavior)
  • OpenThoughts (CoT structure)

Method: QLoRA, 4-bit NF4, LoRA r=16, Flash Attention 2
Hardware: 1x RTX 5060 Ti 16GB
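To give a feel for what LoRA r=16 means in trainable parameters, here is a rough back-of-the-envelope sketch. Only the rank comes from this card. The layer shapes, layer count, and the choice to adapt every attention and MLP projection are assumptions based on the published Qwen3-0.6B configuration and should be checked against the model's `config.json`; the actual target modules used for v1.6 are not stated.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one linear layer of shape d_out x d_in."""
    # LoRA learns A (rank x d_in) and B (d_out x rank); the frozen weight is untouched.
    return rank * (d_in + d_out)


RANK = 16  # from the model card

# ASSUMPTION: (d_in, d_out) per projection, as in the published Qwen3-0.6B config.
ASSUMED_LAYER_SHAPES = {
    "q_proj": (1024, 2048),
    "k_proj": (1024, 1024),
    "v_proj": (1024, 1024),
    "o_proj": (2048, 1024),
    "gate_proj": (1024, 3072),
    "up_proj": (1024, 3072),
    "down_proj": (3072, 1024),
}
ASSUMED_NUM_LAYERS = 28  # ASSUMPTION: transformer block count

per_layer = sum(
    lora_params(d_in, d_out, RANK) for d_in, d_out in ASSUMED_LAYER_SHAPES.values()
)
total = per_layer * ASSUMED_NUM_LAYERS
print(f"adapter params per layer: {per_layer:,}")
print(f"adapter params total:     {total:,}")
```

Under these assumptions the adapter is a small fraction of the 0.6B base, which is part of why a run like this fits on a single 16GB card.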


Merge Recipe (inherited from v1.5)

v1.6 is a fine-tune, not a new merge. The underlying merged weights come from the Coral-v1.5-0.6B TIES merge.

Method: TIES | Base: Qwen/Qwen3-0.6B | Tool: mergekit

| Donor | Role | Weight | Density |
|---|---|---|---|
| reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT | Thinking / reasoning | 0.30 | 0.5 |
| MihaiPopa-1/Qwen-3-0.6B-Claude-4.7-Opus-Distilled | Claude-style CoT | 0.30 | 0.5 |
| suayptalha/Qwen3-0.6B-Code-Expert | Code | 0.25 | 0.5 |
| DavidAU/Qwen3-0.6B-heretic-abliterated-uncensored | De-alignment | 0.15 | 0.5 |
| huihui-ai/Huihui-Qwen3-0.6B-abliterated-v2 | De-alignment | 0.15 | 0.5 |
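For readers unfamiliar with what weight and density do in a TIES merge, the toy sketch below walks through the three steps (trim, elect sign, merge) on flat lists of floats. It is a simplified illustration, not mergekit's implementation: `ties_merge` is a hypothetical helper, real merges operate per tensor, and details such as tie-breaking and normalization may differ.

```python
def ties_merge(base, donors, weights, density):
    """Toy TIES merge over flat parameter lists (illustration only)."""
    n = len(base)
    # Task vectors: what each donor changed relative to the base.
    deltas = [[d[i] - base[i] for i in range(n)] for d in donors]

    # Trim: keep only the largest-magnitude `density` fraction per donor.
    keep = max(1, round(density * n))
    trimmed = []
    for delta in deltas:
        threshold = sorted((abs(x) for x in delta), reverse=True)[keep - 1]
        trimmed.append([x if abs(x) >= threshold else 0.0 for x in delta])

    merged = list(base)
    for i in range(n):
        # Elect sign: the direction with more weighted mass wins.
        total = sum(w * t[i] for w, t in zip(weights, trimmed))
        if total == 0:
            continue  # nothing survived trimming, or the donors cancel out
        sign = 1.0 if total > 0 else -1.0
        # Merge: weighted mean over donors that agree with the elected sign.
        agree = [(w, t[i]) for w, t in zip(weights, trimmed) if t[i] * sign > 0]
        weight_sum = sum(w for w, _ in agree)
        merged[i] = base[i] + sum(w * v for w, v in agree) / weight_sum
    return merged


base = [0.0, 0.0, 0.0, 0.0]
donor_a = [1.0, 0.1, -2.0, 0.0]
donor_b = [3.0, -0.2, 4.0, 0.05]
print(ties_merge(base, [donor_a, donor_b], weights=[0.5, 0.5], density=0.5))
# -> [2.0, 0.0, 4.0, 0.0]
```

In the example, the small changes at positions 1 and 3 are trimmed away, and at position 2 the donors disagree on direction, so only the donor on the winning side contributes. That sign election is what lets several donors with conflicting edits share one set of weights.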

Format & Chat Template

Uses the standard Qwen3 chat template. Load with --jinja in llama.cpp or select the Qwen3 template in LM Studio.

CoT behavior in v1.6: Think blocks are back and consistent, but always-on. The model engages CoT on creative and casual prompts where v1.5 would skip it, and it tends to re-verify correct answers rather than stopping when done. At 600M parameters and hundreds of tokens per second the verbosity is mostly harmless, but it is the main behavioral regression from v1.5 and a target for the next release.
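Since thinking is always on, downstream code will usually want to separate the think block from the final answer. The sketch below assumes the standard Qwen3 `<think>...</think>` tags; `split_think` and `think_ratio` are hypothetical helpers, and the word-count ratio is only a crude proxy for how many tokens go to reasoning.

```python
import re

# Qwen3-style reasoning is wrapped in <think> ... </think> before the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text):
    """Return (list of think-block contents, answer with think blocks removed)."""
    thoughts = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return thoughts, answer


def think_ratio(text):
    """Fraction of whitespace-separated words spent inside think blocks."""
    thoughts, answer = split_think(text)
    think_words = sum(len(t.split()) for t in thoughts)
    total = think_words + len(answer.split())
    return think_words / total if total else 0.0


raw = "<think>\nRain, pain, again.\n</think>\n\nRain taps the glass."
thoughts, answer = split_think(raw)
print(answer)
print(round(think_ratio(raw), 2))
```

Tracking a ratio like this across casual and creative prompts is one simple way to quantify the over-thinking described above.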


Evaluation

Tested post-training on the standard Coral eval battery. Tested on Q6_K — some behavior may differ on F16.

| Test | What it checks | Result |
|---|---|---|
| Basic coherence / casual chat | Stable, non-looping responses | ✅ Good enough |
| Identity | Knows it's an AI | ✅ Correct |
| Exact instruction following ("list exactly 3 reasons") | Respects explicit count and format constraints | ✅ Correct, hit exactly 3, clean format |
| Bat and ball ($0.05) | Resists the intuitive wrong answer of $0.10 | ✅ Correct, clean algebra, got $0.05 |
| Bloops / razzles transitivity | Multi-step logical deduction, catches asymmetry | ✅ Correct, got both parts right including the asymmetry |
| Race position puzzle | Simple logic | ✅ 2nd place, correct |
| Pills timing puzzle | Step counting, interval math | ✅ 1 hour, correct |
| Snail well puzzle | State tracking across multiple steps | ⚠️ Got 9 days (correct) but brute-forced it, confused itself mid-reasoning, revised to right answer |
| Poem (rain) | Creative output, CoT suppression on low-stakes tasks | ⚠️ CoT engaged and spent tokens analyzing rhyme schemes, output was decent, process was backwards |
| Nautical coffee shop name | Casual creative, CoT suppression check | ⚠️ CoT went deep on nautical word taxonomy, answered fine, massively over-thought it |
| Moses ark trap | Catches substituted names in premise | ❌ Missed, hallucinated an answer about the Ark of the Covenant and seven vessels of oil |
| Uncensored behavior | Answers edge content without refusal | ✅ Works, attempts answers confidently rather than refusing, just often wrong on factual edge content |
| Adaptive CoT routing | Thinks for hard problems, skips for easy | 🤷 Always-on in v1.6, not exactly good or bad |
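Two of the checks in the table are easy to verify mechanically. The sketch below is an assumption about how such checks could be scripted, not the actual Coral eval battery: `ball_price` and `count_list_items` are hypothetical helpers, and the bat-and-ball figures use the usual phrasing of the puzzle ($1.10 total, bat costs $1.00 more than the ball).

```python
import re
from fractions import Fraction


def ball_price(total, difference):
    """Solve bat + ball = total with bat = ball + difference, exactly."""
    # Substituting gives 2 * ball + difference = total.
    return (Fraction(total) - Fraction(difference)) / 2


# Matches lines that start with "1.", "2)", "-", "*" or a bullet character.
LIST_ITEM_RE = re.compile(r"^[ \t]*(?:\d+[.)]|[-*•])[ \t]+\S", re.MULTILINE)


def count_list_items(text):
    """Count list items in a response, for 'list exactly N' style prompts."""
    return len(LIST_ITEM_RE.findall(text))


print(float(ball_price("1.10", "1.00")))  # 0.05, not the intuitive 0.10

response = "Here are three reasons:\n1. Cost\n2. Speed\n3. Size"
print(count_list_items(response) == 3)  # True
```

Using `Fraction` avoids floating-point noise when comparing against the expected $0.05.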

What this tells us: Structured reasoning is solid and reliable (for 600M parameters). The FT pass successfully reinforced CoT. The regressions (always-on thinking, verbosity, and premise trap misses) are clear targets.


Quant Guide

| Quant | Verdict |
|---|---|
| F16 | Reference quality |
| Q6_K | Essentially identical to F16, maybe some weirdness |
| Q5_K_M | Minor degradation, much smaller |
| Q4_K_M | Very noticeable degradation at this scale, use Q5 if you can |
| Q3_K_L | Just don't. |