Instructions to use NotHereNorThere/Coral-v1.6-0.6B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use NotHereNorThere/Coral-v1.6-0.6B with llama-cpp-python:

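# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q6_K GGUF from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
	repo_id="NotHereNorThere/Coral-v1.6-0.6B",
	filename="Q6_K.gguf",
)

response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The response follows the OpenAI chat completion schema
print(response["choices"][0]["message"]["content"])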

Local Apps

llama.cpp

How to use NotHereNorThere/Coral-v1.6-0.6B with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K
# Run inference directly in the terminal:
llama cli -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K
# Run inference directly in the terminal:
llama cli -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Use Docker

docker model run hf.co/NotHereNorThere/Coral-v1.6-0.6B:Q6_K

vLLM

How to use NotHereNorThere/Coral-v1.6-0.6B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NotHereNorThere/Coral-v1.6-0.6B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NotHereNorThere/Coral-v1.6-0.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
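
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the vLLM server from the command above is listening on localhost:8000. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "NotHereNorThere/Coral-v1.6-0.6B"
BASE_URL = "http://localhost:8000/v1"  # assumption: same host and port as the curl example


def build_chat_request(prompt, model=MODEL_ID, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply text, including any think block the model emits.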

Use Docker

docker model run hf.co/NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Ollama
How to use NotHereNorThere/Coral-v1.6-0.6B with Ollama:
```
ollama run hf.co/NotHereNorThere/Coral-v1.6-0.6B:Q6_K
```

Unsloth Studio

How to use NotHereNorThere/Coral-v1.6-0.6B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NotHereNorThere/Coral-v1.6-0.6B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NotHereNorThere/Coral-v1.6-0.6B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for NotHereNorThere/Coral-v1.6-0.6B to start chatting

Pi

How to use NotHereNorThere/Coral-v1.6-0.6B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "NotHereNorThere/Coral-v1.6-0.6B:Q6_K"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use NotHereNorThere/Coral-v1.6-0.6B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Run Hermes

hermes

Docker Model Runner
How to use NotHereNorThere/Coral-v1.6-0.6B with Docker Model Runner:
```
docker model run hf.co/NotHereNorThere/Coral-v1.6-0.6B:Q6_K
```

Lemonade

How to use NotHereNorThere/Coral-v1.6-0.6B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull NotHereNorThere/Coral-v1.6-0.6B:Q6_K

Run and chat with the model

lemonade run user.Coral-v1.6-0.6B-Q6_K

List all available models

lemonade list

Coral-v1.6-0.6B — NotHereNorThere

A small model that actually thinks. 0.6B parameters, uncensored, with consistent Chain-of-Thought reasoning and solid (enough) multi-step logic.

Coral-v1.6 is a pure fine-tune experiment on top of Coral-v1.5-0.6B. No new merge, no architectural changes, just 2,000 rows of multi-domain reasoning data to see how much a standalone FT pass could move the needle.

The honest result: meaningful but not dramatic. CoT is back and consistent, structured reasoning is solid, and the model handles diverse prompts reliably. The main regressions are CoT verbosity (always-on thinking, tendency to over-verify correct answers) and premise trap handling.

This is more of a test than a release; call it a stepping stone to Coral 2.

The Coral Family

Every Coral model starts from a TIES merge of Qwen3 fine-tunes (except Coral 1, which was built on Llama 3.2), followed by a QLoRA fine-tune pass. Each release usually builds on what the previous one got right.

Model	Base	Donors	FT Rows	Highlights
CoralLM-1B (retired)	Llama 3.2 1B	3	200	First experiment. Functional but rough.
Coral-v1.5-0.6B	Qwen3-0.6B	5	1,000	Adaptive CoT emerged as an accident. Crossed a real qualitative threshold at this size.
Coral-v1.5-4B	Qwen3-4B	7	2,500	Stronger reasoning, 13+ turn coherence, better code.
Coral-v1.6-0.6B	Coral-v1.5-0.6B	—	2,000	You are here. Pure FT experiment. CoT reinforced, reasoning consistent.
Coral-2-4B (in progress)	Qwen3-4B TIES merge	5	~2,000	Fresh merge, Dolphin-R1.

What v1.6 Is Testing

v1.5's fine-tune was a coherence heal more than anything: 1k rows just to stabilize the post-merge model and get it talking cleanly. The adaptive CoT behavior that made v1.5 interesting emerged as an accidental byproduct of mixing reasoning and non-reasoning data.

v1.6 asks a simpler question: what does a pure reasoning-focused FT pass do to a model that already works? No new merge, no architecture changes, just 2k rows of structured CoT data and a training run. The targets were:

Reasoning consistency: CoT that shows up reliably and does structured work (Achieved)
Formatting discipline: cleaner responses, less noise (Kind of)
Personality stability: consistent tone across wildly different prompt types (Somewhat)
CoT reinforcement: deliberate rather than emergent (Achieved)

The 2,000 rows are not trying to teach the model new facts. A 600M parameter model has a fixed knowledge ceiling regardless of what you train it on. What changes is how it uses that knowledge: whether the reasoning is structured, and whether the think blocks do real work.

Why Dolphin-R1 and Not a Frontier Model?

The training data comes from QuixiAI/dolphin-r1, which contains reasoning traces from DeepSeek-R1 and Gemini 2 Flash Thinking, rather than from GPT-5.5, Claude Opus 4.7, or similar. This is intentional.

Frontier model distillation at 0.6B scale is mostly noise. The model can't hold frontier-level knowledge or capability, so training on frontier outputs mostly produces a model that pattern-matches frontier-style responses without the underlying competence to back them up. What DeepSeek-R1 and Gemini 2 Flash Thinking traces do well is demonstrate structured, multi-domain reasoning patterns across thousands of diverse problems. v1.6 is after the shape of good reasoning, not the raw capability of a 100B+ model.

The v1.5 foundation datasets (OpenHermes 2.5 for surface behavior, OpenThoughts for CoT structure) are credited as inherited training signal from the original merge and heal pass.

Training

Fine-tuned directly on top of Coral-v1.5-0.6B. No re-merge, just continued training.

Dataset (2,000 rows total, randomly sampled and shuffled):

1,000 rows — QuixiAI/dolphin-r1 (reasoning-deepseek subset)
1,000 rows — QuixiAI/dolphin-r1 (reasoning-flash subset)
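
The mixing step described above can be sketched as follows. This is an illustration with the standard library only, not the author's actual data pipeline; the function name, the seed, and the toy rows are all assumptions.

```python
import random


def sample_and_mix(subsets, rows_per_subset, seed=0):
    """Draw rows_per_subset rows from each subset without replacement, then shuffle the union."""
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = []
    for name, rows in subsets.items():
        for row in rng.sample(rows, rows_per_subset):
            mixed.append({"source": name, **row})
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for the two dolphin-r1 subsets (hypothetical rows, not real data).
subsets = {
    "reasoning-deepseek": [{"id": i} for i in range(5000)],
    "reasoning-flash": [{"id": i} for i in range(5000)],
}
train_rows = sample_and_mix(subsets, rows_per_subset=1000)
print(len(train_rows))  # 2000
```

Sampling each subset separately keeps the 50/50 split exact, which a single random draw over the combined pool would not guarantee.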

Inherited from Coral-v1.5-0.6B:

OpenHermes 2.5 — surface behavior, instruction following
OpenThoughts-114k — CoT structure

Method: QLoRA, 4-bit NF4, LoRA r=16, Flash Attention 2
Hardware: 1x RTX 5060 Ti 16GB
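
To give a feel for how small a rank-16 adapter is, here is a short arithmetic sketch. The 1024 x 1024 shape is a hypothetical example; the actual target modules and layer shapes used for this run are not stated here.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


# Hypothetical 1024 x 1024 projection, chosen only for illustration.
full = 1024 * 1024
adapter = lora_param_count(1024, 1024, rank=16)
print(adapter, f"{adapter / full:.1%}")  # 32768 3.1%
```

Under that assumption, the adapter trains roughly 3% as many parameters as the frozen matrix it wraps, which is part of why a run like this fits on a single 16GB card.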

Merge Recipe (inherited from v1.5)

v1.6 is a fine-tune, not a new merge. The underlying architecture comes from the Coral-v1.5-0.6B TIES merge.

Method: TIES | Base: Qwen/Qwen3-0.6B | Tool: mergekit

Donor	Role	Weight	Density
`reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT`	Thinking / reasoning	0.30	0.5
`MihaiPopa-1/Qwen-3-0.6B-Claude-4.7-Opus-Distilled`	Claude-style CoT	0.30	0.5
`suayptalha/Qwen3-0.6B-Code-Expert`	Code	0.25	0.5
`DavidAU/Qwen3-0.6B-heretic-abliterated-uncensored`	De-alignment	0.15	0.5
`huihui-ai/Huihui-Qwen3-0.6B-abliterated-v2`	De-alignment	0.15	0.5
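
For readers unfamiliar with TIES, the toy sketch below shows how the Weight and Density columns enter the merge: each donor's difference from the base is trimmed to its largest entries (density), a sign is elected per parameter from the weighted sum, and only the deltas that agree with that sign are averaged. It works on flat Python lists and is not mergekit's implementation. Dividing by the sum of the agreeing weights is an assumption about normalization.

```python
def trim(delta, density):
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = max(1, round(len(delta) * density))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(delta)]


def ties_merge(base, donors, weights, density):
    """Toy TIES merge over flat lists of floats: trim, elect a sign, merge the agreeing deltas."""
    deltas = [trim([d - b for d, b in zip(donor, base)], density) for donor in donors]
    merged = []
    for i, b in enumerate(base):
        weighted = [w * delta[i] for w, delta in zip(weights, deltas)]
        total = sum(weighted)
        if total == 0:
            merged.append(b)  # nothing survived trimming, or the deltas cancel exactly
            continue
        # Keep only the donors whose delta shares the sign of the weighted sum.
        agree = [(w, v) for w, v in zip(weights, weighted) if v * total > 0]
        merged.append(b + sum(v for _, v in agree) / sum(w for w, _ in agree))
    return merged


base = [0.0, 0.0, 0.0, 0.0]
donor_a = [1.0, 0.1, -2.0, 0.0]
donor_b = [-0.5, 3.0, -1.0, 0.2]
print(ties_merge(base, [donor_a, donor_b], weights=[0.5, 0.5], density=0.5))
# [1.0, 3.0, -1.5, 0.0]
```

In the first position donor_b's small negative delta is trimmed away, so donor_a's value survives without interference. That interference reduction is the point of the trim step.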

Format & Chat Template

Uses the standard Qwen3 chat template. Load with --jinja in llama.cpp or select the Qwen3 template in LM Studio.

CoT behavior in v1.6: Think blocks are back and consistent but always-on. The model engages CoT on creative and casual prompts where v1.5 would skip it. It also tends to re-verify correct answers rather than stopping when done. At 600M parameters, running at hundreds of tokens per second, the verbosity is mostly harmless, but it's the main behavioral regression from v1.5 and a target for the next release.
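
Because thinking is always on, downstream code will often want to separate the reasoning from the final answer. The sketch below assumes the raw output wraps its reasoning in Qwen3-style `<think>...</think>` tags; the function name is illustrative.

```python
import re

# Assumption: reasoning is delimited by <think> and </think>, as in the Qwen3 chat format.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, answer): the contents of <think> blocks and the text outside them."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


raw = "<think>\nFrance's capital is Paris.\n</think>\n\nParis."
reasoning, answer = split_thinking(raw)
print(answer)  # Paris.
```

Some servers may already strip or separate the think block before returning the message, in which case this step is unnecessary.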

Evaluation

Tested post-training on the standard Coral eval battery, using the Q6_K quant; some behavior may differ on F16.

Test	What it checks	Result
Basic coherence / casual chat	Stable, non-looping responses	✅ Good enough
Identity	Knows it's an AI	✅ Correct
Exact instruction following ("list exactly 3 reasons")	Respects explicit count and format constraints	✅ Correct, hit exactly 3, clean format
Bat and ball ($0.05)	Resists the intuitive wrong answer of $0.10	✅ Correct, clean algebra, got $0.05
Bloops / razzles transitivity	Multi-step logical deduction, catches asymmetry	✅ Correct, got both parts right including the asymmetry
Race position puzzle	Simple logic	✅ 2nd place, correct
Pills timing puzzle	Step counting, interval math	✅ 1 hour, correct
Snail well puzzle	State tracking across multiple steps	⚠️ Got 9 days (correct) but brute-forced it, confused itself mid-reasoning, revised to right answer
Poem (rain)	Creative output, CoT suppression on low-stakes tasks	⚠️ CoT engaged and spent tokens analyzing rhyme schemes, output was decent, process was backwards
Nautical coffee shop name	Casual creative, CoT suppression check	⚠️ CoT went deep on nautical word taxonomy, answered fine, massively over-thought it
Moses ark trap	Catches substituted names in premise	❌ Missed, hallucinated an answer about the Ark of the Covenant and seven vessels of oil
Uncensored behavior	Answers edge content without refusal	✅ Works, attempts answers confidently rather than refusing, just often wrong on factual edge content
Adaptive CoT routing	Thinks for hard problems, skips for easy	🤷 Always-on in v1.6, not exactly good or bad

What this tells us: Structured reasoning is solid and reliable (for 600M parameters). The FT pass successfully reinforced CoT. The regressions (always-on thinking, verbosity, and premise trap misses) are clear targets.

Quant Guide

Quant	Verdict
F16	Reference quality
Q6_K	Essentially identical to F16, maybe some weirdness
Q5_K_M	Minor degradation, much smaller
Q4_K_M	Very noticeable degradation at this scale, use Q5 if you can
Q3_K_L	Just don't.
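
For a rough sense of the size trade-off behind these verdicts, the sketch below estimates weight-only file size from the nominal bits per weight of llama.cpp's block formats. The 0.6e9 parameter count is a round nominal figure. Mixed presets such as Q5_K_M or Q3_K_L keep some tensors at higher precision, so real files will be somewhat larger than these estimates.

```python
# Nominal bits per weight of the underlying block formats (assumption: weight-only,
# ignoring metadata and the higher-precision tensors that mixed presets keep).
BITS_PER_WEIGHT = {"F16": 16.0, "Q6_K": 6.5625, "Q5_K": 5.5, "Q4_K": 4.5, "Q3_K": 3.4375}


def estimate_size_gb(n_params, quant):
    """Rough weight-only size in decimal gigabytes."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_size_gb(0.6e9, quant):.2f} GB")
```

At this scale every quant fits comfortably in memory, so the choice is mostly about quality rather than footprint.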