Coral-v1.6-0.6B — NotHereNorThere
A small model that actually thinks. 0.6B parameters, uncensored, with consistent Chain-of-Thought reasoning and solid (enough) multi-step logic.
Coral-v1.6 is a pure fine-tune experiment on top of Coral-v1.5-0.6B. No new merge, no architectural changes, just 2,000 rows of multi-domain reasoning data to see how much a standalone FT pass could move the needle.
The honest result: meaningful but not dramatic. CoT is back and consistent, structured reasoning is solid, and the model handles diverse prompts reliably. The main regressions are CoT verbosity (always-on thinking, tendency to over-verify correct answers) and premise trap handling.
This is more of a test than anything, a stepping stone to Coral 2.
The Coral Family
Every Coral model starts from a TIES merge of Qwen3 fine-tunes (except Coral 1, which was built on Llama 3.2), followed by a QLoRA fine-tune pass. Each release usually builds on what the previous one got right.
| Model | Base | Donors | FT Rows | Highlights |
|---|---|---|---|---|
| CoralLM-1B (retired) | Llama 3.2 1B | 3 | 200 | First experiment. Functional but rough. |
| Coral-v1.5-0.6B | Qwen3-0.6B | 5 | 1,000 | Adaptive CoT emerged as an accident. Crossed a real qualitative threshold at this size. |
| Coral-v1.5-4B | Qwen3-4B | 7 | 2,500 | Stronger reasoning, 13+ turn coherence, better code. |
| Coral-v1.6-0.6B | Coral-v1.5-0.6B | — | 2,000 | You are here. Pure FT experiment. CoT reinforced, reasoning consistent. |
| Coral-2-4B (in progress) | Qwen3-4B TIES merge | 5 | ~2,000 | Fresh merge, Dolphin-R1. |
What v1.6 Is Testing
v1.5's fine-tune was a coherence heal more than anything. 1k rows just to stabilize the post-merge model and get it talking cleanly. The adaptive CoT behavior that made v1.5 interesting emerged as an accidental byproduct of mixing reasoning and non-reasoning data.
v1.6 asks a simpler question: what does a pure reasoning-focused FT pass do to a model that already works? No new merge, no architecture changes, just 2k rows of structured CoT data and a training run. The targets were:
- Reasoning consistency: CoT that shows up reliably and does structured work (Achieved)
- Formatting discipline: cleaner responses, less noise (Kind of)
- Personality stability: consistent tone across wildly different prompt types (Somewhat)
- CoT reinforcement: deliberate rather than emergent (Achieved)
The 2,000 rows are not trying to teach the model new facts. A 600M parameter model has a fixed knowledge ceiling regardless of what you train it on. What changes is how it uses that knowledge: whether the reasoning is structured, and whether the think blocks do real work.
Why Dolphin-R1 and Not a Frontier Model?
The training data comes from QuixiAI/dolphin-r1, reasoning traces from DeepSeek-R1 and Gemini 2 Flash Thinking, rather than GPT-5.5, Claude Opus 4.7, or similar. This is intentional.
Frontier model distillation at 0.6B scale is mostly noise. The model can't hold frontier-level knowledge or capability, so training on it mostly produces a model that pattern-matches frontier-style responses without the underlying competence to back them up. What DeepSeek-R1 and Gemini 2 Flash Thinking traces do well is demonstrate structured, multi-domain reasoning patterns across thousands of diverse problems. v1.6 is after the shape of good reasoning, not the raw capability of a 100B+ model.
The v1.5 foundation datasets (OpenHermes 2.5 for surface behavior, OpenThoughts for CoT structure) are credited as inherited training signal from the original merge and heal pass.
Training
Fine-tuned directly on top of Coral-v1.5-0.6B. No re-merge, just continued training.
Dataset (2,000 rows total, randomly sampled and shuffled):
- 1,000 rows from QuixiAI/dolphin-r1 (reasoning-deepseek subset)
- 1,000 rows from QuixiAI/dolphin-r1 (reasoning-flash subset)
Inherited from Coral-v1.5-0.6B:
- OpenHermes 2.5 — surface behavior, instruction following
- OpenThoughts-114k — CoT structure
Method: QLoRA, 4-bit NF4, LoRA r=16, Flash Attention 2
Hardware: 1x RTX 5060 Ti 16GB
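The dataset mix described above (1,000 rows from each dolphin-r1 subset, randomly sampled and shuffled) can be sketched in a few lines. This is only an illustration, not the actual preprocessing script: the `build_mix` helper, the seed, and the stand-in rows are all hypothetical, and in practice the two lists would hold rows loaded from the reasoning-deepseek and reasoning-flash subsets.

```python
import random


def build_mix(deepseek_rows, flash_rows, n_per_source=1000, seed=42):
    """Sample n_per_source rows from each subset, then shuffle the combined list.

    The seed value is an assumption; the model card does not state one.
    """
    rng = random.Random(seed)
    mix = rng.sample(deepseek_rows, n_per_source) + rng.sample(flash_rows, n_per_source)
    rng.shuffle(mix)
    return mix


# Stand-in rows (hypothetical); real rows would come from QuixiAI/dolphin-r1.
deepseek = [{"source": "reasoning-deepseek", "id": i} for i in range(5000)]
flash = [{"source": "reasoning-flash", "id": i} for i in range(5000)]

mix = build_mix(deepseek, flash)
```

Sampling each subset separately before shuffling keeps the split at exactly 1,000 rows per source, which a single random draw over the combined pool would not guarantee.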
Merge Recipe (inherited from v1.5)
v1.6 is a fine-tune, not a new merge. The underlying architecture comes from the Coral-v1.5-0.6B TIES merge.
Method: TIES | Base: Qwen/Qwen3-0.6B | Tool: mergekit
| Donor | Role | Weight | Density |
|---|---|---|---|
| reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT | Thinking / reasoning | 0.30 | 0.5 |
| MihaiPopa-1/Qwen-3-0.6B-Claude-4.7-Opus-Distilled | Claude-style CoT | 0.30 | 0.5 |
| suayptalha/Qwen3-0.6B-Code-Expert | Code | 0.25 | 0.5 |
| DavidAU/Qwen3-0.6B-heretic-abliterated-uncensored | De-alignment | 0.15 | 0.5 |
| huihui-ai/Huihui-Qwen3-0.6B-abliterated-v2 | De-alignment | 0.15 | 0.5 |
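To make the Weight and Density columns concrete, here is a simplified pure-Python sketch of the TIES idea (trim, elect a sign, merge) on toy four-element vectors. It is not mergekit's implementation. The two donors, their values, and the `trim` / `ties_merge` helper names are hypothetical; only the 0.30 and 0.25 weights and the 0.5 density are taken from the recipe.

```python
def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    k = max(1, round(len(delta) * density))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, donors, weights, density=0.5):
    """Simplified TIES: trim each task vector, elect a sign per parameter,
    then take the weighted mean of the entries that agree with that sign."""
    deltas = [trim([d - b for d, b in zip(donor, base)], density) for donor in donors]
    merged = []
    for i, b in enumerate(base):
        contribs = [(w, delta[i]) for w, delta in zip(weights, deltas)]
        total = sum(w * v for w, v in contribs)
        sign = 1.0 if total >= 0 else -1.0
        agree = [(w, v) for w, v in contribs if v * sign > 0]
        wsum = sum(w for w, _ in agree)
        update = sum(w * v for w, v in agree) / wsum if wsum else 0.0
        merged.append(b + update)
    return merged


# Toy example: two hypothetical donors around an all-zero base.
base = [0.0, 0.0, 0.0, 0.0]
donor_a = [1.0, 0.1, -0.5, 0.0]   # weight 0.30
donor_b = [0.8, -0.2, 0.3, 0.05]  # weight 0.25
merged = ties_merge(base, [donor_a, donor_b], weights=[0.30, 0.25], density=0.5)
```

In this toy case the donors agree on the first parameter, so it becomes a weighted average of both. They disagree on the third, so the donor with more weighted mass wins and the other is dropped. The small entries are trimmed away entirely at density 0.5.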
Format & Chat Template
Uses the standard Qwen3 chat template. Load with --jinja in llama.cpp or select the Qwen3 template in LM Studio.
CoT behavior in v1.6: Think blocks are back and consistent, but always-on. The model engages CoT on creative and casual prompts where v1.5 would skip it. It also tends to re-verify correct answers rather than stopping when done. At 600M, running at hundreds of tokens per second, the verbosity is mostly harmless, but it's the main behavioral regression from v1.5 and a target for the next release.
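Because thinking is always-on, it can be handy to separate the think block from the final answer before showing output to a user. The sketch below assumes the raw completion text still contains Qwen3-style `<think>...</think>` tags; the `split_think` helper and the sample string are hypothetical, and some frontends may already strip or relocate the reasoning for you.

```python
import re

# Non-greedy match so multiple think blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text):
    """Return (list of think-block contents, answer text with think blocks removed)."""
    thoughts = [t.strip() for t in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return thoughts, answer


# Hypothetical raw output, for illustration only.
raw = "<think>\nLet the ball cost x. Then x + (x + 1.00) = 1.10.\n</think>\n\nThe ball costs $0.05."
thoughts, answer = split_think(raw)
```

If generation is cut off before the closing tag, the pattern will not match and the unfinished think block stays in the answer, so a generous token limit helps here.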
Evaluation
Tested post-training on the standard Coral eval battery. Tested on Q6_K — some behavior may differ on F16.
| Test | What it checks | Result |
|---|---|---|
| Basic coherence / casual chat | Stable, non-looping responses | ✅ Good enough |
| Identity | Knows it's an AI | ✅ Correct |
| Exact instruction following ("list exactly 3 reasons") | Respects explicit count and format constraints | ✅ Correct, hit exactly 3, clean format |
| Bat and ball ($0.05) | Resists the intuitive wrong answer of $0.10 | ✅ Correct, clean algebra, got $0.05 |
| Bloops / razzles transitivity | Multi-step logical deduction, catches asymmetry | ✅ Correct, got both parts right including the asymmetry |
| Race position puzzle | Simple logic | ✅ 2nd place, correct |
| Pills timing puzzle | Step counting, interval math | ✅ 1 hour, correct |
| Snail well puzzle | State tracking across multiple steps | ⚠️ Got 9 days (correct) but brute-forced it, confused itself mid-reasoning, revised to right answer |
| Poem (rain) | Creative output, CoT suppression on low-stakes tasks | ⚠️ CoT engaged and spent tokens analyzing rhyme schemes, output was decent, process was backwards |
| Nautical coffee shop name | Casual creative, CoT suppression check | ⚠️ CoT went deep on nautical word taxonomy, answered fine, massively over-thought it |
| Moses ark trap | Catches substituted names in premise | ❌ Missed, hallucinated an answer about the Ark of the Covenant and seven vessels of oil |
| Uncensored behavior | Answers edge content without refusal | ✅ Works, attempts answers confidently rather than refusing, just often wrong on factual edge content |
| Adaptive CoT routing | Thinks for hard problems, skips for easy | 🤷 Always-on in v1.6, not exactly good or bad |
What this tells us: Structured reasoning is solid and reliable (for 600M parameters). The FT pass successfully reinforced CoT. The regressions (always-on thinking, verbosity, and premise trap misses) are clear targets.
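For reference, the arithmetic behind the bat-and-ball row can be checked in a few lines. This assumes the usual wording of the puzzle (bat and ball cost $1.10 together, and the bat costs $1.00 more than the ball); the exact prompt used in the eval is not reproduced here. Amounts are in integer cents to avoid floating-point noise.

```python
# Assumed puzzle parameters (standard wording, not taken from the eval prompt).
total_cents = 110       # bat + ball
difference_cents = 100  # bat - ball

# ball + (ball + difference) = total  =>  ball = (total - difference) / 2
ball_cents = (total_cents - difference_cents) // 2
bat_cents = ball_cents + difference_cents

# The intuitive answer of 10 cents fails the total constraint.
intuitive_ball = 10
intuitive_total = intuitive_ball + (intuitive_ball + difference_cents)
```

The intuitive answer overshoots the stated total by 10 cents, which is the trap the test checks for.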
Quant Guide
| Quant | Verdict |
|---|---|
| F16 | Reference quality |
| Q6_K | Essentially identical to F16, maybe some weirdness |
| Q5_K_M | Minor degradation, much smaller |
| Q4_K_M | Very noticeable degradation at this scale, use Q5 if you can |
| Q3_K_L | Just don't. |