Instructions for using NotHereNorThere/Coral-v1.5-0.6B with libraries, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use NotHereNorThere/Coral-v1.5-0.6B with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NotHereNorThere/Coral-v1.5-0.6B",
    filename="Q3_K_L.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use NotHereNorThere/Coral-v1.5-0.6B with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
```
Use pre-built binary
```shell
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
```
Use Docker
docker model run hf.co/NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use NotHereNorThere/Coral-v1.5-0.6B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "NotHereNorThere/Coral-v1.5-0.6B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "NotHereNorThere/Coral-v1.5-0.6B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
docker model run hf.co/NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
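The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; it assumes the server started by `vllm serve` is listening on `localhost:8000`, and the helper names `build_chat_request` and `ask` are illustrative rather than part of any library.

```python
import json
import urllib.request

# Assumption: the server started by `vllm serve` listens on localhost:8000.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "NotHereNorThere/Coral-v1.5-0.6B"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the reply text. Pointing `BASE_URL` at `http://localhost:8080/v1` would likely work for the `llama-server` setup as well, since both expose an OpenAI-compatible API.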
- Ollama
How to use NotHereNorThere/Coral-v1.5-0.6B with Ollama:
ollama run hf.co/NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
- Unsloth Studio
How to use NotHereNorThere/Coral-v1.5-0.6B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for NotHereNorThere/Coral-v1.5-0.6B to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for NotHereNorThere/Coral-v1.5-0.6B to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for NotHereNorThere/Coral-v1.5-0.6B to start chatting.
- Pi
How to use NotHereNorThere/Coral-v1.5-0.6B with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use NotHereNorThere/Coral-v1.5-0.6B with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
```
Run Hermes
hermes
- Atomic Chat
- Docker Model Runner
How to use NotHereNorThere/Coral-v1.5-0.6B with Docker Model Runner:
docker model run hf.co/NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
- Lemonade
How to use NotHereNorThere/Coral-v1.5-0.6B with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull NotHereNorThere/Coral-v1.5-0.6B:Q4_K_M
```
Run and chat with the model
lemonade run user.Coral-v1.5-0.6B-Q4_K_M
List all available models
lemonade list
Coral-v1.5-0.6B by NotHereNorThere
A 0.6B-parameter uncensored generalist with adaptive Chain-of-Thought reasoning: it decides on its own whether a question needs thinking. Built from a 5-donor TIES merge of Qwen3-0.6B fine-tunes, then healed with a 1,000-row fine-tune pass.
Part of the Coral-v1.5 model family, which follows the original CoralLM series (based on Llama 3.2 1B). Coral-v1.5 moves to the Qwen3 architecture for native <think> support and a stronger base model.
What makes it interesting
- Adaptive CoT at 0.6B: the model routes dynamically. Simple questions get instant answers, while complex reasoning tasks trigger <think> blocks. This emerged accidentally from the fine-tune data mix rather than being explicitly trained.
- Uncensored: refusal behavior has been removed via two abliterated donors. It just answers things.
- Correct arithmetic: passes basic math with clean step-by-step working.
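Because the model only sometimes emits a <think> block, client code usually needs to handle both cases. The following is a minimal sketch of one way to separate the reasoning from the final answer; the helper name `split_think` is hypothetical, and it assumes the block is delimited by literal `<think>` and `</think>` tags in the returned text.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no block was emitted."""
    reasoning = "\n".join(m.strip() for m in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer
```

An empty block is treated the same as no block, so callers can check `if reasoning:` to see whether the model chose to think.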
Merge Recipe
Method: TIES
Base: Qwen/Qwen3-0.6B
Tool: mergekit
| Donor | Role | Weight | Density |
|---|---|---|---|
| reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT | Thinking / reasoning | 0.30 | 0.5 |
| MihaiPopa-1/Qwen-3-0.6B-Claude-4.7-Opus-Distilled | Claude-style CoT | 0.30 | 0.5 |
| suayptalha/Qwen3-0.6B-Code-Expert | Code | 0.25 | 0.5 |
| DavidAU/Qwen3-0.6B-heretic-abliterated-uncensored | De-alignment | 0.15 | 0.5 |
| huihui-ai/Huihui-Qwen3-0.6B-abliterated-v2 | De-alignment | 0.15 | 0.5 |
```yaml
base_model: Qwen/Qwen3-0.6B
merge_method: ties
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
```
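The settings above omit the per-donor entries. As a rough sketch, the donor table and the global settings could be combined as follows. The `models` / `parameters` layout is an assumption based on the usual mergekit config shape, not the author's actual file, and `build_config` is a hypothetical helper.

```python
import json

# Donor table from the card: (model, weight, density).
DONORS = [
    ("reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT", 0.30, 0.5),
    ("MihaiPopa-1/Qwen-3-0.6B-Claude-4.7-Opus-Distilled", 0.30, 0.5),
    ("suayptalha/Qwen3-0.6B-Code-Expert", 0.25, 0.5),
    ("DavidAU/Qwen3-0.6B-heretic-abliterated-uncensored", 0.15, 0.5),
    ("huihui-ai/Huihui-Qwen3-0.6B-abliterated-v2", 0.15, 0.5),
]


def build_config() -> dict:
    """Assemble a mergekit-style TIES config (layout assumed, not the author's file)."""
    return {
        "models": [
            {"model": name, "parameters": {"weight": w, "density": d}}
            for name, w, d in DONORS
        ],
        "merge_method": "ties",
        "base_model": "Qwen/Qwen3-0.6B",
        "parameters": {"normalize": True, "int8_mask": True},
        "dtype": "bfloat16",
    }


config = build_config()
# The raw weights sum to more than 1; with normalize enabled they are
# expected to be rescaled during the merge.
total_weight = sum(m["parameters"]["weight"] for m in config["models"])
print(json.dumps(config, indent=2))
```

JSON is a subset of YAML, so the printed text should be loadable as a config file, though this has not been checked against the author's setup.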
Fine-tune
A post-merge heal pass to fix coherence, identity, counting, and context retention. It also reinforces when to use CoT and when to answer directly.
- 500 rows — OpenHermes 2.5 (simple QA + instruction following)
- 500 rows — OpenThoughts (reasoning with CoT)
- Method: QLoRA + Flash Attention 2
- Total: 1,000 rows, randomly sampled and shuffled
The 50/50 split between non-CoT and CoT data appears to be what produced the adaptive routing behavior.
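The sampling step described above can be sketched in a few lines. This is an illustration only: the pools below are hypothetical stand-in rows rather than the real OpenHermes 2.5 and OpenThoughts data, and `build_heal_mix` and the seed value are assumptions, not the author's script.

```python
import random


def build_heal_mix(plain_rows, cot_rows, per_source=500, seed=0):
    """Sample `per_source` rows from each pool, then shuffle them together."""
    rng = random.Random(seed)
    mixed = rng.sample(plain_rows, per_source) + rng.sample(cot_rows, per_source)
    rng.shuffle(mixed)
    return mixed


# Hypothetical stand-in rows; the real pools would be OpenHermes 2.5 and OpenThoughts.
plain_pool = [{"source": "plain", "id": i} for i in range(2000)]
cot_pool = [{"source": "cot", "id": i} for i in range(2000)]

heal_rows = build_heal_mix(plain_pool, cot_pool)
```

Shuffling after concatenation keeps the two kinds of example interleaved, so the model sees direct answers and CoT answers mixed throughout training.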
Evaluation
Tested post-heal on the following:
| Test | Result |
|---|---|
| Basic greeting | ✅ Clean, friendly, no loops |
| Identity | ✅ Identifies as AI assistant |
| Exact instruction following ("list 3 fruits") | ✅ Correct count and formatting |
| Context retention across turns | ✅ Recalled user name correctly |
| Math (47 × 83) | ✅ Correct (3,901) with clean working |
| Prime number function | ✅ Correct implementation and examples |
| One-sentence explanation | ✅ Stayed concise, no yapping |
| Adaptive CoT routing | ✅ Emergent, skips think for simple, uses think for complex |
| Uncensored | ✅ Refusals removed |
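For anyone repeating the math and prime-number tests, a small reference is handy for checking the model's answers. The sketch below is not the model's output; `is_prime` is a plain trial-division check written for comparison.

```python
def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


# Long multiplication for the math test: 47 * 83 = 47 * 80 + 47 * 3.
partial_tens = 47 * 80   # 3760
partial_ones = 47 * 3    # 141
product = partial_tens + partial_ones  # 3901
```

The product matches the 3,901 reported in the table.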
Quant Guide
| Quant | Quality |
|---|---|
| F16 | Star of the show, best |
| Q6 | Should match F16 |
| Q5 | Starts degrading |
| Q4 | What could you run this on that's that bad |
| Q3 | Don't |