Coral-v1.5-4B
A 4B-parameter uncensored generalist with strong multi-step reasoning, correct arithmetic, solid code generation, and long-context coherence across extended conversations. Built from a 7-donor TIES merge of Qwen3-4B finetunes, including the official Qwen 2507 update variants, then healed with a 2,500-row fine-tune pass.
Part of the Coral-v1.5 model family, which follows the original CoralLM series (based on Llama 3.2 1B). Coral-v1.5 moves to the Qwen3 architecture for significantly improved base capability.
Note on identity: The model identifies itself as Qwen/Alibaba by default due to base model bleedthrough. A simple system prompt overrides this; no retraining is needed.
Improvements over Coral-v1.5-0.6B
| Capability | 0.6B | 4B |
|---|---|---|
| Parameters | ~600M | ~4B |
| Donors | 5 | 7 |
| Fine-tune rows | 1,000 | 2,500 |
| Inference speed | 161 t/s | 75 t/s (Q5_K_M) |
| Math accuracy | ✅ Correct | ✅ Correct |
| Multi-step reasoning | ⚠️ Basic | ✅ Strong |
| Long multi-turn coherence | ⚠️ Short working context | ✅ 13+ turns tested |
| Trick question resistance | ⚠️ Untested | ✅ Doesn't hallucinate fake memories |
| Adaptive CoT | ✅ Emergent | ❌ Smoothed out by larger FT |
| Code quality | ✅ Decent | ✅ Better |
| Uncensored | ✅ | ✅ |
The 4B trades the emergent adaptive CoT behavior of the 0.6B for significantly stronger raw reasoning capability and coherence at scale. The reasoning happens internally without explicit think blocks.
What makes it interesting
- 7-donor TIES merge - more donors, more diverse capability blend than the 0.6B
- Qwen3 original + 2507 cross-mixing - includes both original Qwen3-4B and post-training 2507 update finetunes as contributors
- Three reasoning distills - knowledge transferred from larger models (DeepSeek, Opus, Gemini) down to 4B scale
- Trick question resistant - correctly recognized that a question referred to a conversation event that never happened, rather than hallucinating a fake memory
- Uncensored - refusal behavior is removed via two de-alignment donors, and the removal survives the fine-tune pass
- Long context coherence - maintains conversation state across 13+ turn exchanges
Merge Recipe
Method: TIES
Base: Qwen/Qwen3-4B
Tool: mergekit
| Donor | Role | Weight | Density |
|---|---|---|---|
| leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy | Thinking / reasoning | 0.20 | 0.5 |
| khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled | Reasoning distill | 0.20 | 0.5 |
| ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini | Multi-teacher distill | 0.20 | 0.5 |
| Qwen/Qwen3-4B-Instruct-2507 | Official instruct (2507) | 0.18 | 0.5 |
| Qwen/Qwen3-4B-Thinking-2507 | Official thinking (2507) | 0.18 | 0.5 |
| huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated | De-alignment | 0.15 | 0.5 |
| DreamFast/qwen3-4b-heretic | De-alignment (heretic method) | 0.15 | 0.5 |
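To make the Weight and Density columns concrete, here is a toy sketch of the TIES steps (trim, elect sign, merge) on flat lists of floats. It is illustrative only: the helper name `ties_merge` and the example numbers are made up, and mergekit's real implementation operates on tensors and may differ in details such as tie-breaking.

```python
def ties_merge(base, donors, weights, density=0.5, normalize=True):
    """Toy TIES merge over flat parameter lists (hypothetical helper)."""
    n = len(base)
    keep_count = max(1, round(density * n))

    trimmed = []
    for donor in donors:
        # Task vector: what this donor changed relative to the base.
        delta = [donor[i] - base[i] for i in range(n)]
        # Trim: keep only the largest-magnitude fraction given by `density`.
        order = sorted(range(n), key=lambda i: abs(delta[i]), reverse=True)
        keep = set(order[:keep_count])
        trimmed.append([delta[i] if i in keep else 0.0 for i in range(n)])

    merged = []
    for i in range(n):
        weighted = [w * t[i] for w, t in zip(weights, trimmed)]
        # Elect a sign per parameter from the weighted sum of surviving deltas.
        sign = 1.0 if sum(weighted) >= 0 else -1.0
        # Only donors whose delta agrees with the elected sign contribute.
        agreeing = [(w, v) for w, v in zip(weights, weighted) if v * sign > 0]
        total = sum(v for _, v in agreeing)
        weight_sum = sum(w for w, _ in agreeing)
        # normalize=True divides by the weights of the donors that agreed.
        if normalize and weight_sum > 0:
            total /= weight_sum
        merged.append(base[i] + total)
    return merged


# Made-up example: 4 parameters, 2 donors, density 0.5 keeps 2 deltas each.
base = [0.0, 0.0, 0.0, 0.0]
donor_a = [1.0, 0.1, -2.0, 0.05]
donor_b = [3.0, -0.2, 1.0, 0.1]
result = ties_merge(base, [donor_a, donor_b], weights=[0.5, 0.5], density=0.5)
print(result)  # [2.0, 0.0, -2.0, 0.0]
```

In the third parameter the donors disagree on sign; the weighted sum is negative, so only `donor_a` contributes there. With a density of 0.5, as in the table above, half of each donor's changes are dropped before any sign election happens.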
base_model: Qwen/Qwen3-4B
merge_method: ties
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
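The fragment above omits the per-donor entries. As a rough sketch, the donor table and the global settings could be combined into one mergekit-style YAML file as follows. The `models:` layout is an assumption based on mergekit's usual TIES config format, not the author's actual file, and `build_config` is a hypothetical helper.

```python
# Donor name, weight, density: copied from the donor table above.
DONORS = [
    ("leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy", 0.20, 0.5),
    ("khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled", 0.20, 0.5),
    ("ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini", 0.20, 0.5),
    ("Qwen/Qwen3-4B-Instruct-2507", 0.18, 0.5),
    ("Qwen/Qwen3-4B-Thinking-2507", 0.18, 0.5),
    ("huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated", 0.15, 0.5),
    ("DreamFast/qwen3-4b-heretic", 0.15, 0.5),
]


def build_config(donors):
    """Render a TIES config as YAML text (layout is an assumption)."""
    lines = ["models:"]
    for name, weight, density in donors:
        lines += [
            f"  - model: {name}",
            "    parameters:",
            f"      weight: {weight}",
            f"      density: {density}",
        ]
    lines += [
        "merge_method: ties",
        "base_model: Qwen/Qwen3-4B",
        "parameters:",
        "  normalize: true",
        "  int8_mask: true",
        "dtype: bfloat16",
    ]
    return "\n".join(lines) + "\n"


config_text = build_config(DONORS)
weight_sum = sum(weight for _, weight, _ in DONORS)
print(config_text)
print(f"raw weight sum: {weight_sum:.2f}")
```

The raw weights sum to 1.26 rather than 1. With `normalize: true`, the merged deltas are rescaled by the weights that contributed, so the weights act as relative shares. If the layout assumption holds, the text could be saved to a file and passed to mergekit's `mergekit-yaml` command.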
Fine-tune
A post-merge heal pass to fix problems in the raw merge: incoherence, miscounting, poor context retention, and a tendency to invent questions.
- 1,250 rows — OpenHermes 2.5 (simple QA + instruction following)
- 1,250 rows — OpenThoughts (complex reasoning with CoT)
- Method: QLoRA + Flash Attention 2, LoRA r16
- Epochs: 2
- Total: 2,500 rows, randomly sampled and shuffled
- Quantization: Q5_K_M (auto-quantized post fine-tune)
Evaluation
| Test | Result |
|---|---|
| Basic greeting | ✅ Clean, no loops |
| Exact instruction following ("list 3 fruits") | ✅ Correct count and formatting |
| Context retention across turns | ✅ Recalled user name correctly |
| Math (47 × 83) | ✅ Correct (3,901) with clean step-by-step working |
| Multi-step word problem | ✅ Correct with full reasoning |
| Prime number function | ✅ Correct implementation |
| Constrained creative writing | ✅ All constraints met |
| Long multi-turn conversation (13 turns) | ✅ Coherent throughout |
| Trick question (fake memory) | ✅ Correctly refused to hallucinate |
| Joke repetition awareness | ✅ Noticed repeat, told a different one |
| Uncensored | ✅ Refusals removed, survives fine-tune |
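The two deterministic rows in this table (the multiplication and the prime number function) can be checked against reference code. The sketch below shows one way to do that; `check_math_answer` and `is_prime` are hypothetical helpers, not part of the author's evaluation setup.

```python
def check_math_answer(model_output: str) -> bool:
    """Return True if the reply contains the product of 47 and 83."""
    expected = str(47 * 83)  # 3901
    # Drop thousands separators so "3,901" and "3901" both match.
    return expected in model_output.replace(",", "")


def is_prime(n: int) -> bool:
    """Reference primality test by trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


print(check_math_answer("47 × 83 = 3,901"))
print([n for n in range(20) if is_prime(n)])
```

A model-written prime function can be compared with `is_prime` over a range of inputs instead of being judged by eye.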
Inference
> System: You are Coral, a helpful AI assistant. `<whatever else>`
Recommended system prompt to fix identity bleedthrough. The model responds well to persona anchoring and should adhere well to system prompts and instructions.
Speed (Q5_K_M): ~75 t/s generation on low-to-mid-range consumer hardware
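A minimal sketch of applying the recommended system prompt through llama-cpp-python. It makes two assumptions: the packages `llama-cpp-python` and `huggingface-hub` are installed, and the glob `*Q5_K_M.gguf` matches a file in the repository (the exact filename is not stated here). `build_messages` and `chat` are hypothetical helper names.

```python
REPO_ID = "NotHereNorThere/Coral-v1.5-4b"
SYSTEM_PROMPT = "You are Coral, a helpful AI assistant."


def build_messages(user_text, extra_instructions="", history=None):
    """Prepend the persona system prompt to a chat message list."""
    system = SYSTEM_PROMPT
    if extra_instructions:
        system = f"{system} {extra_instructions}"
    messages = [{"role": "system", "content": system}]
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_text})
    return messages


def chat(user_text, filename="*Q5_K_M.gguf"):
    """Download the GGUF (filename glob is an assumption) and answer once."""
    from llama_cpp import Llama  # imported lazily; needs llama-cpp-python

    llm = Llama.from_pretrained(repo_id=REPO_ID, filename=filename)
    response = llm.create_chat_completion(messages=build_messages(user_text))
    return response["choices"][0]["message"]["content"]


# Building the messages does not require the model to be loaded.
print(build_messages("Who are you?"))
```

Calling `chat("Who are you?")` downloads the weights on first use. Passing the same system message on every turn keeps the persona anchored across a long conversation.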
Available Quantizations
All quantized from the BF16 merge output. Quality and speed are relative to Q5_K_M (the baseline). Speed is approximate and hardware-dependent; quality is a general expectation for these quant types on a 4B model.
| Quant | Size vs Q5_K_M | Quality vs Q5_K_M | Speed vs Q5_K_M | Notes |
|---|---|---|---|---|
| F16 | Much larger | Lossless reference | ~−45% | Full precision, for reference/conversion |
| Q6_K | Larger | Near-identical | ~−15% | Highest practical quality |
| Q5_K_M | baseline | baseline | baseline | Recommended default |
| Q4_K_M | Smaller | Slightly lower | ~+15% | Classic balanced choice |
| IQ4_NL | Smaller | ≈ Q4_K_M, slightly better | ~+10% | Non-linear grid, good quality/size |
| IQ4_XS | Smaller | ≈ Q4_K_M | ~+15% | Smallest 4-bit, importance-matrix |
| Q3_K_M | Much smaller | Noticeably lower | ~+30% | Usable but degraded |
| IQ3_M | Much smaller | Lower, better than Q3_K | ~+25% | Best aggressive option |
| TQ2_0 | Tiny | No | ~+60% | Ternary weights (-1/0/1 only). Don't bother |
Recommendation: Q5_K_M for quality, IQ4_XS or IQ4_NL for a good speed/size/quality balance, IQ3_M if you're tight on memory. F16 is for conversion/reference only — no quality benefit over Q6_K at much larger size.
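For a rough sense of scale behind the "Size vs Q5_K_M" column, the sketch below multiplies the parameter count by each quant's nominal bit width. It assumes exactly 4e9 parameters and takes the bit width from the quant name. Real GGUF files are somewhat larger because K-quants and IQ-quants also store per-block scales and keep some tensors at higher precision, so treat the output as a lower bound, not as measured file sizes.

```python
# Nominal bits per weight, read off the quant names (an approximation).
NOMINAL_BITS = {
    "F16": 16,
    "Q6_K": 6,
    "Q5_K_M": 5,
    "Q4_K_M": 4,
    "IQ4_NL": 4,
    "IQ4_XS": 4,
    "Q3_K_M": 3,
    "IQ3_M": 3,
    "TQ2_0": 2,
}


def nominal_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Lower-bound size in GB (1e9 bytes), ignoring scales and metadata."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 4e9  # assumed round figure for a "4B" model
baseline = nominal_size_gb(N_PARAMS, NOMINAL_BITS["Q5_K_M"])
for quant, bits in NOMINAL_BITS.items():
    size = nominal_size_gb(N_PARAMS, bits)
    print(f"{quant:8s} >= {size:4.1f} GB  ({size / baseline:.2f}x Q5_K_M)")
```

Under these assumptions F16 comes out at a minimum of 8 GB and Q5_K_M at 2.5 GB, which is consistent with the "Much larger" entry in the table.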
Model Family (so far)
| Model | Base | Donors | FT Rows | Status |
|---|---|---|---|---|
| CoralLM-1B | Llama3.2-1B | 3 | 400 | ✅ Released |
| Coral-v1.5-0.6B | Qwen3-0.6B | 5 | 1,000 | ✅ Released |
| Coral-v1.5-4B | Qwen3-4B | 7 | 2,500 | ✅ Released |