How to use NotHereNorThere/Coral-v1.5-4b with libraries and local apps.

Libraries

How to use NotHereNorThere/Coral-v1.5-4b with llama-cpp-python:

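```python
# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="NotHereNorThere/Coral-v1.5-4b",
    filename="F16.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```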

llama.cpp

How to use NotHereNorThere/Coral-v1.5-4b with llama.cpp:

Install from brew

```sh
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Install from WinGet (Windows)

```sh
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Use pre-built binary

```sh
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Build from source code

```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Use Docker

```sh
docker model run hf.co/NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

vLLM

How to use NotHereNorThere/Coral-v1.5-4b with vLLM:

Install from pip and serve model

```sh
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NotHereNorThere/Coral-v1.5-4b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NotHereNorThere/Coral-v1.5-4b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
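The same OpenAI-compatible request can be sent from Python. This is a minimal sketch using only the standard library (`urllib.request`, `json`); the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server is reachable at the base URL you pass (vLLM listens on port 8000 here, while the llama-server setups further down use port 8080). Unlike the curl call, it also sends the recommended Coral system prompt from the Inference section.

```python
import json
import urllib.request

# Recommended system prompt from this card (counters Qwen identity bleedthrough)
SYSTEM_PROMPT = "You are Coral, a helpful AI assistant."


def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST to <base_url>/chat/completions on an OpenAI-compatible server."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_text: str) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(base_url, model, user_text)) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage, with the vLLM server above running:
# print(chat("http://localhost:8000/v1", "NotHereNorThere/Coral-v1.5-4b",
#            "What is the capital of France?"))
```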

Ollama
How to use NotHereNorThere/Coral-v1.5-4b with Ollama:
```
ollama run hf.co/NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Unsloth Studio

How to use NotHereNorThere/Coral-v1.5-4b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```sh
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NotHereNorThere/Coral-v1.5-4b to start chatting
```

Install Unsloth Studio (Windows)

```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NotHereNorThere/Coral-v1.5-4b to start chatting
```

Using Hugging Face Spaces for Unsloth

No setup is required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for NotHereNorThere/Coral-v1.5-4b to start chatting.

Pi

How to use NotHereNorThere/Coral-v1.5-4b with Pi:

Start the llama.cpp server

```sh
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```
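Before configuring an agent, it helps to wait until llama-server has finished downloading and loading the model. llama-server exposes a `/health` endpoint that answers HTTP 200 once the model is loaded (503 while it is still loading). The polling helper below is a minimal standard-library sketch; the function name and the timeout values are arbitrary choices, not part of llama.cpp.

```python
import time
import urllib.request


def wait_for_llama_server(base_url: str = "http://localhost:8080",
                          timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll llama-server's /health endpoint; return True once it answers 200."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, socket timeout, or HTTPError (e.g. 503 while
            # loading); urllib's URLError and HTTPError subclass OSError
            pass
        time.sleep(interval)
    return False


# Usage, after starting llama-server in another terminal:
# if not wait_for_llama_server():
#     raise SystemExit("llama-server did not become ready in time")
```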

Configure the model in Pi

```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the following to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "NotHereNorThere/Coral-v1.5-4b:Q4_K_M"
        }
      ]
    }
  }
}
```
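If ~/.pi/agent/models.json already defines other providers, pasting the whole JSON over it would drop them. A minimal sketch using `json` and `pathlib` that adds or updates only the `llama-cpp` entry; the helper names are hypothetical, and it assumes the file follows exactly the layout shown above.

```python
import json
from pathlib import Path

# Provider entry from this card, pointing at the local llama-server
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "NotHereNorThere/Coral-v1.5-4b:Q4_K_M"}],
}


def merge_provider(config: dict) -> dict:
    """Return a copy of config with providers["llama-cpp"] set, keeping other providers."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers["llama-cpp"] = LLAMA_CPP_PROVIDER
    merged["providers"] = providers
    return merged


def update_models_file(path: Path = Path.home() / ".pi" / "agent" / "models.json") -> None:
    """Read models.json if it exists, merge in the provider, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(config), indent=2) + "\n")


# Usage: update_models_file()
```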

Run Pi

```sh
# Start Pi in your project directory:
pi
```

Hermes Agent

How to use NotHereNorThere/Coral-v1.5-4b with Hermes Agent:

Start the llama.cpp server

```sh
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Configure Hermes

```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Run Hermes

```sh
hermes
```

Docker Model Runner
How to use NotHereNorThere/Coral-v1.5-4b with Docker Model Runner:
```
docker model run hf.co/NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Lemonade

How to use NotHereNorThere/Coral-v1.5-4b with Lemonade:

Pull the model

```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull NotHereNorThere/Coral-v1.5-4b:Q4_K_M
```

Run and chat with the model

```sh
lemonade run user.Coral-v1.5-4b-Q4_K_M
```

List all available models

```sh
lemonade list
```

Coral-v1.5-4B

A 4B-parameter uncensored generalist with strong multi-step reasoning, correct arithmetic, solid code generation, and long-context coherence across extended conversations. It is built from a 7-donor TIES merge of Qwen3-4B fine-tunes, including the official Qwen 2507 update variants, and healed with a 2,500-row fine-tuning pass.

Part of the Coral-v1.5 model family, which extends the original CoralLM series (based on Llama 3.2 1B). Coral-v1.5 moves to the Qwen3 architecture for significantly stronger base capability.

Note on identity: by default the model identifies itself as Qwen/Alibaba because of bleedthrough from the base model. A simple system prompt overrides this; no retraining is needed.

Improvements over Coral-v1.5-0.6B

| Capability | 0.6B | 4B |
|---|---|---|
| Parameters | ~600M | ~4B |
| Donors | 5 | 7 |
| Fine-tune rows | 1,000 | 2,500 |
| Inference speed | 161 t/s | 75 t/s (Q5_K_M) |
| Math accuracy | ✅ Correct | ✅ Correct |
| Multi-step reasoning | ⚠️ Basic | ✅ Strong |
| Long multi-turn coherence | ⚠️ Short working context | ✅ 13+ turns tested |
| Trick question resistance | ⚠️ Untested | ✅ Doesn't hallucinate fake memories |
| Adaptive CoT | ✅ Emergent | ❌ Smoothed out by larger FT |
| Code quality | ✅ Decent | ✅ Better |
| Uncensored | ✅ | ✅ |

The 4B trades the emergent adaptive CoT behavior of the 0.6B for significantly stronger raw reasoning capability and coherence at scale. The reasoning happens internally without explicit think blocks.

What makes it interesting

- 7-donor TIES merge: more donors and a more diverse capability blend than the 0.6B
- Qwen3 original + 2507 cross-mixing: includes both original Qwen3-4B and post-training 2507 update finetunes as contributors
- Three reasoning donors: two distills that transfer knowledge from larger models (Qwen3.6-plus; DeepSeek, Opus, Gemini) down to 4B scale, plus a 2507 thinking/reasoning finetune
- Trick question resistant: correctly recognized a question about a conversation event that never happened instead of hallucinating a fake memory
- Uncensored: refusal behavior removed via two de-alignment donors, and this survives the fine-tune pass
- Long-context coherence: maintains conversation state across 13+ turn exchanges

Merge Recipe

Method: TIES
Base: Qwen/Qwen3-4B
Tool: mergekit
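TIES merges task vectors (each donor's weights minus the base model's) in three steps: trim each task vector to its largest-magnitude entries (`density` is the fraction kept), elect a sign per parameter from the weighted sum, and combine only the donor values that agree with that sign. The toy sketch below runs those steps on plain Python lists to show how `weight`, `density`, and `normalize` interact. It is a simplified illustration of the TIES procedure, not mergekit's actual code, which works on tensors; `int8_mask` only changes how mergekit stores the sign mask in memory, so it has no counterpart here.

```python
def trim(delta: list[float], density: float) -> list[float]:
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    k = int(density * len(delta))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, donors, weights, densities, normalize=True):
    """Toy TIES merge of flat parameter lists that share one base model."""
    # 1. Task vectors: donor minus base, trimmed, then scaled by the donor weight
    tvs = [
        [w * d for d in trim([p - b for p, b in zip(donor, base)], dens)]
        for donor, w, dens in zip(donors, weights, densities)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [tv[i] for tv in tvs]
        # 2. Elect a sign per parameter from the weighted sum (a zero sum counts as +)
        elected = 1.0 if sum(column) >= 0 else -1.0
        # 3. Disjoint merge: keep nonzero values whose sign matches the elected sign
        agree = [j for j, v in enumerate(column) if v != 0 and (v > 0) == (elected > 0)]
        delta = sum(column[j] for j in agree)
        if normalize:
            # Divide by the weights of the donors that actually contributed here
            divisor = sum(weights[j] for j in agree)
            delta = delta / divisor if divisor else 0.0
        merged.append(b + delta)
    return merged


# Two toy "donors" over four parameters, equal weights, density 0.5
print(ties_merge([0.0] * 4, [[1.0, -2.0, 0.5, 0.0], [0.8, 1.0, -0.1, 0.3]],
                 weights=[0.5, 0.5], densities=[0.5, 0.5]))  # → [0.9, -2.0, 0.0, 0.0]
```

In the toy run, the second parameter keeps only the first donor's −2.0: the weighted sum is negative, so the conflicting +1.0 is dropped, and normalization divides the surviving weight back out. The last two parameters fall outside both donors' top 50% and stay at the base value.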

| Donor | Role | Weight | Density |
|---|---|---|---|
| `leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy` | Thinking / reasoning | 0.20 | 0.5 |
| `khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled` | Reasoning distill | 0.20 | 0.5 |
| `ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini` | Multi-teacher distill | 0.20 | 0.5 |
| `Qwen/Qwen3-4B-Instruct-2507` | Official instruct (2507) | 0.18 | 0.5 |
| `Qwen/Qwen3-4B-Thinking-2507` | Official thinking (2507) | 0.18 | 0.5 |
| `huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated` | De-alignment | 0.15 | 0.5 |
| `DreamFast/qwen3-4b-heretic` | De-alignment (heretic method) | 0.15 | 0.5 |
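The donor weights sum to 1.26 rather than 1.0. With `normalize: true` in the config, mergekit's TIES divides each merged parameter by the summed weights of the donors that contribute to it, so the listed weights act as relative importances. The quick calculation below shows the nominal share each donor would get if all seven contributed to a given parameter; in practice trimming and sign election mean fewer donors contribute to most parameters, so per-parameter shares vary.

```python
# Weights transcribed from the donor table above
DONOR_WEIGHTS = {
    "leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy": 0.20,
    "khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled": 0.20,
    "ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini": 0.20,
    "Qwen/Qwen3-4B-Instruct-2507": 0.18,
    "Qwen/Qwen3-4B-Thinking-2507": 0.18,
    "huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated": 0.15,
    "DreamFast/qwen3-4b-heretic": 0.15,
}

total = sum(DONOR_WEIGHTS.values())  # 3 * 0.20 + 2 * 0.18 + 2 * 0.15 = 1.26
nominal_shares = {name: w / total for name, w in DONOR_WEIGHTS.items()}

for name, share in nominal_shares.items():
    print(f"{share:6.1%}  {name}")
```

Each 0.20 donor gets about 15.9%, each 0.18 donor exactly 1/7 (about 14.3%), and each 0.15 de-alignment donor about 11.9%, so the two de-alignment donors together account for roughly 23.8%.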

```yaml
# Global settings only; the per-donor weights and densities are in the table above
base_model: Qwen/Qwen3-4B
merge_method: ties
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
```
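The YAML above shows only the global settings. mergekit's TIES configs list each donor under a top-level `models:` key with per-model `weight` and `density` parameters, so a complete config can be reconstructed from the donor table. The generator below is a sketch under that assumption; the author's actual file may differ in ordering or extra fields.

```python
# Donor list transcribed from the table above: (repo, weight, density)
DONORS = [
    ("leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy", 0.20, 0.5),
    ("khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled", 0.20, 0.5),
    ("ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini", 0.20, 0.5),
    ("Qwen/Qwen3-4B-Instruct-2507", 0.18, 0.5),
    ("Qwen/Qwen3-4B-Thinking-2507", 0.18, 0.5),
    ("huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated", 0.15, 0.5),
    ("DreamFast/qwen3-4b-heretic", 0.15, 0.5),
]


def build_mergekit_config(donors) -> str:
    """Render a mergekit TIES config as YAML text (no PyYAML needed)."""
    lines = ["models:"]
    for repo, weight, density in donors:
        lines += [
            f"  - model: {repo}",
            "    parameters:",
            f"      weight: {weight}",
            f"      density: {density}",
        ]
    # Global settings exactly as shown in the card
    lines += [
        "base_model: Qwen/Qwen3-4B",
        "merge_method: ties",
        "dtype: bfloat16",
        "parameters:",
        "  normalize: true",
        "  int8_mask: true",
    ]
    return "\n".join(lines) + "\n"


print(build_mergekit_config(DONORS))
```

The output can be saved under any name (for example, a hypothetical `coral-v1.5-4b.yml`) and passed to mergekit's CLI, e.g. `mergekit-yaml coral-v1.5-4b.yml ./merged`.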

Fine-tune

A post-merge healing pass to fix problems in the raw merge: coherence, counting, context retention, and a tendency to invent questions.

1,250 rows — OpenHermes 2.5 (simple QA + instruction following)
1,250 rows — OpenThoughts (complex reasoning with CoT)
Method: QLoRA + Flash Attention 2, LoRA r16
Epochs: 2
Total: 2,500 rows, randomly sampled and shuffled
Quantization: Q5_K_M (auto-quantized post fine-tune)
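The heal-pass data mix (1,250 rows from each source, randomly sampled and shuffled into 2,500 rows) can be sketched with the standard `random` module. Dataset loading is left out: the placeholder lists stand in for OpenHermes 2.5 and OpenThoughts rows, and the fixed seed is an assumption for reproducibility, not something the card specifies.

```python
import random


def build_heal_mix(openhermes_rows, openthoughts_rows, per_source=1250, seed=0):
    """Sample `per_source` rows from each dataset without replacement, then shuffle."""
    rng = random.Random(seed)
    mix = rng.sample(openhermes_rows, per_source) + rng.sample(openthoughts_rows, per_source)
    rng.shuffle(mix)  # interleave simple QA rows with CoT reasoning rows
    return mix


# Placeholder rows standing in for the real datasets (hypothetical structure)
openhermes = [{"source": "openhermes", "idx": i} for i in range(10_000)]
openthoughts = [{"source": "openthoughts", "idx": i} for i in range(10_000)]

mix = build_heal_mix(openhermes, openthoughts)
print(len(mix), sum(row["source"] == "openhermes" for row in mix))  # 2500 1250
```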

Evaluation

| Test | Result |
|---|---|
| Basic greeting | ✅ Clean, no loops |
| Exact instruction following ("list 3 fruits") | ✅ Correct count and formatting |
| Context retention across turns | ✅ Recalled user name correctly |
| Math (47 × 83) | ✅ Correct (3,901) with clean step-by-step working |
| Multi-step word problem | ✅ Correct with full reasoning |
| Prime number function | ✅ Correct implementation |
| Constrained creative writing | ✅ All constraints met |
| Long multi-turn conversation (13 turns) | ✅ Coherent throughout |
| Trick question (fake memory) | ✅ Correctly refused to hallucinate |
| Joke repetition awareness | ✅ Noticed repeat, told a different one |
| Uncensored | ✅ Refusals removed, survives fine-tune |
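The card does not say how these results were judged. For the two tests with a single checkable answer, an automatic check could look like the hypothetical helpers below: one compares the last number in a reply with 47 × 83 (47 × 80 + 47 × 3 = 3,760 + 141 = 3,901), the other counts list items for the "list 3 fruits" prompt. Both are heuristics on free-form text, not the author's evaluation harness.

```python
import re

EXPECTED_PRODUCT = 47 * 83  # 3,901


def final_number(reply: str):
    """Return the last integer in the reply (commas allowed as thousands separators)."""
    matches = re.findall(r"\d[\d,]*", reply)
    return int(matches[-1].replace(",", "")) if matches else None


def count_list_items(reply: str) -> int:
    """Count lines that start like a bullet ("- ", "* ", "• ") or a numbered item ("1. ", "2) ")."""
    pattern = re.compile(r"\s*(?:[-*•]|\d+[.)])\s+\S")
    return sum(1 for line in reply.splitlines() if pattern.match(line))


print(final_number("47 × 83 = 3,760 + 141 = 3,901") == EXPECTED_PRODUCT)  # True
print(count_list_items("Here are 3 fruits:\n1. Apple\n2. Banana\n3. Cherry"))  # 3
```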

Inference

> System: You are Coral, a helpful AI assistant. `<whatever else>`

This is the recommended system prompt for fixing identity bleedthrough. The model responds well to persona anchoring and should adhere well to system prompts and instructions.
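With llama-cpp-python, the recommended system prompt goes in as the first message of every request. The sketch below also keeps a running history so multi-turn chats stay in context. The helper names are hypothetical, and the `*Q5_K_M.gguf` glob in the usage comment assumes the repo's Q5_K_M file name ends that way. `n_ctx=8192` is an arbitrary choice, raised because llama-cpp-python's default context (512 tokens) is too small for long conversations.

```python
SYSTEM_PROMPT = "You are Coral, a helpful AI assistant."


def build_messages(history: list, user_text: str, extra: str = "") -> list:
    """Prepend the Coral system prompt (plus optional extra instructions) to the history."""
    system = f"{SYSTEM_PROMPT} {extra}".strip()
    return [{"role": "system", "content": system}, *history,
            {"role": "user", "content": user_text}]


def ask_coral(llm, history: list, user_text: str, extra: str = "") -> str:
    """Run one chat turn on a llama_cpp.Llama instance and record it in history."""
    response = llm.create_chat_completion(messages=build_messages(history, user_text, extra))
    reply = response["choices"][0]["message"]["content"]
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply})
    return reply


# Usage (needs `pip install llama-cpp-python huggingface-hub`):
# from llama_cpp import Llama
# llm = Llama.from_pretrained(repo_id="NotHereNorThere/Coral-v1.5-4b",
#                             filename="*Q5_K_M.gguf", n_ctx=8192)
# history = []
# print(ask_coral(llm, history, "Who are you?"))
# print(ask_coral(llm, history, "What did I just ask you?"))
```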

Speed (Q5_K_M): ~75 t/s generation on mid-low consumer hardware

Available Quantizations

All quantized from the BF16 merge output. Quality and speed are relative to Q5_K_M (the baseline). Speed is approximate and hardware-dependent; quality is a general expectation for these quant types on a 4B model.

| Quant | Size vs Q5_K_M | Quality vs Q5_K_M | Speed vs Q5_K_M | Notes |
|---|---|---|---|---|
| F16 | Much larger | Lossless reference | ~−45% | Full precision, for reference/conversion |
| Q6_K | Larger | Near-identical | ~−15% | Highest practical quality |
| Q5_K_M | baseline | baseline | baseline | Recommended default |
| Q4_K_M | Smaller | Slightly lower | ~+15% | Classic balanced choice |
| IQ4_NL | Smaller | ≈ Q4_K_M, slightly better | ~+10% | Non-linear grid, good quality/size |
| IQ4_XS | Smaller | ≈ Q4_K_M | ~+15% | Smallest 4-bit, importance-matrix |
| Q3_K_M | Much smaller | Noticeably lower | ~+30% | Usable but degraded |
| IQ3_M | Much smaller | Lower, better than Q3_K | ~+25% | Best aggressive option |
| TQ2_0 | Tiny | Unusable | ~+60% | Ternary weights (-1/0/1 only); don't bother |

Recommendation: Q5_K_M for quality, IQ4_XS or IQ4_NL for a good speed/size/quality balance, and IQ3_M if you're tight on memory. F16 is for conversion/reference only; it offers no practical quality benefit over Q6_K at a much larger size.
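For a rough memory estimate, a GGUF file's size is about parameter count × bits per weight ÷ 8. The sketch assumes 4.0B parameters (the card says ~4B) and uses the nominal bits per weight of a few single-type llama.cpp formats. K-quant `_M` variants mix types per tensor, and real files carry metadata and sometimes keep some tensors at higher precision, so treat these as ballpark figures for the weights alone, excluding the KV cache.

```python
PARAMS = 4.0e9  # assumption: "~4B params" from the card

# Nominal bits per weight for single-type llama.cpp formats
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q6_K": 6.5625,
    "IQ4_NL": 4.5,
    "IQ4_XS": 4.25,
    "TQ2_0": 2.0625,
}


def approx_size_gb(quant: str, params: float = PARAMS) -> float:
    """Approximate weight storage in GB (1e9 bytes): params * bits / 8."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"{quant:7s} ~{approx_size_gb(quant):.2f} GB")
```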

Model Family (so far)

| Model | Base | Donors | FT Rows | Status |
|---|---|---|---|---|
| CoralLM-1B | Llama3.2-1B | 3 | 400 | ✅ Released |
| Coral-v1.5-0.6B | Qwen3-0.6B | 5 | 1,000 | ✅ Released |
| Coral-v1.5-4B | Qwen3-4B | 7 | 2,500 | ✅ Released |

Model details

Format: GGUF
Model size: 4B params
Architecture: qwen3
Quantized files: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 16-bit
Base model: Qwen/Qwen3-4B (itself fine-tuned from Qwen/Qwen3-4B-Base)
Collection: Coral (merged models to combine strengths without retraining or heavy fine-tuning; 3 items)