Coral-v1.5-4B

A 4B parameter uncensored generalist with strong multi-step reasoning, correct arithmetic, solid code generation, and long-context coherence across extended conversations. Built from a 7-donor TIES merge of Qwen3-4B finetunes including official Qwen 2507 update variants, healed with a 2,500 row fine-tune pass.

Part of the Coral-v1.5 model family, which adds to the original CoralLM series (Llama 3.2 1B based). Coral-v1.5 moves to Qwen3 architecture for significantly improved base capability.

Note on identity: The model identifies itself as Qwen/Alibaba by default due to base model bleedthrough. A simple system prompt overrides this, no retraining needed.


Improvements over Coral-v1.5-0.6B

Capability 0.6B 4B
Parameters ~600M ~4B
Donors 5 7
Fine-tune rows 1,000 2,500
Inference speed 161 t/s 75 t/s (Q5_K_M)
Math accuracy ✅ Correct ✅ Correct
Multi-step reasoning ⚠️ Basic ✅ Strong
Long multi-turn coherence ⚠️ Short working context ✅ 13+ turns tested
Trick question resistance ⚠️ Untested ✅ Doesn't hallucinate fake memories
Adaptive CoT ✅ Emergent ❌ Smoothed out by larger FT
Code quality ✅ Decent ✅ Better
Uncensored

The 4B trades the emergent adaptive CoT behavior of the 0.6B for significantly stronger raw reasoning capability and coherence at scale. The reasoning happens internally without explicit think blocks.


What makes it interesting

  • 7-donor TIES merge - more donors, more diverse capability blend than the 0.6B
  • Qwen3 original + 2507 cross-mixing - includes both original Qwen3-4B and post-training 2507 update finetunes as contributors
  • Three reasoning distills - knowledge transferred from larger models (DeepSeek, Opus, Gemini) down to 4B scale
  • Trick question resistant - correctly identified a question about a conversation event that never happened rather than hallucinating a fake memory
  • Uncensored - refusal behavior removed via two de-alignment donors, survives the fine-tune pass
  • Long context coherence - maintains conversation state across 13+ turn exchanges

Merge Recipe

Method: TIES
Base: Qwen/Qwen3-4B
Tool: mergekit

Donor Role Weight Density
leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy Thinking / reasoning 0.20 0.5
khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled Reasoning distill 0.20 0.5
ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini Multi-teacher distill 0.20 0.5
Qwen/Qwen3-4B-Instruct-2507 Official instruct (2507) 0.18 0.5
Qwen/Qwen3-4B-Thinking-2507 Official thinking (2507) 0.18 0.5
huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated De-alignment 0.15 0.5
DreamFast/qwen3-4b-heretic De-alignment (heretic method) 0.15 0.5
base_model: Qwen/Qwen3-4B
merge_method: ties
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true

Fine-tune

Post-merge heal pass to fix coherence, counting, context retention, and question invention behavior from the raw merge.

  • 1,250 rowsOpenHermes 2.5 (simple QA + instruction following)
  • 1,250 rowsOpenThoughts (complex reasoning with CoT)
  • Method: QLoRA + Flash Attention 2, LoRA r16
  • Epochs: 2
  • Total: 2,500 rows, randomly sampled and shuffled
  • Quantization: Q5_K_M (auto-quantized post fine-tune)

Evaluation

Test Result
Basic greeting ✅ Clean, no loops
Exact instruction following ("list 3 fruits") ✅ Correct count and formatting
Context retention across turns ✅ Recalled user name correctly
Math (47 × 83) ✅ Correct (3,901) with clean step-by-step working
Multi-step word problem ✅ Correct with full reasoning
Prime number function ✅ Correct implementation
Constrained creative writing ✅ All constraints met
Long multi-turn conversation (13 turns) ✅ Coherent throughout
Trick question (fake memory) ✅ Correctly refused to hallucinate
Joke repetition awareness ✅ Noticed repeat, told a different one
Uncensored ✅ Refusals removed, survives fine-tune

Inference

> System: You are Coral, a helpful AI assistant. `<whatever else>`

Recommended system prompt to fix identity bleedthrough. The model responds well to persona anchoring, should do well with system prompt and instruciton adherence.

Speed (Q5_K_M): ~75 t/s generation on mid-low consumer hardware

Available Quantizations

All quantized from the BF16 merge output. Quality and speed are relative to Q5_K_M (the baseline). Speed is approximate and hardware-dependent; quality is a general expectation for these quant types on a 4B model.

Quant Size vs Q5_K_M Quality vs Q5_K_M Speed vs Q5_K_M Notes
F16 Much larger Lossless reference ~−45% Full precision, for reference/conversion
Q6_K Larger Near-identical ~−15% Highest practical quality
Q5_K_M baseline baseline baseline Recommended default
Q4_K_M Smaller Slightly lower ~+15% Classic balanced choice
IQ4_NL Smaller ≈ Q4_K_M, slightly better ~+10% Non-linear grid, good quality/size
IQ4_XS Smaller ≈ Q4_K_M ~+15% Smallest 4-bit, importance-matrix
Q3_K_M Much smaller Noticeably lower ~+30% Usable but degraded
IQ3_M Much smaller Lower, better than Q3_K ~+25% Best aggressive option
TQ2_0 Tiny No ~+60% Ternary weights (-1/0/1 only). Don't bother

Recommendation: Q5_K_M for quality, IQ4_XS or IQ4_NL for a good speed/size/quality balance, IQ3_M if you're tight on memory. F16 is for conversion/reference only — no quality benefit over Q6_K at much larger size.


Model Family (so far)

Model Base Donors FT Rows Status
CoralLM-1B Llama3.2-1B 3 400 ✅ Released
Coral-v1.5-0.6B Qwen3-0.6B 5 1,000 ✅ Released
Coral-v1.5-4B Qwen3-4B 7 2,500 ✅ Released
Downloads last month
1,790
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NotHereNorThere/Coral-v1.5-4b

Finetuned
Qwen/Qwen3-4B
Quantized
(227)
this model

Datasets used to train NotHereNorThere/Coral-v1.5-4b

Collection including NotHereNorThere/Coral-v1.5-4b