JiRack Coder Reasoning 14B INT4

A fast and efficient coding assistant with a clean built-in web UI, built on the Qwen3.0-Coder-14B-Instruct base model and optimized with Microsoft ONNX Runtime.

Quick Start

Watch JiRack Coder 14B in action in the demo: JiRack Coder Reasoning 14B Web UI

Run with Docker

--Default CPU--

docker run -d \
  --name jirack_coder_reasoning_14b \
  -p 7869:7869 \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_14b_int4_qwenbase:latest

--Multi CPU--

docker run -d \
  --name jirack_coder_reasoning_14b \
  -p 7869:7869 \
  --restart unless-stopped \
  --memory=20g \
  --cpus=12 \
  cmsmanhattan/jirack_coder_14b_int4_qwenbase:latest

--GPU (coming soon)--

docker run -d \
  --name jirack_coder_reasoning_14b \
  -p 7869:7869 \
  --gpus all \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_14b_int4_gpu_qwenbase:latest
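
After starting a container with one of the commands above, it can help to confirm that it is actually up. The following is a minimal sketch, not part of the official JiRack tooling: the helper names `container_status` and `container_logs` are hypothetical, and only the container name is taken from the `docker run` commands in this README.

```shell
#!/usr/bin/env bash
# Container name as used in the docker run commands in this README.
CONTAINER_NAME="jirack_coder_reasoning_14b"

# Hypothetical helper: print "<name>: <status>" for the running container.
# Prints nothing if no running container matches the name.
container_status() {
  docker ps --filter "name=${CONTAINER_NAME}" --format '{{.Names}}: {{.Status}}'
}

# Hypothetical helper: print the last N log lines (default 50).
container_logs() {
  docker logs --tail "${1:-50}" "${CONTAINER_NAME}"
}
```

Call `container_status` to see whether the container is listed, and `container_logs 100` to inspect recent output if the UI does not come up.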

Access the UI

Once the container is running, open your browser and navigate to:

http://localhost:7869

This opens the JiRack Coder UI — a clean web interface designed for coding.
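
The UI may not answer immediately after the container starts, so a small polling loop can be useful in scripts. This is a sketch under assumptions: the helper name `wait_for_ui`, the 300-second default timeout, and the 2-second retry interval are illustrative choices, and it assumes `curl` is installed and that the UI answers a plain HTTP GET on the root path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers or the timeout expires.
# Usage: wait_for_ui [URL] [TIMEOUT_SECONDS]
wait_for_ui() {
  local url="${1:-http://localhost:7869}"
  local timeout="${2:-300}"
  local deadline=$(( SECONDS + timeout ))

  while (( SECONDS < deadline )); do
    # --fail makes curl return non-zero on HTTP error responses (4xx/5xx).
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep 2
  done
  return 1
}
```

For example, `wait_for_ui http://localhost:7869 120 && echo "UI is ready"` prints the message only once the page responds within two minutes.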

Changing the Port

You can change the listening port from the Settings panel in the JiRack Coder UI.

Licensing

The JiRack Coder 14B model is provided under a commercial license that costs about $12 per user per year.
All JiRack UI clients are provided under a commercial license.
However, the UI clients can be used for free when running together with the official JiRack Docker containers, as long as they are not redistributed separately.

JiRack Coder 32B is available exclusively under a commercial enterprise license.

For commercial licensing, cluster deployment, or enterprise use of the JiRack Coder 32B and JiRack Coder 14B, please contact us.

JiRack desktop chat client for MS Windows 11, with Ollama API setup: https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
Live email chat with the model: support@cmsmanhattan.com

Hardware Recommendations for AMD Systems

This model is heavier than JiRack Coder 7B INT8.

Recommended hardware for JiRack Coder Reasoning 14B INT4, which runs as a single Docker container:

Use Case	CPU	GPU (ROCm)	VRAM / RAM	Expected Speed	Recommendation
Recommended	Ryzen 7 7700 / 9700X	RX 7900 XTX / 7900 XT	24GB VRAM	50-75 tokens/s	Best choice
High Performance	Ryzen 9 7950X / 9950X	RX 7900 XTX	24GB+ VRAM	65-90 tokens/s	Excellent
Enterprise	EPYC 7003/9004 series	MI300X or 2x RX 7900 XTX	48GB+ VRAM	90-140 tokens/s	For 32B model
Budget Option	Ryzen 5 7600 / 9600X	RX 7800 XT (16GB)	16GB VRAM	35-50 tokens/s	Acceptable

Important Memory Notes

Even though the 14B INT4 model itself takes approximately 5–6 GB, we recommend at least 24GB VRAM for the following reasons:

KV-cache consumption during generation (especially with long context)
ONNX Runtime overhead and temporary buffers
System stability and headroom to avoid out-of-memory errors
Room for larger context windows

Minimum recommended: 24GB VRAM (RX 7900 series)
Ideal: 24–32GB VRAM
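
To get a feel for how the KV-cache grows with context length, a common estimate is 2 (keys and values) x layers x KV heads x head dimension x tokens x bytes per value. The sketch below applies that formula. The architecture numbers (48 layers, 8 KV heads, head dimension 128, FP16 cache) are illustrative assumptions, not confirmed values for this model, and the helper name `kv_cache_bytes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate KV-cache size in bytes.
# Usage: kv_cache_bytes LAYERS KV_HEADS HEAD_DIM CONTEXT_TOKENS BYTES_PER_VALUE
kv_cache_bytes() {
  local layers=$1 kv_heads=$2 head_dim=$3 tokens=$4 bytes=$5
  # Factor 2: one key tensor plus one value tensor per layer.
  echo $(( 2 * layers * kv_heads * head_dim * tokens * bytes ))
}

# Illustrative architecture values (assumptions, not taken from this README).
LAYERS=48
KV_HEADS=8
HEAD_DIM=128
BYTES_PER_VALUE=2   # FP16 cache entries

for tokens in 4096 32768; do
  size=$(kv_cache_bytes "$LAYERS" "$KV_HEADS" "$HEAD_DIM" "$tokens" "$BYTES_PER_VALUE")
  echo "context ${tokens} tokens: $(( size / 1024 / 1024 )) MiB"
done
```

Under these assumed values the estimate is 768 MiB at 4096 tokens and 6144 MiB at 32768 tokens. The cache scales linearly with context length, which is why long contexts need headroom well beyond the model weights.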

For pure CPU inference (no GPU), we recommend at least 64GB system RAM (Ryzen 9 7950X/9950X).

Quantization starts from the default model in full FP32 precision, which lets us find the best balance between model size and performance.

📧 Contact & Licensing

For joint venture opportunities, hardware integration, or licensing inquiries:

Email: grabko@cmsmanhattan.com
Phone: +1 (516) 777-0945
Location: New York, USA
