JiRack Coder Reasoning 32B INT4

A fast, efficient coding assistant with a clean built-in web UI, built on the Qwen3.0-Coder-32B-Instruct base model and optimized with Microsoft ONNX Runtime.

Quick Start

Watch JiRack Coder 32B in action in the demo: JiRack Coder Reasoning 32B Web UI

Run with Docker


--Default CPU--

  docker run -d \
    --name jirack_coder_reasoning_32b \
    -p 7869:7869 \
    --restart unless-stopped \
    cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest

--Multi CPU--

  docker run -d \
    --name jirack_coder_reasoning_32b \
    -p 7869:7869 \
    --restart unless-stopped \
    --memory=48g \
    --cpus=16 \
    cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest

--GPU (coming soon)--

  docker run -d \
    --name jirack_coder_reasoning_32b \
    -p 7869:7869 \
    --gpus all \
    --restart unless-stopped \
    cmsmanhattan/jirack_coder_32b_int4_gpu_qwenbase:latest

--Docker Compose--

services:
  jirack_coder:  # service key is illustrative; any name works
    image: cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest
    container_name: jirack_onnx_service
    ports:
      - "7869:7869"
    volumes:
      - .:/app
      - ./web:/app/web
    environment:
      - MAX_TOKENS=1024
      - TEMPERATURE=0.7
      - TOP_P=0.9
      - DEFAULT_STREAM=False
      - INTRA_THREADS=4
      - USE_ENV_ALLOCATOR=1
    deploy:
      resources:
        limits:
          memory: 48g

Access the UI

Once the container is running, open your browser and navigate to:

http://localhost:7869

This opens the JiRack Coder UI, a clean web interface designed for coding.
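
A large model may need some time to load, so the page might not respond right after the container starts. The following is a minimal sketch of a readiness check, assuming `curl` is installed and that the UI answers plain HTTP at the root path. The helper name `wait_for_ui`, the retry count, and the delay are hypothetical choices, not part of the JiRack image.

```shell
# wait_for_ui [URL] [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls the UI address until it answers or attempts run out.
wait_for_ui() (
  url="${1:-http://localhost:7869}"   # address given in this section
  attempts="${2:-30}"                 # assumed retry budget
  delay="${3:-10}"                    # assumed pause between retries, in seconds
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o: discard the body
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "UI is reachable at $url"
      exit 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "UI did not respond at $url after $attempts attempts" >&2
  exit 1
)
```

For example, `wait_for_ui http://localhost:7869 30 10` retries for roughly five minutes before giving up. The function body runs in a subshell, so its variables and `exit` calls do not affect the calling shell.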

Changing the Port

The listening port can be changed from the Settings panel in the JiRack Coder UI.

Licensing

  • The JiRack Coder 32B model is provided under a commercial enterprise license.
  • All JiRack UI clients are provided under a commercial license.
  • However, the UI clients can be used for free when running together with the official JiRack Docker containers, as long as they are not redistributed separately.

JiRack Coder 14B is available under a lighter commercial license (~$12 per user/year).

For commercial licensing, cluster deployment, or enterprise use of the JiRack Coder 32B and JiRack Coder 14B, please contact us.

Hardware Recommendations for AMD Systems

JiRack Coder Reasoning 32B INT4 is significantly heavier than JiRack Coder 14B INT4.

Recommended hardware for JiRack Coder Reasoning 32B INT4, which runs as a single Docker container:

| Use Case | CPU | GPU (ROCm) | VRAM / RAM | Expected Speed | Recommendation |
|---|---|---|---|---|---|
| Recommended | Ryzen 9 7950X / 9950X | RX 7900 XTX / 2x RX 7900 XT | 48GB+ VRAM | 35-55 tokens/s | Best choice |
| High Performance | Ryzen 9 9950X / Threadripper | 2x RX 7900 XTX | 48-64GB VRAM | 50-75 tokens/s | Excellent |
| Enterprise | EPYC 7003/9004 series | MI300X or 4x RX 7900 XTX | 96GB+ VRAM | 70-110 tokens/s | Best for production |
| Budget Option | Ryzen 7 7700 / 9700X | RX 7900 XTX (24GB) | 24GB+ VRAM | 25-40 tokens/s | Acceptable |
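
To put the Expected Speed column in perspective, the sketch below converts a tokens-per-second figure into the time needed to generate a full response of `MAX_TOKENS=1024` (the value from the Compose example). It assumes the listed speeds are steady generation throughput and ignores prompt processing time; `seconds_for_tokens` is a hypothetical helper name.

```shell
# seconds_for_tokens TOKENS TOKENS_PER_SECOND
# Prints the generation time in seconds, rounded to one decimal place.
seconds_for_tokens() {
  LC_ALL=C awk -v tokens="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", tokens / rate }'
}

seconds_for_tokens 1024 35   # lower bound of the "Recommended" row: 29.3
seconds_for_tokens 1024 55   # upper bound of the "Recommended" row: 18.6
```

Under these assumptions, a maximum-length reply on the "Recommended" tier takes roughly 19 to 29 seconds.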

Important Memory Notes

Even though the 32B INT4 model itself takes approximately 12–14 GB, we recommend at least 48GB VRAM for the following reasons:

  • KV-cache consumption during generation (especially with long context)
  • ONNX Runtime overhead and temporary buffers
  • System stability and protection against out-of-memory errors
  • Room for larger context windows

Minimum recommended: 48GB VRAM (dual RX 7900 series or MI300X)
Ideal: 48–64GB VRAM

For pure CPU inference (no GPU), we recommend at least 128GB system RAM (Ryzen 9 7950X/9950X or better).
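
The KV-cache point above can be made concrete with a rough estimate. This is a sketch under stated assumptions: the layer count (64), KV-head count (8), head dimension (128), and FP16 cache entries are illustrative values for a 32B-class model with grouped-query attention, not published JiRack figures, and the function names are hypothetical. ONNX Runtime overhead and temporary buffers are not included, and GB is treated as GiB for simplicity.

```shell
# kv_cache_mib TOKENS [LAYERS] [KV_HEADS] [HEAD_DIM] [BYTES_PER_VALUE]
# Defaults are illustrative assumptions, not published JiRack values.
kv_cache_mib() (
  tokens="$1"
  layers="${2:-64}"
  kv_heads="${3:-8}"
  head_dim="${4:-128}"
  bytes="${5:-2}"   # assumed FP16 cache entries
  # Factor 2: one key vector and one value vector per layer and KV head.
  echo $(( 2 * $layers * $kv_heads * $head_dim * $bytes * $tokens / 1048576 ))
)

# estimate_mib WEIGHTS_GIB TOKENS
# Adds the weight size to the KV cache for the given context length.
estimate_mib() (
  echo $(( $1 * 1024 + $(kv_cache_mib "$2") ))
)

estimate_mib 14 4096    # 14 GiB weights + 4K-token context:  15360 MiB
estimate_mib 14 32768   # 14 GiB weights + 32K-token context: 22528 MiB
```

With these assumed values the cache grows by 0.25 MiB per token, so a 32K-token context adds about 8 GiB on top of the weights before any runtime overhead is counted.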


Quantization will start from the default model in full FP32 precision, which makes it possible to find the best balance between model size and performance.

📧 Contact & Licensing

For joint venture opportunities, hardware integration, or licensing inquiries:
