Typhoon-OCR-3B — MLX q4

MLX-quantized port of typhoon-ai/typhoon-ocr-3b for native Apple Silicon inference.

Quantization: 4-bit affine, group size 64. Effective average: 6.549 bits/weight, because the vision-projector and layer-norm weights stay at full precision (mlx-vlm default). Size on disk: 2.9 GB.
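The 6.549 figure is an average over every weight in the checkpoint, not the cost of a quantized layer. The back-of-the-envelope sketch below shows how that average can arise. It assumes that each group of 64 quantized weights stores one 16-bit scale and one 16-bit bias (MLX affine quantization) and that unquantized tensors stay 16-bit (BF16). Under those assumptions a quantized layer costs 4.5 bits/weight, which implies that roughly 18% of the weights are held at full precision. That fraction is derived from the published average, not measured, and the function names are illustrative.

```python
def affine_bits_per_weight(bits: int, group_size: int, param_bits: int = 16) -> float:
    """Bits per weight for group-wise affine quantization.

    Assumption: each group of `group_size` weights shares one scale and one
    bias, each stored with `param_bits` bits (16 for BF16).
    """
    return bits + 2 * param_bits / group_size


def blended_bits(quant_bits: float, full_bits: float, full_fraction: float) -> float:
    """Average bits/weight when `full_fraction` of weights stay unquantized."""
    return full_fraction * full_bits + (1 - full_fraction) * quant_bits


def implied_full_fraction(effective: float, quant_bits: float, full_bits: float = 16) -> float:
    """Solve effective = f * full_bits + (1 - f) * quant_bits for f."""
    return (effective - quant_bits) / (full_bits - quant_bits)


q = affine_bits_per_weight(4, 64)       # cost of a quantized layer
f = implied_full_fraction(6.549, q)     # share of weights left at 16-bit
print(f"quantized layers: {q} bits/weight")
print(f"implied full-precision share: {f:.1%}")
```

The same arithmetic shows why a smaller group size would raise the per-layer cost. With a group size of 32, the scale and bias overhead doubles to 1 bit/weight.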

Benchmark

7-image internal smoke set: 2 synthetic printed Thai/English images and 5 synthetic mixed-Thai handwriting (HW) images. Every backend received the same prompt ("Extract all text from this image.") on a Mac mini (Apple Silicon), 2026-05-12:

| Backend | HW CER (median) | CER (max) | Wall time (median) | Generation speed (tok/s) | Peak RAM |
|---|---|---|---|---|---|
| MLX q4 (this model) | 0.009 | 0.081 | 1.95 s | ~107 | ~3.5 GB |
| MLX q8 | 0.000 | 0.081 | 2.34 s | ~65 | ~5 GB |
| Ollama typhoon-ocr1.5-3b Q4 | 0.000 | 0.058 | 2.90 s | ~78 (varies 40-84) | ~4 GB |

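Compared with q8, q4 gives up about 1 percentage point of CER on one handwriting sample (median HW CER 0.009 vs 0.000). In exchange it delivers ~17% lower median wall time (1.95 s vs 2.34 s), ~65% higher generation throughput, and about 1.5 GB less peak RAM. Against the Ollama baseline, the wall-time reduction is ~33% (1.95 s vs 2.90 s). Pick q4 for latency-bound serving and q8 for strict quality.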
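The speed and memory comparisons are plain ratios over the table's medians. This sketch recomputes them so they can be re-derived whenever the benchmark is re-run. The numbers are copied from the table, and the dictionary keys are illustrative names.

```python
# Medians copied from the benchmark table above (keys are illustrative).
wall_s = {"mlx_q4": 1.95, "mlx_q8": 2.34, "ollama_q4": 2.90}
tps = {"mlx_q4": 107, "mlx_q8": 65}
peak_ram_gb = {"mlx_q4": 3.5, "mlx_q8": 5.0}


def latency_reduction(fast: float, slow: float) -> float:
    """Fraction of wall time saved by the faster backend relative to the slower one."""
    return (slow - fast) / slow


def throughput_gain(new: float, old: float) -> float:
    """Relative increase in tokens/second."""
    return new / old - 1


vs_q8 = latency_reduction(wall_s["mlx_q4"], wall_s["mlx_q8"])
vs_ollama = latency_reduction(wall_s["mlx_q4"], wall_s["ollama_q4"])
tps_gain = throughput_gain(tps["mlx_q4"], tps["mlx_q8"])
ram_saved = peak_ram_gb["mlx_q8"] - peak_ram_gb["mlx_q4"]

print(f"q4 vs q8 wall time:        -{vs_q8:.0%}")
print(f"q4 vs Ollama wall time:    -{vs_ollama:.0%}")
print(f"q4 vs q8 generation speed: +{tps_gain:.0%}")
print(f"q4 vs q8 peak RAM saved:   {ram_saved:.1f} GB")
```

Latency reduction is measured relative to the slower backend. Quoting the same q4-vs-q8 gap as a speedup factor instead gives 2.34 / 1.95 = 1.2x.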

The test set is synthetic (generated Thai medical-style text) and contains no real PHI. Real-world photographed handwriting is harder, so expect higher CER.
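The card does not publish its scoring harness. The sketch below assumes the standard definition of character error rate (character-level Levenshtein distance divided by reference length) with no text normalization, and shows how per-image CER and the median/max columns could be computed. Python strings index by code point, so Thai vowel and tone marks count as separate characters here. A harness that scores grapheme clusters or normalizes whitespace would report different numbers. The sample pairs are hypothetical.

```python
from statistics import median


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length, counted in code points."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical (reference, model output) pairs, for illustration only.
samples = [
    ("สวัสดี", "สวัสดี"),   # exact match
    ("abcd", "abcf"),       # one substitution in four characters
    ("สวัสดี", "สวสดี"),    # dropped vowel mark: one deletion in six code points
]
scores = [cer(ref, hyp) for ref, hyp in samples]
print(f"median CER: {median(scores):.3f}  max CER: {max(scores):.3f}")
```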

Usage

uv pip install mlx-vlm

from mlx_vlm import generate, load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Download (on first run) and load the 4-bit weights, processor, and model config.
model, processor = load("MegawizCo/typhoon-ocr-3b-mlx-q4")
config = load_config("MegawizCo/typhoon-ocr-3b-mlx-q4")

# Wrap the instruction in the model's chat template with one image slot, then run OCR.
prompt = apply_chat_template(processor, config, "Extract all text from this image.", num_images=1)
out = generate(model, processor, prompt, image=["prescription.png"], max_tokens=512)
print(out.text)  # Typhoon-OCR wraps output in {"text": "..."}
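Because the model wraps its answer in a {"text": "..."} object, callers usually want to unwrap it. The helper below is a minimal sketch. It assumes `out.text` holds the decoded string, as in the example above, and that the wrapper key is exactly "text". When the output is not valid JSON of that shape (for example, when generation was cut off by `max_tokens`), it falls back to the raw string. `extract_text` is a hypothetical name, not part of mlx-vlm.

```python
import json


def extract_text(raw: str) -> str:
    """Unwrap Typhoon-OCR's {"text": "..."} output, or return `raw` unchanged.

    Assumption: the wrapper key is "text". Output that is not valid JSON
    (e.g. truncated by max_tokens) or lacks that key is passed through as-is.
    """
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError:
        return raw
    if isinstance(data, dict) and isinstance(data.get("text"), str):
        return data["text"]
    return raw
```

In the usage example, `print(extract_text(out.text))` would print the recognized text without the JSON wrapper. Passing through unparseable output, rather than raising, keeps truncated generations visible for debugging.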

CLI:

mlx_vlm.generate \
  --model MegawizCo/typhoon-ocr-3b-mlx-q4 \
  --image prescription.png \
  --prompt "Extract all text from this image." \
  --max-tokens 512

Conversion command

mlx_vlm.convert \
  --hf-path typhoon-ai/typhoon-ocr-3b \
  -q --q-bits 4 \
  --mlx-path typhoon-ocr-3b-mlx-q4

Reproducible: re-running this command against the upstream typhoon-ai/typhoon-ocr-3b regenerates equivalent weights.

License & attribution

  • License: Apache 2.0, inherited from upstream typhoon-ai/typhoon-ocr-3b.
  • Base model: SCB 10X / Typhoon AI. All credit for the underlying VLM training goes to them.
  • Quantization + repackaging: MegawizCo (2026-05-12).
  • Vision architecture: Qwen2.5-VL.
