Typhoon-OCR-3B — MLX q8

MLX-quantized port of typhoon-ai/typhoon-ocr-3b for native Apple Silicon inference. Higher-quality sibling of MegawizCo/typhoon-ocr-3b-mlx-q4; pick this one if you need the lowest character error rate (CER) and can spare the RAM.

Quantization: 8-bit affine, group size 64. Effective rate 9.836 bits/weight. Size on disk: 4.3 GB.
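As a rough cross-check of these figures, the nominal storage cost of affine group quantization can be worked out by hand. The sketch below assumes each group of 64 weights carries one 16-bit scale and one 16-bit bias, which is an assumption about the storage layout and not something this card states. It also treats the gap to the reported 9.836 bits/weight as coming entirely from tensors left at 16-bit precision, which is likewise an assumption. The function names are illustrative.

```python
def nominal_bits_per_weight(bits: int, group_size: int,
                            scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits per weight for affine group quantization, counting the
    per-group scale and bias (assumed 16 bits each)."""
    return bits + (scale_bits + bias_bits) / group_size


def unquantized_fraction(effective: float, quantized: float,
                         full: float = 16.0) -> float:
    """Share of weights that would have to stay at `full` bits for the
    average to reach `effective` bits per weight."""
    return (effective - quantized) / (full - quantized)


q8_rate = nominal_bits_per_weight(bits=8, group_size=64)
share = unquantized_fraction(effective=9.836, quantized=q8_rate)

print(f"nominal q8 rate: {q8_rate} bits/weight")
print(f"implied 16-bit share: {share:.1%}")
```

Under these assumptions the quantized layers alone cost 8.5 bits/weight, so the reported effective rate would be consistent with a minority of the weights remaining unquantized.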

Benchmark

Seven-image internal smoke set (2 synthetic printed Thai/English images and 5 synthetic mixed-Thai handwriting images). All backends were run with the same prompt ("Extract all text from this image.") on an Apple Silicon Mac mini on 2026-05-12:

Backend	CER median (handwriting)	CER max	Wall time median	Generation speed (tokens/s)	Peak RAM
MLX q4	0.009	0.081	1.95 s	~107	~3.5 GB
MLX q8 (this)	0.000	0.081	2.34 s	~65	~5 GB
Ollama `typhoon-ocr1.5-3b` Q4	0.000	0.058	2.90 s	~78 (variable 40-84)	~4 GB

q8 fixes the one handwriting sample where q4 lost ~1 pp of CER, at the cost of about 20% extra median wall time (2.34 s vs 1.95 s) and ~1.5 GB extra RAM. The 0.081 maximum CER, from one synthetic handwriting sample, is identical for both MLX quantizations, so it is likely a font-rendering ambiguity rather than a quantization defect; the Ollama Q4 build peaks lower, at 0.058.
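The relative costs of q8 over q4 can be recomputed directly from the two MLX rows of the table. This is only a small arithmetic sketch: the numbers are copied from the table (RAM and tokens/s are approximate there), and the dictionary keys are illustrative names.

```python
# Values copied from the benchmark table (medians; RAM and TPS are approximate).
q4 = {"wall_s": 1.95, "tps": 107, "ram_gb": 3.5}
q8 = {"wall_s": 2.34, "tps": 65, "ram_gb": 5.0}


def relative_change(new: float, old: float) -> float:
    """Signed change of `new` relative to `old` (0.10 means +10%)."""
    return (new - old) / old


wall_overhead = relative_change(q8["wall_s"], q4["wall_s"])
tps_change = relative_change(q8["tps"], q4["tps"])
ram_delta_gb = q8["ram_gb"] - q4["ram_gb"]

print(f"wall time: {wall_overhead:+.0%}")
print(f"generation speed: {tps_change:+.0%}")
print(f"peak RAM: {ram_delta_gb:+.1f} GB")
```

Generation throughput drops more sharply than wall time rises, which suggests that part of each request is spent outside token generation (for example image preprocessing and prompt processing), though the card does not break this down.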

The test set is synthetic — generated Thai medical-style text — not real PHI. Real-world photographed handwriting is harder; expect higher CER.
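For readers who want to reproduce a comparable metric on their own images, CER is conventionally the edit distance between the reference and the OCR output, divided by the reference length. The sketch below is one minimal way to compute it and is not the benchmark's actual scoring script, which this card does not show. It counts Unicode code points; for Thai, where vowels and tone marks are separate code points, scoring by grapheme cluster instead would give different numbers.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("Paracetamol 500 mg", "Paracetamol 500 mg"))
print(cer("abcd", "abed"))
```

Because the denominator is the reference length, CER can exceed 1.0 when the hypothesis contains many spurious characters.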

Usage

uv pip install mlx-vlm

from mlx_vlm import generate, load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the quantized weights and the processor (downloaded on first use).
model, processor = load("MegawizCo/typhoon-ocr-3b-mlx-q8")
config = load_config("MegawizCo/typhoon-ocr-3b-mlx-q8")

# Wrap the instruction in the model's chat template, reserving one image slot.
prompt = apply_chat_template(processor, config, "Extract all text from this image.", num_images=1)
# Run OCR on a local image; max_tokens caps the length of the transcription.
out = generate(model, processor, prompt, image=["prescription.png"], max_tokens=512)
print(out.text)  # Typhoon-OCR wraps output in {"text": "..."}
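Since the output is described as a JSON wrapper of the form {"text": "..."}, downstream code will usually want the inner string. The helper below is a hedged sketch of one way to unwrap it: `extract_text` is a hypothetical name, not part of mlx-vlm, and it falls back to the raw string whenever the output is not a JSON object with a string "text" field (for example when generation was cut off by max_tokens).

```python
import json


def extract_text(raw: str) -> str:
    """Return the "text" field if `raw` is a JSON object that has one,
    otherwise return `raw` with surrounding whitespace stripped."""
    candidate = raw.strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        # Not valid JSON (plain text or truncated output): keep it as-is.
        return candidate
    if isinstance(parsed, dict) and isinstance(parsed.get("text"), str):
        return parsed["text"]
    return candidate


print(extract_text('{"text": "Paracetamol 500 mg"}'))
print(extract_text('{"text": "Paracetamol 5'))  # truncated JSON falls back to raw
```

With the usage snippet above, this would be called as `extract_text(out.text)`.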

CLI:

mlx_vlm.generate \
  --model MegawizCo/typhoon-ocr-3b-mlx-q8 \
  --image prescription.png \
  --prompt "Extract all text from this image." \
  --max-tokens 512
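To run the same CLI call over several images, the argument list can be assembled programmatically. The sketch below only builds the commands, using the flags shown above; `build_command` and `commands_for_directory` are hypothetical helper names, and the commented-out `subprocess.run` line shows how one might execute them. Each CLI invocation presumably reloads the model, so for large batches the Python API with a single loaded model is likely the faster route.

```python
import shlex
from pathlib import Path

MODEL = "MegawizCo/typhoon-ocr-3b-mlx-q8"
PROMPT = "Extract all text from this image."


def build_command(image: Path, max_tokens: int = 512) -> list[str]:
    """Argument list for one mlx_vlm.generate call, mirroring the CLI example."""
    return [
        "mlx_vlm.generate",
        "--model", MODEL,
        "--image", str(image),
        "--prompt", PROMPT,
        "--max-tokens", str(max_tokens),
    ]


def commands_for_directory(directory: Path, pattern: str = "*.png") -> list[list[str]]:
    """One command per matching image, in sorted filename order."""
    return [build_command(path) for path in sorted(directory.glob(pattern))]


cmd = build_command(Path("prescription.png"))
print(shlex.join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```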

Conversion command

mlx_vlm.convert \
  --hf-path typhoon-ai/typhoon-ocr-3b \
  -q --q-bits 8 \
  --mlx-path typhoon-ocr-3b-mlx-q8

License & attribution

License: Apache 2.0 — inherited from upstream typhoon-ai/typhoon-ocr-3b.
Base model: SCB 10X / Typhoon AI.
Quantization + repackaging: MegawizCo (2026-05-12).
Vision architecture: Qwen2.5-VL.

See also: MegawizCo/typhoon-ocr-3b-mlx-q4 (faster, smaller, slight CER trade-off).

Model tree: Qwen/Qwen2.5-VL-3B-Instruct (base model), fine-tuned as typhoon-ai/typhoon-ocr-3b, quantized as MegawizCo/typhoon-ocr-3b-mlx-q8 (this model; 8 quantized variants listed).