Typhoon-OCR-3B — MLX q8
MLX-quantized port of typhoon-ai/typhoon-ocr-3b for native Apple Silicon inference. Higher-quality sibling of MegawizCo/typhoon-ocr-3b-mlx-q4; pick this one if you need the lowest CER and can spare the RAM.
Quantization: 8-bit affine, group size 64. Effective rate 9.836 bits/weight. Size on disk: 4.3 GB.
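The effective rate is above the nominal 8 bits partly because affine quantization stores per-group metadata alongside the weights. A minimal sketch of that arithmetic follows, assuming one 16-bit scale and one 16-bit bias per group; those widths are an assumption for illustration and are not read from this checkpoint. The gap between the result and the reported 9.836 presumably comes from tensors kept at higher precision, which this card does not itemize.

```python
def quantized_bits_per_weight(q_bits: int, group_size: int,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Storage cost per weight inside a group-quantized layer.

    Assumption: every group of `group_size` weights carries one scale and one
    bias; the 16-bit defaults are illustrative, not taken from the model files.
    """
    return q_bits + (scale_bits + bias_bits) / group_size


# 8-bit weights with group size 64, as stated for this repository.
q8_rate = quantized_bits_per_weight(8, 64)
print(q8_rate)  # 8.5
```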
Benchmark
7-image internal smoke set (2 synthetic printed Thai/English + 5 synthetic mixed-Thai handwriting). All backends ran with the same prompt ("Extract all text from this image.") on an Apple Silicon Mac mini on 2026-05-12. CER is character error rate; "HW" marks the handwriting samples.
| Backend | CER median (HW) | CER max | Wall median | Generation TPS | Peak RAM |
|---|---|---|---|---|---|
| MLX q4 | 0.009 | 0.081 | 1.95 s | ~107 | ~3.5 GB |
| MLX q8 (this) | 0.000 | 0.081 | 2.34 s | ~65 | ~5 GB |
| Ollama typhoon-ocr1.5-3b Q4 | 0.000 | 0.058 | 2.90 s | ~78 (variable 40-84) | ~4 GB |
q8 fixes the one handwriting sample where q4 lost ~1pp of CER, at the cost of about 20% extra wall time (2.34 s vs 1.95 s) and 1.5 GB extra RAM. The 0.081 CER maximum on one synthetic handwriting sample is identical for both MLX quantizations, so it is likely a font-rendering ambiguity rather than a quant defect.
The test set is synthetic — generated Thai medical-style text — not real PHI. Real-world photographed handwriting is harder; expect higher CER.
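For readers who want to reproduce the CER columns on their own images, here is a minimal sketch of the usual definition: Levenshtein edit distance divided by reference length. It assumes errors are counted per Unicode code point, so Thai combining marks count individually. The scoring script behind the table is not published here, so its normalization may differ.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance over code points, using a rolling DP row."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed, divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(cer("kitten", "sitting"))  # 0.5 (3 edits over 6 reference characters)
```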
Usage
uv pip install mlx-vlm
from mlx_vlm import generate, load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
model, processor = load("MegawizCo/typhoon-ocr-3b-mlx-q8")
config = load_config("MegawizCo/typhoon-ocr-3b-mlx-q8")
prompt = apply_chat_template(processor, config, "Extract all text from this image.", num_images=1)
out = generate(model, processor, prompt, image=["prescription.png"], max_tokens=512)
print(out.text) # Typhoon-OCR wraps output in {"text": "..."}
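If the generated string is a JSON object with a "text" key, as the comment in the snippet above suggests, it can be unwrapped before further processing. The helper below is a hypothetical sketch and not part of mlx-vlm; it falls back to the raw string whenever the output is not in that shape.

```python
import json


def unwrap_ocr_text(raw: str) -> str:
    """Return the "text" field if `raw` is a JSON object that has one.

    Hypothetical helper: the {"text": "..."} shape is assumed from the usage
    comment, so anything else is passed through unchanged.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    if isinstance(payload, dict) and isinstance(payload.get("text"), str):
        return payload["text"]
    return raw


print(unwrap_ocr_text('{"text": "Paracetamol 500 mg"}'))  # Paracetamol 500 mg
```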
CLI:
mlx_vlm.generate \
--model MegawizCo/typhoon-ocr-3b-mlx-q8 \
--image prescription.png \
--prompt "Extract all text from this image." \
--max-tokens 512
Conversion command
mlx_vlm.convert \
--hf-path typhoon-ai/typhoon-ocr-3b \
-q --q-bits 8 \
--mlx-path typhoon-ocr-3b-mlx-q8
License & attribution
- License: Apache 2.0, inherited from upstream typhoon-ai/typhoon-ocr-3b.
- Base model: SCB 10X / Typhoon AI.
- Quantization + repackaging: MegawizCo (2026-05-12).
- Vision architecture: Qwen2.5-VL.
Related
- MegawizCo/typhoon-ocr-3b-mlx-q4: faster, smaller, slight CER trade-off