typhoon-ocr1.5-2b-8bit

This model was converted to MLX format from typhoon-ai/typhoon-ocr1.5-2b using mlx-vlm 0.6.3.

Typhoon OCR 1.5 2B is a Qwen3-VL-based vision-language model for Thai and English document understanding. It produces structured output: Markdown text, HTML <table> for tables, LaTeX for equations, <figure> for images/charts, and <page_number> tags.
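
Because the output mixes Markdown with a few fixed tags, downstream code can separate the tagged regions with simple pattern matching. The following is a minimal sketch, not part of the model or of mlx-vlm: the helper name `split_ocr_output` and the sample text are invented for illustration, and the regular expressions assume the tags are well formed and not nested.

```python
import re

# Tag names are taken from the model card; everything else here is illustrative.
PAGE_NUMBER_RE = re.compile(r"<page_number>(.*?)</page_number>", re.DOTALL)
FIGURE_RE = re.compile(r"<figure>(.*?)</figure>", re.DOTALL)
TABLE_RE = re.compile(r"<table\b[^>]*>.*?</table>", re.DOTALL)


def split_ocr_output(text: str) -> dict:
    """Collect tagged regions from the model output (hypothetical helper)."""
    return {
        "page_numbers": [m.strip() for m in PAGE_NUMBER_RE.findall(text)],
        "figures": [m.strip() for m in FIGURE_RE.findall(text)],
        "tables": TABLE_RE.findall(text),
    }


# Invented sample output, shaped like the format described above.
sample = (
    "# Report\n"
    "<table><tr><td>A</td><td>1</td></tr></table>\n"
    "<figure>\nBar chart of sales.\n</figure>\n"
    "<page_number>14</page_number>\n"
)
parts = split_ocr_output(sample)
print(parts["page_numbers"])  # ['14']
print(parts["figures"])       # ['Bar chart of sales.']
```

For anything beyond quick extraction, a real HTML parser is a safer choice for the table contents than regular expressions.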

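This checkpoint is quantized to 8 bits (group size 64, affine mode), which averages 9.94 bits per weight across the whole model. The average exceeds 8 because the vision encoder is kept at higher precision and each quantization group stores its own scale and bias. OCR accuracy is preserved while the size is roughly halved, to about 2.5 GB. Designed for Apple Silicon.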
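
The figures above can be sanity-checked with a little arithmetic. This is a rough sketch under two stated assumptions that are not confirmed by the model card: that each group of 64 weights carries one 16-bit scale and one 16-bit bias, and that the parameter count is exactly 2.0 billion (the "2B" in the name taken at face value).

```python
# Settings from the model card.
Q_BITS = 8
GROUP_SIZE = 64
REPORTED_BPW = 9.94

# Assumptions (not stated in the card): 16-bit scale and 16-bit bias per group.
SCALE_BITS = 16
BIAS_BITS = 16

# Effective bits per weight inside a quantized layer: payload plus per-group overhead.
quantized_layer_bpw = Q_BITS + (SCALE_BITS + BIAS_BITS) / GROUP_SIZE
print(quantized_layer_bpw)  # 8.5

# Assumption: roughly 2.0e9 parameters in total.
APPROX_PARAMS = 2.0e9
approx_size_gb = APPROX_PARAMS * REPORTED_BPW / 8 / 1e9
print(f"{approx_size_gb:.1f} GB")  # 2.5 GB
```

Under these assumptions the quantized layers alone would average 8.5 bits per weight, so the remaining gap up to 9.94 is consistent with some tensors, such as the vision encoder, staying at higher precision.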

Prompt

Typhoon OCR is instruction-tuned and works best with its official prompt. Use it verbatim (e.g. saved to prompt.txt):

Extract all text from the image.

Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.

Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap any clearly defined visual areas (e.g. charts, diagrams, pictures) in:

<figure>
Describe the image's main elements (people, objects, text), note any contextual clues (place, event, culture), mention visible text and its meaning, provide deeper analysis when relevant (especially for financial charts, graphs, or documents), comment on style or architecture if relevant, then give a concise overall summary. Describe in Thai.
</figure>

- Page Numbers: Wrap page numbers in <page_number>...</page_number> (e.g., <page_number>14</page_number>).
- Checkboxes: Use the unchecked / checked box characters as appropriate.

Usage

pip install -U mlx-vlm
python -m mlx_vlm.generate \
  --model mlx-community/typhoon-ocr1.5-2b-8bit \
  --image page.jpg \
  --prompt "$(cat prompt.txt)" \
  --max-tokens 4096 \
  --temperature 0.0 \
  --repetition-penalty 1.1
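
To run the same command over many pages, it can help to build the argument list in a script. The sketch below only assembles the command shown above, using the same flags and values; the helper name `build_generate_command` is hypothetical, and the short prompt string is a placeholder for the full official prompt, which would normally be read from prompt.txt.

```python
import shlex

MODEL = "mlx-community/typhoon-ocr1.5-2b-8bit"


def build_generate_command(image: str, prompt: str) -> list[str]:
    """Return the mlx_vlm.generate invocation as an argument list (hypothetical helper)."""
    return [
        "python", "-m", "mlx_vlm.generate",
        "--model", MODEL,
        "--image", image,
        "--prompt", prompt,
        "--max-tokens", "4096",
        "--temperature", "0.0",
        "--repetition-penalty", "1.1",
    ]


# Placeholder prompt; in practice pass the full text of prompt.txt.
cmd = build_generate_command("page.jpg", "Extract all text from the image.")
print(shlex.join(cmd))
```

On a machine with mlx-vlm installed, the list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`. Passing a list rather than a shell string avoids quoting problems with the multi-line prompt.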

Recommended generation parameters

Typhoon OCR is a document-extraction model, not a chat model. It needs near-deterministic decoding so it does not hallucinate characters or loop on repeated table cells.

| Parameter | Recommended | Notes |
|---|---|---|
| temperature | 0.0 (greedy) | Deterministic extraction: always picks the most-confident token. SCB10X also publishes 0.1 as an alternative. |
| repetition_penalty | 1.1 | Stops the model looping on repeated dashes / blank cells in dense tables. Exposed as --repetition-penalty. |
| max_tokens | 4096 | Headroom for a full, dense page. |
| top_p | 0.6 | Only has an effect when temperature > 0 (e.g. 0.1). Not exposed by the mlx_vlm.generate CLI; set it via the Python sampler or the mlx_vlm.server request body if you raise the temperature. |
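
The interaction between temperature and top_p can be captured in a small settings object. This is an illustrative sketch only: the class `OcrSampling` is hypothetical, the key names simply mirror the parameter names listed above, and whether a particular API accepts exactly these names is an assumption to check against its documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrSampling:
    """Recommended decoding settings (hypothetical container, values from the list above)."""
    temperature: float = 0.0
    repetition_penalty: float = 1.1
    max_tokens: int = 4096
    top_p: float = 0.6

    def effective(self) -> dict:
        """Return only the settings that matter for the chosen temperature."""
        params = {
            "temperature": self.temperature,
            "repetition_penalty": self.repetition_penalty,
            "max_tokens": self.max_tokens,
        }
        # top_p only influences sampling; with greedy decoding it is omitted.
        if self.temperature > 0:
            params["top_p"] = self.top_p
        return params


greedy = OcrSampling().effective()
sampled = OcrSampling(temperature=0.1).effective()
print("top_p" in greedy)  # False
print(sampled["top_p"])   # 0.6
```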

Image resolution: the Qwen3-VL processor handles dynamic resolution automatically. For dense A4 pages, feed a reasonably high-resolution scan (long side ~1500-2000 px) to keep small text sharp; on 16 GB Apple Silicon, avoid extremely large images to prevent out-of-memory errors. Keep the KV cache unquantized (the default).
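
One way to follow the resolution guidance is to cap the long side of a scan before passing it to the model. The sketch below only computes the target dimensions; the function name `fit_long_side` is hypothetical, and the 2000 px default is simply the upper end of the range suggested above, not a limit imposed by the model.

```python
def fit_long_side(width: int, height: int, max_long: int = 2000) -> tuple[int, int]:
    """Scale (width, height) down so the long side is at most max_long, keeping aspect ratio."""
    long_side = max(width, height)
    if long_side <= max_long:
        # Already within the limit: never upscale.
        return width, height
    new_width = max(1, round(width * max_long / long_side))
    new_height = max(1, round(height * max_long / long_side))
    return new_width, new_height


# An A4 page scanned at 600 dpi is about 4960 x 7016 px.
print(fit_long_side(4960, 7016))  # (1414, 2000)
print(fit_long_side(1200, 1600))  # (1200, 1600)
```

The resulting size can then be handed to an image library's resize call (for example Pillow's `Image.resize`) before saving the page that is passed via --image.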

Conversion

python -m mlx_vlm convert \
  --hf-path typhoon-ai/typhoon-ocr1.5-2b \
  --mlx-path typhoon-ocr1.5-2b-8bit \
  -q --q-bits 8 --q-group-size 64

License

Apache-2.0, inherited from the base model typhoon-ai/typhoon-ocr1.5-2b.
