typhoon-ocr1.5-2b-8bit

This model was converted to MLX format from typhoon-ai/typhoon-ocr1.5-2b using mlx-vlm 0.6.3.

Typhoon OCR 1.5 2B is a Qwen3-VL-based vision-language model for Thai and English document understanding. It produces structured output: Markdown text, HTML <table> for tables, LaTeX for equations, <figure> for images/charts, and <page_number> tags.
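The output mixes Markdown with a few fixed tags, so downstream code usually has to separate them. Below is a sketch of a post-processing helper. `parse_ocr_output` is a hypothetical name and is not part of mlx-vlm or Typhoon. It uses only the Python standard library: regular expressions to find `<page_number>`, `<figure>`, and `<table>` blocks, and `html.parser.HTMLParser` to turn each table into rows of cell text. It assumes tables are not nested and ignores `colspan`/`rowspan`.

```python
import re
from html.parser import HTMLParser

# Tag names come from the Typhoon OCR prompt; the parsing itself is illustrative.
PAGE_RE = re.compile(r"<page_number>(.*?)</page_number>", re.DOTALL)
FIGURE_RE = re.compile(r"<figure>(.*?)</figure>", re.DOTALL)
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)


class _TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # list of text chunks while inside a cell, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate a cell that appears outside any <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_ocr_output(text):
    """Split Typhoon-style output into page numbers, figure descriptions, and table rows."""
    tables = []
    for html in TABLE_RE.findall(text):
        parser = _TableRows()
        parser.feed(html)
        parser.close()
        tables.append(parser.rows)
    return {
        "page_numbers": [p.strip() for p in PAGE_RE.findall(text)],
        "figures": [f.strip() for f in FIGURE_RE.findall(text)],
        "tables": tables,
    }


sample = (
    "# Report\n"
    "<table><tr><th>Item</th><th>Qty</th></tr><tr><td>Rice</td><td>2</td></tr></table>\n"
    "<figure>A bar chart.</figure>\n"
    "<page_number>14</page_number>"
)
print(parse_ocr_output(sample)["tables"])  # [[['Item', 'Qty'], ['Rice', '2']]]
```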

This checkpoint is quantized to 8-bit (group size 64, affine mode) and averages ~9.94 bits per weight. The vision encoder is kept at higher precision to help preserve OCR accuracy, while the checkpoint is roughly half the size of the original (~2.5 GB). Designed for Apple Silicon.
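The average is above 8 bits because of two overheads. Here is a back-of-the-envelope check that makes two assumptions. First, each group of 64 weights stores a 16-bit scale and a 16-bit bias. Second, every unquantized tensor is stored at 16 bits (BF16). Under these assumptions, quantized tensors cost 8 + 32/64 = 8.5 bits per weight. Solving a two-level weighted average for the remainder then suggests that roughly a fifth of the weights stay unquantized. This is a rough model built on those assumptions, not a breakdown read from the checkpoint.

```python
def quantized_bpw(bits: int, group_size: int, scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight for affine group quantization: payload plus per-group scale and bias."""
    return bits + (scale_bits + bias_bits) / group_size


def full_precision_fraction(avg_bpw: float, q_bpw: float, full_bpw: float = 16.0) -> float:
    """Solve f * full_bpw + (1 - f) * q_bpw = avg_bpw for f, the share of unquantized weights."""
    return (avg_bpw - q_bpw) / (full_bpw - q_bpw)


q = quantized_bpw(8, 64)              # 8 + 32 / 64 = 8.5
f = full_precision_fraction(9.94, q)  # (9.94 - 8.5) / (16 - 8.5), about 0.192
print(f"{q} bpw for quantized tensors; ~{f:.1%} of weights kept at 16-bit")
```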

Prompt

Typhoon OCR is instruction-tuned and works best with its official prompt. Use the prompt verbatim, for example saved to prompt.txt:

Extract all text from the image.

Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.

Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap any clearly defined visual areas (e.g. charts, diagrams, pictures) in:

<figure>
Describe the image's main elements (people, objects, text), note any contextual clues (place, event, culture), mention visible text and its meaning, provide deeper analysis when relevant (especially for financial charts, graphs, or documents), comment on style or architecture if relevant, then give a concise overall summary. Describe in Thai.
</figure>

- Page Numbers: Wrap page numbers in <page_number>...</page_number> (e.g., <page_number>14</page_number>).
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes.

Usage

pip install -U mlx-vlm

python -m mlx_vlm.generate \
  --model mlx-community/typhoon-ocr1.5-2b-8bit \
  --image page.jpg \
  --prompt "$(cat prompt.txt)" \
  --max-tokens 4096 \
  --temperature 0.0 \
  --repetition-penalty 1.1
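To OCR several page images, you can call the same CLI from Python once per page. The wrapper below is a hypothetical sketch; `build_generate_cmd` and `ocr_pages` are illustrative names. It reuses exactly the flags shown above. Passing the arguments as a list avoids shell quoting of the `$` and `<...>` characters in the prompt. The sketch assumes mlx-vlm is installed in the current interpreter. The captured stdout may contain more than the extracted text, so trim it as needed.

```python
import shlex
import subprocess
import sys
from pathlib import Path

MODEL = "mlx-community/typhoon-ocr1.5-2b-8bit"


def build_generate_cmd(image: str, prompt: str, max_tokens: int = 4096) -> list:
    """Argument list equivalent to the mlx_vlm.generate command above."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", MODEL,
        "--image", image,
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
        "--temperature", "0.0",
        "--repetition-penalty", "1.1",
    ]


def ocr_pages(images, prompt_file="prompt.txt"):
    """Run the CLI once per image and return {image: raw stdout}. Hypothetical helper."""
    # "$(cat prompt.txt)" strips trailing newlines; rstrip("\n") mirrors that.
    prompt = Path(prompt_file).read_text(encoding="utf-8").rstrip("\n")
    outputs = {}
    for image in images:
        result = subprocess.run(
            build_generate_cmd(image, prompt),
            capture_output=True, text=True, check=True,
        )
        outputs[image] = result.stdout
    return outputs


if __name__ == "__main__":
    # Print the command for one page instead of running it.
    print(shlex.join(build_generate_cmd("page.jpg", "<contents of prompt.txt>")))
```

For example, `ocr_pages(["page-01.jpg", "page-02.jpg"])` would run the model on two pages in order and raise `CalledProcessError` if either run fails.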

Recommended generation parameters

Typhoon OCR is a document-extraction model, not a chat model. Use near-deterministic decoding to reduce the risk of hallucinated characters and of loops on repeated table cells.

| Parameter | Recommended | Notes |
|---|---|---|
| `temperature` | `0.0` (greedy) | Deterministic extraction: always picks the most-confident token. SCB10X also publishes `0.1` as an alternative. |
| `repetition_penalty` | `1.1` | Stops the model looping on repeated dashes or blank cells in dense tables. Exposed as `--repetition-penalty`. |
| `max_tokens` | `4096` | Headroom for a full, dense page. |
| `top_p` | `0.6` | Only has an effect when `temperature > 0` (e.g. `0.1`). Not exposed by the `mlx_vlm.generate` CLI; set it via the Python sampler or the `mlx_vlm.server` request body if you raise the temperature. |
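The small pure-Python sampler below shows how these three settings interact on a single vector of logits. It is illustrative only and is not mlx-vlm's sampler. The repetition penalty uses the common CTRL-style rule: positive logits are divided by the penalty and negative ones multiplied, applied here over all previously generated tokens. That rule is an assumption about how `--repetition-penalty` behaves, and real implementations may limit it to a recent window. `top_p` is standard nucleus filtering. At `temperature = 0.0`, decoding is a plain argmax, so `top_p` is never consulted.

```python
import math
import random


def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Shrink logits of already-generated tokens (CTRL-style; assumed, not mlx-vlm's code)."""
    out = list(logits)
    for t in set(previous_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def sample_next(logits, previous_tokens=(), temperature=0.0, top_p=0.6, penalty=1.1, rng=None):
    """Pick the next token id from raw logits using greedy or temperature + top-p sampling."""
    logits = apply_repetition_penalty(logits, previous_tokens, penalty)
    if temperature == 0.0:
        # Greedy decoding: take the single most likely token; top_p plays no role.
        return max(range(len(logits)), key=logits.__getitem__)

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus (top-p) filtering: keep the smallest set of top tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    rng = rng if rng is not None else random.Random()
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.9, 0.5]
print(sample_next(logits))                       # 0: greedy argmax
print(sample_next(logits, previous_tokens=[0]))  # 1: penalty lowers token 0's logit to ~1.82
```

With logits `[2.0, 1.9, 0.5]`, greedy decoding picks token 0. If token 0 was already generated, a 1.1 penalty lowers its logit to about 1.82, so token 1 wins instead. This is the kind of nudge that breaks loops on repeated table cells.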

Image resolution: the Qwen3-VL processor handles dynamic resolution automatically. For dense A4 pages, feed a reasonably high-resolution scan (long side ~1500-2000 px) to keep small text sharp. On 16 GB Apple Silicon machines, avoid extremely large images, which can run out of memory. Keep the KV cache unquantized (the default).
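The 1500-2000 px guideline maps directly onto scan resolution. An A4 page is 210 × 297 mm (8.27 × 11.69 in), so a 150 DPI scan already has a long side of about 1754 px. A 300 DPI scan (about 2480 × 3508 px) is larger than needed and can be downscaled before inference. The helper names below are hypothetical. With Pillow, `Image.thumbnail((2000, 2000))` performs a similar aspect-preserving downscale in place.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # portrait width, height


def a4_pixels(dpi: int) -> tuple:
    """Pixel size (width, height) of an A4 page scanned at the given DPI."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in A4_MM)


def fit_long_side(width: int, height: int, max_long: int = 2000) -> tuple:
    """Downscale (never upscale) so the longer side is at most max_long, keeping aspect ratio."""
    long_side = max(width, height)
    if long_side <= max_long:
        return width, height
    scale = max_long / long_side
    return round(width * scale), round(height * scale)


print(a4_pixels(150))                  # (1240, 1754): already within 1500-2000 px
print(fit_long_side(*a4_pixels(300)))  # (1414, 2000): 300 DPI scan scaled down
```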

Conversion

python -m mlx_vlm.convert \
  --hf-path typhoon-ai/typhoon-ocr1.5-2b \
  --mlx-path typhoon-ocr1.5-2b-8bit \
  -q --q-bits 8 --q-group-size 64
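After converting, it can be worth confirming that the requested settings were recorded. This sketch assumes the converter writes them to the model's `config.json` under a top-level `quantization` key, as MLX-converted checkpoints on the Hub typically do (e.g. `{"group_size": 64, "bits": 8}`). `load_config` and `quantization_ok` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_config(model_dir: str) -> dict:
    """Read config.json from a converted MLX model directory."""
    return json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))


def quantization_ok(config: dict, bits: int = 8, group_size: int = 64) -> bool:
    """True if the recorded quantization matches the requested bits and group size."""
    # Assumption: settings live under config["quantization"]; a missing key counts as a mismatch.
    q = config.get("quantization", {})
    return q.get("bits") == bits and q.get("group_size") == group_size
```

If that layout assumption holds, `quantization_ok(load_config("typhoon-ocr1.5-2b-8bit"))` should return `True` for the output of the command above.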

License

Apache-2.0, inherited from the base model typhoon-ai/typhoon-ocr1.5-2b.


Safetensors: model size 0.9B params (as reported by the Hub); tensor types BF16 and U32. Format: MLX, 8-bit.

Model tree for mlx-community/typhoon-ocr1.5-2b-8bit: base model Qwen/Qwen3-VL-2B-Instruct, fine-tuned as typhoon-ai/typhoon-ocr1.5-2b, quantized as this model (one of 3 quantized variants listed).