dots.ocr — CrispEmbed GGUF

dots.ocr unifies layout detection, text extraction, table parsing, and formula recognition in a single vision-language model (VLM). It supports 100+ languages and scores 88.4% on OmniDocBench.

Architecture

Vision: Custom ViT (42 layers, 1536d, patch 14, 2D RoPE, SwiGLU FFN, PatchMerger 2x2)
LLM: Qwen2 (28 layers, 1536d, GQA 12/2, standard RoPE, attention_bias=true)
Training: Prompt-based task switching (OCR, layout, table, formula)
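
The figures above imply some simple arithmetic for sizing inputs and memory. The following is a minimal sketch, not code from the model or from CrispEmbed: the class and function names are hypothetical, "GQA 12/2" is read as 12 query heads sharing 2 key/value heads, the head dimension is assumed to be hidden size divided by query heads (1536 / 12 = 128), and the image sides are assumed to be multiples of patch size times merge size (28 pixels).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionConfig:
    # Values taken from the architecture list above.
    patch_size: int = 14   # ViT patch edge, in pixels
    merge_size: int = 2    # PatchMerger 2x2: merges a 2x2 block of patches


@dataclass(frozen=True)
class LlmConfig:
    # Values taken from the architecture list above.
    num_layers: int = 28
    hidden_size: int = 1536
    num_q_heads: int = 12   # assumption: "GQA 12/2" = 12 query heads
    num_kv_heads: int = 2   # assumption: "GQA 12/2" = 2 key/value heads


def vision_tokens(height: int, width: int, cfg: VisionConfig = VisionConfig()) -> int:
    """Number of visual tokens passed to the LLM for one image."""
    unit = cfg.patch_size * cfg.merge_size
    # Assumption: the preprocessor resizes images to multiples of `unit`.
    if height % unit or width % unit:
        raise ValueError(f"height and width must be multiples of {unit}")
    patches = (height // cfg.patch_size) * (width // cfg.patch_size)
    # Each merge_size x merge_size block of patches becomes one token.
    return patches // (cfg.merge_size ** 2)


def kv_cache_bytes_per_token(cfg: LlmConfig = LlmConfig(), bytes_per_value: int = 2) -> int:
    """KV-cache bytes per token, for keys and values across all layers."""
    # Assumption: head_dim = hidden_size / num_q_heads.
    head_dim = cfg.hidden_size // cfg.num_q_heads
    # Factor 2 covers the separate key and value tensors.
    return 2 * cfg.num_layers * cfg.num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    print(vision_tokens(1400, 1008))     # 100 x 72 patches, merged 4:1
    print(kv_cache_bytes_per_token())    # 16-bit cache values
```

Under these assumptions, a 1400 x 1008 image yields 1800 visual tokens, and each token in context costs 28672 bytes of KV cache at 16-bit precision. Because only the 2 key/value heads are cached, this is one sixth of what 12 full key/value heads would need.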

GGUF

Model size: 3B params
Architecture: dots_ocr
Available bit widths: 8-bit, 16-bit
