XCurOS-OCR ยท GGUF (F16, no quantization)

GGUF build of XCurOS-OCR, a compact 0.9B-parameter vision-language OCR model โ€” runs locally with llama.cpp on CPU or GPU. Shipped in full precision F16, with no quantization.

โœจ Lightweight & CPU-friendly โ€” only 0.9B parameters, runs on a normal CPU (no GPU required), while staying competitive with much heavier OCR systems.

๐Ÿค— Transformers / safetensors version: XCurOS/XCurOS-OCR.

Files

File Role
XCurOS-OCR-F16.gguf Language decoder (F16)
mmproj-XCurOS-OCR-F16.gguf Vision projector (required for image input)

Quick start

# CPU-only (no GPU)
llama-mtmd-cli -m XCurOS-OCR-F16.gguf --mmproj mmproj-XCurOS-OCR-F16.gguf --image page.png -p "OCR" -ngl 0

# REST API server
llama-server -m XCurOS-OCR-F16.gguf --mmproj mmproj-XCurOS-OCR-F16.gguf -ngl 0

# Or auto-download this repo
llama-server -hf XCurOS/XCurOS-OCR-GGUF

Benchmarks

XCurOS-OCR (ours) compared against leading OCR systems. Bold = best among specialized OCR VLMs. - = not reported. ๐Ÿ’ก XCurOS-OCR is a lightweight 0.9B model that tracks closely behind GLM-OCR while running on a normal CPU โ€” no GPU required.

Document understanding

Task Benchmark XCurOS-OCR GLM-OCR PaddleOCR-VL-1.5 Deepseek-OCR2 MinerU2.5 dots.ocr Gemini-3-Pro* GPT-5.2*
Document Parsing OmniDocBench v1.5 94.3 94.6 94.5 91.1 90.7 88.4 90.3 85.4
Text Recognition OCRBench (Text) 93.6 94.0 75.3 34.7 75.3 92.1 91.9 83.7
Formula Recognition UniMERNet 96.3 96.5 96.1 85.8 96.4 90.0 96.4 90.5
Table Recognition PubTabNet 84.9 85.2 84.6 - 88.4 71.0 91.4 84.4
Table Recognition TEDS_TEST 85.5 86.0 83.3 - 85.4 62.4 81.8 67.6
Information Extraction Nanonets-KIE 93.3 93.7 - - - - 95.2 87.5
Information Extraction Handwritten-Forms 85.8 86.1 - - - - 94.5 78.2

Capability breakdown

Category XCurOS-OCR GLM-OCR PaddleOCR-VL-1.5 Deepseek-OCR2 MinerU2.5 dots.ocr Gemini-3-Pro* GPT-5.2*
Code 84.4 84.7 75.8 82.1 82.9 80.8 86.9 84.4
Real-world Table 91.0 91.5 86.1 - 70.8 81.8 90.6 86.7
Handwriting 86.8 87.0 87.4 73.8 54.2 71.7 90.0 78.0
Multi-language 68.9 69.3 54.8 56.1 27.8 65.1 86.2 70.1
Seal 90.2 90.5 42.2 40.4 - 63.0 91.3 58.8
Receipt (KIE) 94.1 94.5 - - - - 97.3 83.5

*Gemini-3-Pro and GPT-5.2 are general-purpose VLMs, shown for reference only.

Throughput

Method Image Inputs (Pages/Sec) PDF Inputs (Pages/Sec)
XCurOS-OCR 0.66 1.83
GLM-OCR 0.67 1.86
PaddleOCR-VL-1.5 0.39 1.22
Deepseek-OCR2 0.32 -
MinerU2.5 0.18 0.48
dots.ocr 0.10 -

XCurOS-OCR is optimized to run on commodity CPUs; it scores marginally below GLM-OCR while requiring no GPU.

License

Released under the MIT License. See the LICENSE file in this repository.

Downloads last month
212
GGUF
Model size
0.9B params
Architecture
glm4
Hardware compatibility
Log In to add your hardware

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support