ABot-OCR

ABot-OCR is a document image OCR model that converts PDF/document page images into structured Markdown output, supporting recognition and reconstruction of text, mathematical formulas (LaTeX), tables (HTML), and other elements.
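Because tables are emitted as HTML inside the Markdown output, downstream code often needs to separate them from the surrounding text. The following is a minimal sketch of one way to do that, not part of the ABot-OCR repository. It assumes tables appear as literal `<table>...</table>` blocks; the function name `split_tables` and the `[[TABLE n]]` placeholder format are hypothetical.

```python
import re

# Assumption: each table in the model output is a literal <table>...</table> block.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)


def split_tables(markdown: str) -> tuple[str, list[str]]:
    """Return the text with tables replaced by placeholders, plus the table HTML."""
    tables: list[str] = []

    def _stash(match: re.Match) -> str:
        # Keep the raw HTML and leave an indexed placeholder in the text.
        tables.append(match.group(0))
        return f"[[TABLE {len(tables) - 1}]]"

    return TABLE_RE.sub(_stash, markdown), tables
```

The extracted HTML strings can then be passed to any HTML table parser, while the remaining text stays ordinary Markdown.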

Requirements

Python 3.11 is recommended. Install the following dependencies:

pip install vllm==0.18.0 torch==2.10.0

Note: Inference loads the model through vLLM and requires a GPU with sufficient memory. The model weights alone take about 4 GB; actual usage depends on batch_size and image resolution.
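The ~4 GB figure is consistent with a back-of-the-envelope estimate from the model card's listed size (2B parameters) and tensor type (BF16, 2 bytes per parameter). The sketch below only covers the weights; the KV cache, activations, and image preprocessing add to this, and the helper name `weight_memory_gb` is illustrative.

```python
BYTES_PER_PARAM_BF16 = 2  # BF16 stores each parameter in 16 bits


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate memory for the model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# 2B parameters in BF16
print(weight_memory_gb(2e9))  # 4.0
```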

Inference

Inference script: abot-ocr-infer.py

1. Configure Model Path

Update the default model path in the script:

MODEL_PATH = "./abot-ocr"  # Path to the model directory in this repo
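Before launching inference, it can help to check that the path actually points at a model directory. This is a minimal sketch, not part of abot-ocr-infer.py. It assumes the usual Hugging Face layout (a `config.json` plus one or more `*.safetensors` files); the helper name `check_model_dir` is hypothetical.

```python
from pathlib import Path


def check_model_dir(model_path: str) -> Path:
    """Return the resolved model directory, or raise if it looks incomplete."""
    path = Path(model_path).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Model directory not found: {path}")
    # Assumption: the directory follows the usual Hugging Face layout.
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"Missing config.json in {path}")
    if not any(path.glob("*.safetensors")):
        raise FileNotFoundError(f"No .safetensors weight files in {path}")
    return path
```

Calling `check_model_dir(MODEL_PATH)` at the top of a script fails fast with a readable message, before vLLM starts loading the weights.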

2. Run from Command Line

Edit the parameters in the __main__ block at the bottom of abot-ocr-infer.py, then run:

python abot-ocr-infer.py

Acknowledgements

Our work is inspired by many excellent open-source projects. We sincerely thank the developers of Qwen-VL, PaddleOCR-VL, MinerU, and the broader OCR community.

Model format: Safetensors
Model size: 2B params
Tensor type: BF16