ABot-OCR
ABot-OCR is a document image OCR model that converts PDF/document page images into structured Markdown output, supporting recognition and reconstruction of text, mathematical formulas (LaTeX), tables (HTML), and other elements.
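Because the output mixes Markdown text, LaTeX formulas, and HTML tables, downstream code often needs to separate these elements. The following is a minimal sketch of one way to do that. It assumes tables appear as `<table>...</table>` blocks and display formulas are delimited by `$$...$$`; the model's actual delimiters may differ, and `split_elements` is a hypothetical helper, not part of this repo.

```python
import re

# Assumption: tables are emitted as raw HTML <table> blocks.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)
# Assumption: display formulas are wrapped in $$ ... $$.
FORMULA_RE = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def split_elements(markdown: str) -> dict:
    """Split OCR Markdown into HTML tables, LaTeX formulas, and remaining text."""
    tables = TABLE_RE.findall(markdown)
    # Remove tables first so that any "$$" inside a table cell is not
    # mistaken for a standalone formula.
    without_tables = TABLE_RE.sub("", markdown)
    formulas = [f.strip() for f in FORMULA_RE.findall(without_tables)]
    text = FORMULA_RE.sub("", without_tables)
    # Collapse the blank-line runs left behind by the removed elements.
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    return {"tables": tables, "formulas": formulas, "text": text}
```

Calling `split_elements(page_markdown)` on one page of output returns a dictionary with the keys `tables`, `formulas`, and `text`, which can then be handed to an HTML parser or a LaTeX renderer as needed.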
Benchmarks
Requirements
Python 3.11 is recommended. Install the following dependencies:
pip install vllm==0.18.0 torch==2.10.0
Note: Inference uses vLLM to load the model. Sufficient GPU memory is required (~4GB model weights; actual usage depends on batch_size and image resolution).
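Since the dependencies are pinned to exact versions, it can help to confirm what is actually installed before running inference. The sketch below uses only the standard library; `check_pins` and `installed_version` are hypothetical helper names, and ignoring a local build suffix such as `+cu128` is an assumption about how the installed wheel may be labelled.

```python
from importlib import metadata

# Versions taken from the pip command above.
PINNED = {"vllm": "0.18.0", "torch": "2.10.0"}


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    for package, wanted in pins.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package}: not installed (expected {wanted})")
        # Assumption: a local suffix like "+cu128" does not count as a mismatch.
        elif found.split("+")[0] != wanted:
            problems.append(f"{package}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINNED) or ["All pinned dependencies match."]:
        print(line)
```

Running the file directly prints one line per mismatch, or a single confirmation line when both packages match the pinned versions.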
Inference
Inference script: abot-ocr-infer.py
1. Configure Model Path
Update the default model path in the script:
MODEL_PATH = "./abot-ocr" # Path to the model directory in this repo
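If editing the script for every machine is inconvenient, the path can be resolved at startup and checked before it is handed to vLLM. This is a sketch only: the environment variable `ABOT_OCR_MODEL_PATH` and the helper `resolve_model_path` are hypothetical and do not exist in abot-ocr-infer.py.

```python
import os
from pathlib import Path

DEFAULT_MODEL_PATH = "./abot-ocr"  # same default as in the script


def resolve_model_path(default=DEFAULT_MODEL_PATH, env_var="ABOT_OCR_MODEL_PATH"):
    """Pick the model directory from the environment (hypothetical variable)
    or fall back to the default, and fail early if it does not exist."""
    candidate = Path(os.environ.get(env_var, default)).expanduser()
    if not candidate.is_dir():
        raise FileNotFoundError(f"Model directory not found: {candidate}")
    return str(candidate.resolve())
```

With this helper, `MODEL_PATH = resolve_model_path()` raises a clear error when the directory is missing instead of failing later during model loading.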
2. Run from Command Line
Edit the parameters in the __main__ block at the bottom of abot-ocr-infer.py, then run:
python abot-ocr-infer.py
Acknowledgements
Our work is inspired by many excellent open-source projects. We sincerely thank the developers of Qwen-VL, PaddleOCR-VL, MinerU, and the broader OCR community.
