✨ Introduction
MinerU-Popo is a lightweight, universal framework for POst-Processing OCR outputs that bridges the gap between page-level OCR parsing and document-level semantic structure. It constructs a document tree using a 4B post-processing model that performs four subtasks: table truncation analysis, text truncation analysis, title hierarchy analysis, and image-text association analysis. We address the challenges of cross-page geometric discontinuity, redundant document parsing, and scalability to long documents via:
- Task-Oriented Data Engine: generates representative training data and simplifies the task-specific input.
- Dynamic Chunking and Synchronization: processes long documents in dynamic chunks and reduces deviations across chunks to preserve global consistency.
- Document Enrichment: structurally constructs a document tree; semantically generates summaries and splits overly long section nodes.
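The truncation-merge and title-hierarchy steps can be pictured with a small stack-based tree builder. This is a minimal sketch, not MinerU-Popo's code. It assumes the 4B model has already labeled each block with a title level and a hypothetical `continues_prev` truncation flag, and the `Block`, `Node`, `TreeBuilder` and `outline` names are illustrative only. Keeping the builder's state between `feed` calls is a simplified stand-in for chunk synchronization; image-text association and summarization are omitted.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Union


@dataclass
class Block:
    """A page-level OCR block (hypothetical schema, not MinerU-Popo's format)."""
    kind: str                     # e.g. "title", "text", "table"
    content: str
    level: int = 1                # predicted title level, 1 = top (titles only)
    continues_prev: bool = False  # hypothetical truncation flag: continues the last block of this kind


@dataclass
class Node:
    """A section in the document tree; children are sub-sections or content blocks."""
    title: str
    level: int
    children: List[Union["Node", Block]] = field(default_factory=list)


class TreeBuilder:
    """Builds the tree incrementally so that state carries over between chunks."""

    def __init__(self) -> None:
        self.root = Node(title="<root>", level=0)
        self.stack: List[Node] = [self.root]      # currently open sections, outermost first
        self.last_by_kind: Dict[str, Block] = {}  # merge targets for truncated blocks

    def feed(self, chunk: List[Block]) -> None:
        for block in chunk:
            if block.kind == "title":
                level = max(block.level, 1)
                # Close every open section at the same or a deeper level.
                while self.stack[-1].level >= level:
                    self.stack.pop()
                node = Node(title=block.content, level=level)
                self.stack[-1].children.append(node)
                self.stack.append(node)
                self.last_by_kind.clear()  # never merge across a heading
            elif block.continues_prev and block.kind in self.last_by_kind:
                # Truncation merge: append the continuation to its head block.
                head = self.last_by_kind[block.kind]
                head.content += " " + block.content
            else:
                stored = replace(block)  # copy, so the caller's block is not mutated
                self.stack[-1].children.append(stored)
                self.last_by_kind[block.kind] = stored


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return the section titles as indented lines (content blocks omitted)."""
    lines = [] if depth == 0 else ["  " * (depth - 1) + node.title]
    for child in node.children:
        if isinstance(child, Node):
            lines.extend(outline(child, depth + 1))
    return lines


# Two chunks; the text truncated at the chunk boundary is merged back together.
builder = TreeBuilder()
builder.feed([Block("title", "1 Intro", level=1), Block("text", "OCR output is split")])
builder.feed([Block("text", "across pages.", continues_prev=True),
              Block("title", "1.1 Scope", level=2),
              Block("title", "2 Method", level=1)])
print("\n".join(outline(builder.root)))
```

Because the open-section stack survives across `feed` calls, a section opened in one chunk keeps receiving content in the next, and a continuation at the chunk boundary is merged into its head block instead of becoming a separate fragment. A real system would merge tables row by row rather than by string concatenation.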
Performance
Hierarchy Quality (TEDS) Before and After Post-Processing
| Base OCR | Before | After |
|---|---|---|
| MinerU | 53.7 | 90.6 |
| MonkeyOCR | 48.9 | 87.4 |
| Dolphin | 60.4 | 83.5 |
| PaddleOCR | 59.3 | 82.6 |
| GLM-OCR | 53.5 | 81.8 |
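TEDS (Tree-Edit-Distance-based Similarity) compares a predicted tree with the ground-truth tree. Assuming the standard definition used in table-recognition evaluation, applied here to the document hierarchy tree, it is:

```latex
\mathrm{TEDS}(T_p, T_g) = 1 - \frac{\mathrm{EditDist}(T_p, T_g)}{\max\left(|T_p|, |T_g|\right)}
```

Here $T_p$ is the predicted tree, $T_g$ the ground-truth tree, $\mathrm{EditDist}$ the minimum cost of node insertions, deletions and relabelings that turns one tree into the other, and $|T|$ the number of nodes. With the ×100 scaling used in the table, MinerU's improvement from 53.7 to 90.6 corresponds to the normalized edit distance falling from 0.463 to 0.094.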
Comparison with Directly Using Pre-trained Models
| Model | TEDS | Throughput (Doc/s) |
|---|---|---|
| MinerU-Popo | 90.6 | 0.37 |
| Qwen3-VL-2B | 21.2 | 0.22 |
| Qwen3-VL-4B | 56.5 | 0.20 |
| Qwen3-VL-8B | 65.9 | 0.16 |
| Qwen3-VL-32B | 78.0 | 0.04 |
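Doc/s is throughput, so its reciprocal is the average time per document. The arithmetic below uses only numbers from the table: 0.37 Doc/s is about 2.7 s per document, versus 25 s for Qwen3-VL-32B, a ratio of roughly 9x. It assumes all rows were measured with the same hardware and batch settings, which the table does not state.

```python
# TEDS and throughput (Doc/s) copied from the table above.
results = {
    "MinerU-Popo": (90.6, 0.37),
    "Qwen3-VL-2B": (21.2, 0.22),
    "Qwen3-VL-4B": (56.5, 0.20),
    "Qwen3-VL-8B": (65.9, 0.16),
    "Qwen3-VL-32B": (78.0, 0.04),
}

popo_teds, popo_rate = results["MinerU-Popo"]
latency = {}  # average seconds per document
for name, (teds, rate) in results.items():
    latency[name] = 1.0 / rate  # invert throughput to get per-document time
    speedup = popo_rate / rate  # how many times faster MinerU-Popo processes documents
    print(f"{name:<13} {latency[name]:5.2f} s/doc  "
          f"speedup x{speedup:4.2f}  TEDS gap {popo_teds - teds:+5.1f}")
```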
Benefits for Downstream Retrieval and Analysis (Accuracy on ViDoRe V3)
| Method | Computer Science | Finance | Human Resources | Industrial | Pharmaceuticals |
|---|---|---|---|---|---|
| MinerU-Popo | 84.4 | 49.5 | 66.8 | 58.7 | 71.6 |
| Raw RAG | 82.3 | 48.7 | 63.2 | 60.4 | 64.4 |
| Visual RAG | 80.7 | 58.4 | 64.8 | 59.7 | 67.6 |
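To read the results per domain rather than in aggregate, a short script can pick the best method for each column. The values are copied from the table; the full domain names are an expansion of the abbreviated ViDoRe V3 column headers (C.S., Fin., H.R., Ind., Phar.).

```python
# Accuracy on ViDoRe V3, copied from the table above.
domains = ["Computer Science", "Finance", "Human Resources", "Industrial", "Pharmaceuticals"]
scores = {
    "MinerU-Popo": [84.4, 49.5, 66.8, 58.7, 71.6],
    "Raw RAG":     [82.3, 48.7, 63.2, 60.4, 64.4],
    "Visual RAG":  [80.7, 58.4, 64.8, 59.7, 67.6],
}

# For each domain (column), find the method with the highest accuracy.
best = {
    domain: max(scores, key=lambda method: scores[method][i])
    for i, domain in enumerate(domains)
}
for domain, method in best.items():
    print(f"{domain:<17} {method}")
```

MinerU-Popo has the highest accuracy in Computer Science, Human Resources and Pharmaceuticals, while Visual RAG leads in Finance and Raw RAG in Industrial.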
⚙️ Setup
Please refer to https://github.com/opendatalab/MinerU-Popo
Model tree for DreamEternal/MinerU-Popo
Base model: Qwen/Qwen3-VL-4B-Instruct