✨ Introduction

MinerU-Popo is a lightweight, universal framework for POst-Processing OCR outputs, bridging the gap between page-level OCR parsing and document-level semantic structure. It constructs a document tree using a 4B post-processing model that performs four subtasks: table truncation analysis, text truncation analysis, title hierarchy analysis, and image-text association analysis. We handle the challenges of cross-page geometric discontinuity, redundant document parsing, and scalability to long documents via:

  • Task-Oriented Data Engine: Generate representative training data and simplify the task-specific input.
  • Dynamic Chunking and Synchronization: Process long documents in dynamic chunks and reduce deviations across chunks to preserve global consistency.
  • Document Enrichment: Structurally construct a tree, semantically generate summaries and split long-section nodes.
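
To make the tree-construction idea concrete, here is a minimal sketch of how a list of titles with predicted hierarchy levels could be nested into a document tree. It is not MinerU-Popo's actual implementation: the `Node` class, the `build_tree` function, and the `(level, title)` input format are all hypothetical names chosen for illustration, and the sketch assumes the title hierarchy analysis has already assigned each title an integer level starting at 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: one section title and its nested subsections."""
    title: str
    level: int
    children: list["Node"] = field(default_factory=list)


def build_tree(titles: list[tuple[int, str]]) -> Node:
    """Nest (level, title) pairs, given in reading order, into a tree.

    Assumes levels are integers >= 1 (1 = top-level section).
    """
    root = Node(title="ROOT", level=0)
    stack = [root]  # chain of currently open ancestors, root first
    for level, title in titles:
        if level < 1:
            raise ValueError(f"level must be >= 1, got {level}")
        # Close every open section at the same depth or deeper.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(title=title, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node: Node, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, skipping the synthetic root."""
    lines = [] if node.level == 0 else ["  " * (depth - 1) + node.title]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    demo = [(1, "Introduction"), (2, "Background"), (2, "Scope"), (1, "Method")]
    print("\n".join(outline(build_tree(demo))))
```

The stack keeps only the open ancestors of the current position, so each title is attached in a single pass; a title whose level skips a depth (for example, level 3 directly under level 1) is simply attached to the nearest shallower ancestor.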

📊 Performance

Better Hierarchy (TEDS) after Post-Processing

Basic OCR    Before   After
MinerU       53.7     90.6
MonkeyOCR    48.9     87.4
Dolphin      60.4     83.5
PaddleOCR    59.3     82.6
GLM-OCR      53.5     81.8

Advantages over Directly Using a Pre-trained Model

Model          TEDS   Doc/s
MinerU-Popo    90.6   0.37
Qwen3-VL-2B    21.2   0.22
Qwen3-VL-4B    56.5   0.20
Qwen3-VL-8B    65.9   0.16
Qwen3-VL-32B   78.0   0.04

Benefits for Downstream Retrieval and Analysis (Acc on ViDoRe V3)

Method        C.S.   Fin.   H.R.   Ind.   Phar.
MinerU-Popo   84.4   49.5   66.8   58.7   71.6
Raw RAG       82.3   48.7   63.2   60.4   64.4
Visual RAG    80.7   58.4   64.8   59.7   67.6

โš™๏ธ Setup

Please refer to https://github.com/opendatalab/MinerU-Popo
