opendatalab/MinerU2.5-2509-1.2B

hotelll committed on Sep 17

Commit 44ac1d9 · verified · 1 Parent(s): 68eebb3

Update README

Files changed (1)

README.md +46 -3

@@ -7,8 +7,51 @@ pipeline_tag: image-text-to-text
 library_name: transformers
 ---
-MinerU2.5 (Pre-release)
-This is a pre-release version of the MinerU2.5 model. The model weights are stable and available for use, primarily intended for internal development and demonstration purposes.
-A comprehensive README.md, along with the technical report and source code, will be published later this month.

+# MinerU2.5
+We are releasing MinerU2.5, a 1.2B-parameter vision-language model specialized in OCR and document parsing. It is designed to parse complex and diverse real-world documents more accurately and robustly.
+The model weights are stable and available for use. This release is primarily intended for internal development and demonstration purposes.
+> ⚠️ A full technical report, source code, and a comprehensive README will be released later this month. Stay tuned!
+## Quick Start
+For convenience, we provide a Python package named `mineru-vl-utils` that simplifies working with the MinerU2.5 vision-language model. For more information and usage details, please refer to [mineru-vl-utils](https://github.com/opendatalab/mineru-vl-utils/tree/main).
+Here is a simple example of using MinerU2.5 with 🤗 Transformers.
+### Install packages
+``` bash
+pip install "mineru-vl-utils[transformers]"
+```
+### Run with Transformers
+``` python
+from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
+from PIL import Image
+from mineru_vl_utils import MinerUClient
+model_path = "opendatalab/MinerU2.5-2509-1.2B"
+model = Qwen2VLForConditionalGeneration.from_pretrained(
+    model_path,
+    dtype="auto",
+    device_map="auto"
+)
+processor = AutoProcessor.from_pretrained(
+    model_path,
+    use_fast=True
+)
+client = MinerUClient(
+    backend="transformers",
+    model=model,
+    processor=processor
+)
+image_path = '/path/to/your/image'
+image = Image.open(image_path)
+extracted_blocks = client.two_step_extract(image)
+```
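
The example above stops at `extracted_blocks` and this commit does not describe what that value contains. The following is a minimal sketch of one way to post-process it, assuming each block is dict-like with a `"type"` key and a `"content"` key. Both key names, the `summarize_blocks` helper, and the sample data are assumptions for illustration, not something confirmed by this README; check the `mineru-vl-utils` documentation for the actual structure.

```python
from collections import defaultdict


def summarize_blocks(blocks):
    """Group non-empty block contents by block type.

    Assumes each block is dict-like with "type" and "content" keys
    (hypothetical key names, not confirmed by the README).
    """
    grouped = defaultdict(list)
    for block in blocks:
        block_type = block.get("type", "unknown")
        content = block.get("content")
        if content:  # skip blocks without text, e.g. images
            grouped[block_type].append(content)
    return dict(grouped)


# Hypothetical sample standing in for the result of
# client.two_step_extract(image); real output may differ.
sample_blocks = [
    {"type": "title", "content": "Annual Report"},
    {"type": "text", "content": "Revenue grew."},
    {"type": "image", "content": None},
    {"type": "text", "content": "Costs fell."},
]

summary = summarize_blocks(sample_blocks)
for block_type, contents in summary.items():
    print(f"{block_type}: {len(contents)} block(s)")
```

If the assumed keys match the real output, `summarize_blocks(extracted_blocks)` could be called directly on the result of the Transformers example.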