Instructions to use Fiona1019/PP-DocLayoutV3-ov with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PaddleOCR
How to use Fiona1019/PP-DocLayoutV3-ov with PaddleOCR:
# 1. See https://www.paddlepaddle.org.cn/en/install to install paddlepaddle # 2. pip install paddleocr from paddleocr import LayoutDetection model = LayoutDetection(model_name="PP-DocLayoutV3-ov") output = model.predict(input="path/to/image.png", batch_size=1) for res in output: res.print() res.save_to_img(save_path="./output/") res.save_to_json(save_path="./output/res.json") - Notebooks
- Google Colab
- Kaggle
PP-DocLayoutV3 · OpenVINO IR
Layout Analysis Module of PaddleOCR-VL-1.5 — converted to OpenVINO™ IR for local inference on Intel CPU / GPU / NPU
Introduction · 简介
This repository hosts the OpenVINO™ IR build of PP-DocLayoutV3, the layout-analysis module of
PaddleOCR-VL-1.5. The original PaddlePaddle weights have been converted to OpenVINO Intermediate
Representation (inference.xml + inference.bin) so the model runs fully locally on Intel CPU,
integrated/discrete GPU, and NPU via the OpenVINO runtime — no cloud service and no PaddlePaddle
runtime required.
本仓库提供 PP-DocLayoutV3 的 OpenVINO™ IR 版本,它是 PaddleOCR-VL-1.5 的版面分析(layout)模块。 模型已从 PaddlePaddle 权重转换为 OpenVINO 中间表示(
inference.xml+inference.bin),可在 Intel CPU / 集显 / 独显 / NPU 上完全本地运行,无需联网、无需安装 PaddlePaddle。
PP-DocLayoutV3 is specifically engineered to handle non-planar document images. It directly predicts multi-point bounding boxes for layout elements (rather than standard two-point boxes) and determines the logical reading order for skewed and curved surfaces within a single forward pass, significantly reducing cascading errors. It is an essential component of PaddleOCR-VL-1.5, providing the layout analysis that drives high-precision parsing of real-world documents.
Model Architecture
What's in this repo · 文件说明
| File | Description |
|---|---|
inference.xml |
OpenVINO IR network topology |
inference.bin |
OpenVINO IR weights |
inference.yml |
Preprocessing config (resize 800×800, normalization) + 25-class label list + draw_threshold |
config.json |
Model config |
preprocessor_config.json |
Image-processor config |
- Architecture: DETR-style detector
- Input:
image[1, 3, 800, 800](BGR, resized to 800×800, no keep-ratio) +scale_factor[1, 2] - Default score threshold:
0.5(draw_thresholdininference.yml) - Layout classes (25): abstract, algorithm, aside_text, chart, content, display_formula, doc_title, figure_title, footer, footer_image, footnote, formula_number, header, header_image, image, inline_formula, number, paragraph_title, reference, reference_content, seal, table, text, vertical_text, vision_footnote
Usage · 使用方法
1. Recommended — as part of the PaddleOCR-VL OpenVINO pipeline
This model is the layout stage of an end-to-end document-parsing pipeline. Pair it with
FionaGu1019/PaddleOCR-VL-1.5-ov
(the recognition VLM) to get layout detection → reading order → text/table/formula recognition.
from modelscope import snapshot_download
# Download both stages of the pipeline
layout_dir = snapshot_download("FionaGu1019/PP-DocLayoutV3-ov")
vl_dir = snapshot_download("FionaGu1019/PaddleOCR-VL-1.5-ov")
2. Standalone OpenVINO inference
import cv2
import numpy as np
import openvino as ov
model_dir = "PP-DocLayoutV3-ov" # local path or snapshot_download(...) result
core = ov.Core()
compiled = core.compile_model(f"{model_dir}/inference.xml", "GPU") # "CPU" / "GPU" / "NPU"
# Preprocess: resize to 800x800 (no keep-ratio), CHW, float32
img = cv2.imread("document.jpg")
h, w = img.shape[:2]
resized = cv2.resize(img, (800, 800), interpolation=cv2.INTER_LINEAR)
blob = resized.astype(np.float32).transpose(2, 0, 1)[None] # [1,3,800,800]
scale_factor = np.array([[800 / h, 800 / w]], dtype=np.float32) # [1,2]
results = compiled({"image": blob, "scale_factor": scale_factor})
# Outputs are DETR detections (class id, score, box); filter by draw_threshold=0.5
# and map class ids via the label_list in inference.yml.
Tip: enable on-disk kernel caching with
core.set_property({"CACHE_DIR": ".ov_cache"})to avoid re-compiling kernels on every run (a large speedup on GPU).
Visualization
Light Variation
Skewing
Screen-photo
Curving
Notes · 说明
- This is a format conversion of the official PaddlePaddle model to OpenVINO IR; the network weights and detection behaviour are intended to match the original PP-DocLayoutV3. For the original weights see PaddlePaddle/PP-DocLayoutV3.
- For best results across Intel hardware, prefer GPU when available and fall back to CPU.
Citation
If you find PP-DocLayoutV3 helpful, feel free to give the original project a star and citation.
@misc{cui2026paddleocrvl15multitask09bvlm,
title={PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing},
author={Cheng Cui and Ting Sun and Suyin Liang and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Xueqing Wang and Changda Zhou and Hongen Liu and Manhui Lin and Yue Zhang and Yubo Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
year={2026},
eprint={2601.21957},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.21957},
}
- Downloads last month
- 15
Model tree for Fiona1019/PP-DocLayoutV3-ov
Base model
PaddlePaddle/PP-DocLayoutV3