Hiro-Layout: Document Layout Analysis for Patent and Technical PDFs


Hiro-Layout is a document layout analysis model for patent and technical PDF pages. It detects and classifies page regions such as text, titles, headers, footers, tables, formulas, chemical structures, figures, captions, search reports, bibliographies, and other patent-specific layout elements.

Highlights

  • Patent-focused layout understanding: covers common patent PDF regions and patent-specific structures.
  • Technical document coverage: evaluated on both patent PDFs and NPD PDFs.
  • Fine-grained taxonomy: 25 layout categories across figure, text, and complex document elements.

Model Overview

| Item | Details |
| --- | --- |
| Model name | Hiro-Layout |
| Current artifact | layout_model/RT-DETR_25.onnx |
| Task | Document layout analysis / page region detection |
| Input | Rendered PDF page image |
| Output | Layout regions with class labels |
| Domains | Patent PDFs, technical/NPD PDFs |
| License | Apache-2.0 |

Layout Taxonomy

| Group | Class | Abbr. | Chinese |
| --- | --- | --- | --- |
| figure | graph | graph | 图表 |
| figure | drawing | draw | 绘制图 |
| figure | structure diagram | struc | 结构图 |
| figure | photograph | photo | 照片 |
| figure | table | tab | 表格 |
| figure | math equation | eqn | 数学公式 |
| figure | chemical formula | chem | 化学式 |
| figure | noise | noise | 噪声 |
| text | text | text | 文本 |
| text | title | title | 标题 |
| text | section title | sec | 章节标题 |
| text | page header | head | 页眉 |
| text | page footer | foot | 页脚 |
| text | marginal note | mnote | 边注 |
| text | caption | cap | 说明 |
| text | figure number | figno | 编号 |
| text | line number | lineno | 行号 |
| text | column number | colno | 栏号 |
| text | sequence | seq | 序列表 |
| complex | figure complex | figcx | 图片组 |
| complex | chemical reaction | rxn | 反应式 |
| complex | bibliography | bib | 著录页 |
| complex | search report | srep | 搜索报告 |
| complex | Table of Contents | toc | 目录 |
| complex | reference | ref | 参考文献 |
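
As an illustration of how the taxonomy is organized, the sketch below encodes the table above as plain Python data and groups the class abbreviations by their high-level group. The names `TAXONOMY` and `classes_by_group` are hypothetical helpers for this example only; the authoritative mapping, including class indices, is the one in labels.json.

```python
from collections import defaultdict

# (group, class name, abbreviation), copied from the taxonomy table.
# The list order mirrors the table; it is NOT guaranteed to match the
# model's class indices (see labels.json for those).
TAXONOMY = [
    ("figure", "graph", "graph"),
    ("figure", "drawing", "draw"),
    ("figure", "structure diagram", "struc"),
    ("figure", "photograph", "photo"),
    ("figure", "table", "tab"),
    ("figure", "math equation", "eqn"),
    ("figure", "chemical formula", "chem"),
    ("figure", "noise", "noise"),
    ("text", "text", "text"),
    ("text", "title", "title"),
    ("text", "section title", "sec"),
    ("text", "page header", "head"),
    ("text", "page footer", "foot"),
    ("text", "marginal note", "mnote"),
    ("text", "caption", "cap"),
    ("text", "figure number", "figno"),
    ("text", "line number", "lineno"),
    ("text", "column number", "colno"),
    ("text", "sequence", "seq"),
    ("complex", "figure complex", "figcx"),
    ("complex", "chemical reaction", "rxn"),
    ("complex", "bibliography", "bib"),
    ("complex", "search report", "srep"),
    ("complex", "Table of Contents", "toc"),
    ("complex", "reference", "ref"),
]


def classes_by_group(taxonomy):
    """Return {group: [abbreviation, ...]} preserving table order."""
    groups = defaultdict(list)
    for group, _name, abbr in taxonomy:
        groups[group].append(abbr)
    return dict(groups)


for group, abbrs in classes_by_group(TAXONOMY).items():
    print(f"{group} ({len(abbrs)}): {', '.join(abbrs)}")
```

Grouping by the first column is handy when downstream code treats whole groups alike, for example routing every figure-group region to an image pipeline.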

Benchmarks

Metrics are reported as Precision, Recall, and F1.

| Benchmark | Labels | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| Patent PDF | 33,054 | 0.8144 | 0.7711 | 0.7922 |
| NPD PDF | 17,769 | 0.7090 | 0.6983 | 0.7036 |
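
The overall F1 values above are consistent with the standard definition of F1 as the harmonic mean of precision and recall. The short check below recomputes them from the reported precision and recall; `f1_score` and `REPORTED` are names made up for this sketch, and it says nothing about how the per-class results were aggregated into the overall rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the summary table.
REPORTED = {
    "Patent PDF": (0.8144, 0.7711, 0.7922),
    "NPD PDF": (0.7090, 0.6983, 0.7036),
}

for name, (precision, recall, reported_f1) in REPORTED.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed={recomputed:.4f} reported={reported_f1:.4f}")
```

The same function can be applied to any row of the per-class tables to spot transcription errors.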

Patent PDF

| # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | figure | graph | graph | 图表 | 215 | 0.7611 | 0.8000 | 0.7800 |
| 2 | figure | draw | drawing | 绘制图 | 420 | 0.8649 | 0.3048 | 0.4507 |
| 3 | figure | struc | structure diagram | 结构图 | 626 | 0.6579 | 0.8355 | 0.7361 |
| 4 | figure | photo | photograph | 照片 | 147 | 0.8378 | 0.8435 | 0.8407 |
| 5 | figure | tab | table | 表格 | 198 | 0.7759 | 0.9091 | 0.8372 |
| 6 | figure | eqn | math equation | 数学公式 | 399 | 0.7762 | 0.6692 | 0.7187 |
| 7 | figure | chem | chemical formula | 化学式 | 1,099 | 0.8792 | 0.8944 | 0.8868 |
| 8 | figure | noise | noise | 噪声 | 1,241 | 0.7025 | 0.7687 | 0.7341 |
| 9 | text | text | text | 文本 | 17,668 | 0.8182 | 0.8062 | 0.8122 |
| 10 | text | title | title | 标题 | 601 | 0.9117 | 0.8070 | 0.8561 |
| 11 | text | sec | section title | 章节标题 | 1,394 | 0.7968 | 0.7088 | 0.7502 |
| 12 | text | head | page header | 页眉 | 3,074 | 0.8187 | 0.7788 | 0.7983 |
| 13 | text | foot | page footer | 页脚 | 1,012 | 0.7432 | 0.6433 | 0.6896 |
| 14 | text | mnote | marginal note | 边注 | 421 | 0.7794 | 0.5202 | 0.6239 |
| 15 | text | cap | caption | 说明 | 80 | 0.6842 | 0.4875 | 0.5693 |
| 16 | text | figno | figure number | 编号 | 1,389 | 0.8955 | 0.7466 | 0.8143 |
| 17 | text | lineno | line number | 行号 | 341 | 0.7759 | 0.6598 | 0.7132 |
| 18 | text | colno | column number | 栏号 | 449 | 0.6964 | 0.4699 | 0.5612 |
| 19 | text | seq | sequence | 序列表 | 136 | 0.4430 | 0.2574 | 0.3256 |
| 20 | complex | figcx | figure complex | 图片组 | 1,416 | 0.8657 | 0.7373 | 0.7963 |
| 21 | complex | rxn | chemical reaction | 反应式 | 150 | 0.8898 | 0.7000 | 0.7836 |
| 22 | complex | bib | bibliography | 著录页 | 470 | 0.9615 | 0.7979 | 0.8721 |
| 23 | complex | srep | search report | 搜索报告 | 106 | 0.9052 | 0.9906 | 0.9459 |
| 24 | complex | toc | Table of Contents | 目录 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 25 | complex | ref | reference | 参考文献 | 2 | 0.0000 | 0.0000 | 0.0000 |
| ALL | | | | | 33,054 | 0.8144 | 0.7711 | 0.7922 |

NPD PDF

| # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | figure | graph | graph | 图表 | 248 | 0.6838 | 0.6976 | 0.6906 |
| 2 | figure | draw | drawing | 绘制图 | 9 | 0.0000 | 0.0000 | 0.0000 |
| 3 | figure | struc | structure diagram | 结构图 | 341 | 0.7454 | 0.7126 | 0.7286 |
| 4 | figure | photo | photograph | 照片 | 82 | 0.6071 | 0.6220 | 0.6145 |
| 5 | figure | tab | table | 表格 | 209 | 0.7533 | 0.8182 | 0.7844 |
| 6 | figure | eqn | math equation | 数学公式 | 298 | 0.6789 | 0.5604 | 0.6140 |
| 7 | figure | chem | chemical formula | 化学式 | 388 | 0.7324 | 0.8325 | 0.7793 |
| 8 | figure | noise | noise | 噪声 | 695 | 0.4823 | 0.4302 | 0.4548 |
| 9 | text | text | text | 文本 | 9,119 | 0.6943 | 0.7625 | 0.7268 |
| 10 | text | title | title | 标题 | 304 | 0.7130 | 0.5395 | 0.6142 |
| 11 | text | sec | section title | 章节标题 | 1,539 | 0.7337 | 0.6160 | 0.6697 |
| 12 | text | head | page header | 页眉 | 1,246 | 0.7464 | 0.7111 | 0.7283 |
| 13 | text | foot | page footer | 页脚 | 1,339 | 0.7711 | 0.6468 | 0.7035 |
| 14 | text | mnote | marginal note | 边注 | 190 | 0.5714 | 0.2947 | 0.3889 |
| 15 | text | cap | caption | 说明 | 573 | 0.8711 | 0.5899 | 0.7034 |
| 16 | text | figno | figure number | 编号 | 149 | 0.6078 | 0.4161 | 0.4940 |
| 17 | text | lineno | line number | 行号 | 41 | 0.6667 | 0.9268 | 0.7755 |
| 18 | text | colno | column number | 栏号 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 19 | text | seq | sequence | 序列表 | 18 | 0.7000 | 0.3889 | 0.5000 |
| 20 | complex | figcx | figure complex | 图片组 | 734 | 0.7657 | 0.7480 | 0.7567 |
| 21 | complex | rxn | chemical reaction | 反应式 | 36 | 0.8947 | 0.4722 | 0.6182 |
| 22 | complex | bib | bibliography | 著录页 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 23 | complex | srep | search report | 搜索报告 | 3 | 0.4286 | 1.0000 | 0.6000 |
| 24 | complex | toc | Table of Contents | 目录 | 76 | 0.8475 | 0.6579 | 0.7407 |
| 25 | complex | ref | reference | 参考文献 | 132 | 0.8148 | 0.3333 | 0.4731 |
| ALL | | | | | 17,769 | 0.7090 | 0.6983 | 0.7036 |

Usage

The current model artifact is an ONNX export:

layout_model/RT-DETR_25.onnx

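The model can be loaded with ONNX Runtime. Inspecting the session's inputs and outputs shows the tensor names and shapes the export expects:

```python
import onnxruntime as ort

# Load the ONNX export.
session = ort.InferenceSession("layout_model/RT-DETR_25.onnx")

# Print the name and shape of every input and output tensor.
print("inputs:", [(i.name, i.shape) for i in session.get_inputs()])
print("outputs:", [(o.name, o.shape) for o in session.get_outputs()])
```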

Use labels.json for the 25-class label mapping.
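
A minimal sketch of how the label mapping might be applied to model output follows. It rests on two assumptions that this card does not confirm: that labels.json is either a JSON list of names or a JSON object keyed by class id, and that detections have already been unpacked into `(class_id, score, box)` tuples. The helpers `load_label_map` and `decode_detections`, the `min_score` threshold, and the ids in the demo are all hypothetical; check the actual file and the session outputs before relying on them.

```python
import json


def load_label_map(path):
    """Return {class_id: label} from a JSON list or an object keyed by id.

    Both layouts are guesses about labels.json, not a documented schema.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        return dict(enumerate(data))
    return {int(key): value for key, value in data.items()}


def decode_detections(detections, label_map, min_score=0.5):
    """Drop low-score detections and attach a readable label to the rest.

    `detections` is an iterable of (class_id, score, box) tuples, a
    hypothetical intermediate format chosen for this example.
    """
    results = []
    for class_id, score, box in detections:
        if score < min_score:
            continue
        label = label_map.get(class_id, f"unknown_{class_id}")
        results.append({"label": label, "score": score, "box": box})
    return results


# Demo with an inline mapping; the ids here are illustrative only.
demo_map = {0: "graph", 1: "draw"}
demo_detections = [(0, 0.91, (10, 20, 200, 300)), (1, 0.12, (0, 0, 5, 5))]
print(decode_detections(demo_detections, demo_map))
```

Unknown ids are kept with a placeholder label rather than dropped, which makes a mismatch between the model and the mapping file easy to notice.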

Repository Files

| File | Purpose |
| --- | --- |
| README.md | Hugging Face model card in English |
| README_zh.md | Chinese model card |
| EVALUATION.md | Detailed benchmark results derived from the workbook |
| labels.json | Machine-readable 25-class label mapping |
| layout_model/RT-DETR_25.onnx | ONNX model artifact |
| requirements.txt | Minimal dependencies for ONNX loading and image preprocessing |
| LICENSE | Apache-2.0 license |
| DISCLAIMER.md | Model limitations and responsible-use notes |
| NOTICE | Copyright and trademark notice |
| OPEN_SOURCE_CHECKLIST.md | Release checklist before public upload |

Limitations

  • Layout predictions may be inaccurate on low-resolution scans, heavily rotated pages, handwritten documents, unusual patent formats, or unseen page templates.
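  • Metrics for small objects and sparse categories can be unstable when the evaluation set has very few labels. Classes with zero labels in a benchmark (toc in Patent PDF; colno and bib in NPD PDF) appear as 0.0000; recall is undefined for them, so these rows should be read as not evaluated.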
  • The model should not be used as the sole source of truth for legal, compliance, filing, archival, or customer-facing workflows without human review.
  • Users are responsible for ensuring they have the right to process and share any documents used with this model.
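
The point above about sparse categories can be made concrete: with n ground-truth labels, recall can only take n + 1 distinct values, so a single missed or found region moves the metric by 1/n. The sketch below uses label counts from the NPD PDF table (srep: 3, draw: 9, text: 9,119); `possible_recalls` and `recall_step` are hypothetical helper names.

```python
from fractions import Fraction


def possible_recalls(num_labels):
    """All recall values reachable with `num_labels` ground-truth labels.

    Returns an empty list for zero labels, where recall is undefined.
    """
    if num_labels <= 0:
        return []
    return [Fraction(tp, num_labels) for tp in range(num_labels + 1)]


def recall_step(num_labels):
    """Change in recall caused by one more (or one fewer) true positive."""
    return 1 / num_labels


# Label counts taken from the NPD PDF benchmark table.
for abbr, labels in [("srep", 3), ("draw", 9), ("text", 9119)]:
    print(f"{abbr}: {labels} labels, one detection shifts recall by "
          f"{recall_step(labels):.4f}")

print([str(r) for r in possible_recalls(3)])
```

For a class like srep in the NPD set, every reported recall is therefore one of only four values, which is why such rows say little about expected behavior on new data.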

License

This project is released under the Apache License 2.0. See LICENSE.

Copyright Notice

Copyright (c) 2026 Patsnap. All rights reserved except as expressly licensed under the applicable license terms.

Hiro-Layout, Hiro, Patsnap, and any associated names, logos, product names, service names, designs, and slogans are trademarks or registered trademarks of Patsnap or its affiliates. No trademark license is granted under the open source license or any model license unless expressly stated.
