hantian committed on
Commit
dfabea3
1 Parent(s): b78dd24

Update README.md

Files changed (1)
  1. README.md +52 -2
README.md CHANGED
@@ -8,6 +8,56 @@ tags:
  - document-analysis
  ---
 
- yolo-doclaynet
 
- https://github.com/ppaanngggg/yolo-doclaynet
+ **For more details, see the [GitHub repository](https://github.com/ppaanngggg/yolo-doclaynet).**
 
+ ## Introduction
+
+ Retrieval-augmented generation (RAG) is very popular these days, and many applications support chatting with
+ documents. However, performance drops sharply on complex documents because their structure is hard to parse.
+ Extracting content from a complex document and organizing it into a parsable form is therefore a challenge.
+ This repo aims to solve that challenge with a method that is both fast and accurate.
+
+ ## Detection Sample
+
+ ![image](https://github.com/ppaanngggg/yolo-doclaynet/raw/main/annotated-test.png)
+
+ ## Method
+
+ 1. `YOLO` is a state-of-the-art object detection model developed by [Ultralytics](https://github.com/ultralytics/ultralytics).
+ It comes in 5 base model sizes and has a powerful framework for training and deployment, so I chose YOLO to
+ solve this challenge.
+ 2. `DocLayNet` is a human-annotated document layout segmentation dataset containing 80863 pages from a broad variety of
+ document sources. As far as I know, it is the highest-quality dataset for document layout analysis.
+
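+ ## Usage
+
+ ```python
+ from ultralytics import YOLO
+
+ # Load the trained weights; replace the placeholder with the path to your model file.
+ model = YOLO("{path to model file}")
+ # Run detection on one image; the call returns a list of Results objects.
+ pred = model("{path to test image}")
+ print(pred)
+ ```
+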
+ ## Dataset
+
+ More details about DocLayNet, along with the download, are available at this [link](https://github.com/DS4SD/DocLayNet). It has 11 labels:
+
+ - **Text**: Regular paragraphs.
+ - **Picture**: A graphic or photograph.
+ - **Caption**: Special text outside a picture or table that introduces this picture or
+   table.
+ - **Section-header**: Any kind of heading in the text, except the overall document title.
+ - **Footnote**: Typically small text at the bottom of a page, with a number or symbol
+   that is referred to in the text above.
+ - **Formula**: Mathematical equation on its own line.
+ - **Table**: Material arranged in a grid alignment with rows and columns, often
+   with separator lines.
+ - **List-item**: One element of a list, in a hanging shape, i.e., from the second line
+   onwards the paragraph is indented more than the first line.
+ - **Page-header**: Repeating elements such as page numbers at the top, outside of the
+   normal text flow.
+ - **Page-footer**: Repeating elements such as page numbers at the bottom, outside of the
+   normal text flow.
+ - **Title**: Overall title of a document, (almost) exclusively on the first page and
+   typically appearing in large font.
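
Note on using the labels (not part of this commit): the README stops at `print(pred)`, so here is a minimal sketch of how the 11 labels above might be used to turn detections into an ordered list of body regions. Everything named here (`Detection`, `detections_from_result`, `body_regions`, the `0.5` threshold, and treating `Page-header` and `Page-footer` as page furniture) is an assumption for illustration, not the repo's own code. The helper `detections_from_result` relies on the `boxes.xyxy`, `boxes.cls`, `boxes.conf` and `names` attributes of an Ultralytics `Results` object. The top-to-bottom sort is a simplification that only suits single-column pages.

```python
from dataclasses import dataclass

# The 11 DocLayNet labels listed in the README.
DOCLAYNET_LABELS = {
    "Text", "Picture", "Caption", "Section-header", "Footnote", "Formula",
    "Table", "List-item", "Page-header", "Page-footer", "Title",
}
# Labels treated as page furniture (an assumption of this sketch).
FURNITURE = {"Page-header", "Page-footer"}


@dataclass(frozen=True)
class Detection:
    """One detected layout region (hypothetical record type)."""
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels


def detections_from_result(result):
    """Convert one Ultralytics Results object into Detection records."""
    boxes = result.boxes
    return [
        Detection(result.names[int(cls)], float(conf), tuple(xyxy))
        for xyxy, cls, conf in zip(
            boxes.xyxy.tolist(), boxes.cls.tolist(), boxes.conf.tolist()
        )
    ]


def body_regions(detections, min_confidence=0.5):
    """Drop furniture and weak detections, then sort top-to-bottom, left-to-right."""
    kept = [
        d for d in detections
        if d.label in DOCLAYNET_LABELS
        and d.label not in FURNITURE
        and d.confidence >= min_confidence
    ]
    # Sort by the top edge first, then by the left edge.
    return sorted(kept, key=lambda d: (d.box[1], d.box[0]))
```

With the README's own snippet, this could be called as `body_regions(detections_from_result(pred[0]))`. Multi-column layouts would need a smarter ordering step than the plain coordinate sort used here.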