File size: 994 Bytes
230c9a6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# PDF2Markdown

**Demo:(left: input image; right: rendered markdown.)**

![demo](demo.png)


1. Extract PDF features by these tasks:

    - Layout Detection: Using the YOLOv8 model for region detection, such as images, tables, titles, text, etc.;

    - Formula Detection: Using YOLOv8 for detecting formulas, including inline formulas and isolated formulas;

    - Formula Recognition: Using UniMERNet for formula recognition;

    - Table Recognition: Using StructEqTable for table recognition;

    - Optical Character Recognition: Using PaddleOCR for text recognition;

2. Convert features to markdown file:

    Using simple rules to convert the identified result to markdown (*Note: this is a simply convert code and can only support one-column PDFs, see [MinerU](https://github.com/opendatalab/MinerU) for more complex situation*).



# Usage

```

python project/pdf2markdown/scripts/run_project.py --config project/pdf2markdown/configs/pdf2markdown.yaml

```