---
license: openrail
---


<h3 align="center">PDF Paragraphs Extraction</h3>
<p align="center">A model for extracting paragraphs from PDFs</p>

This model uses features of a PDF to extract its text and segment it into paragraphs. It can be run as a service.

Each extracted paragraph includes its page number, its position on the page, its size, and its text.
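
As a rough illustration of the paragraph fields listed above, here is a minimal Python sketch that loads a JSON array of paragraphs into typed objects. The field names (`page_number`, `left`, `top`, `width`, `height`, `text`) and the JSON layout are assumptions made for this example, not the service's documented response format, so adjust them to match what the service actually returns.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    # Hypothetical field names chosen for illustration only;
    # compare them with the real response before relying on them.
    page_number: int
    left: float    # horizontal position on the page
    top: float     # vertical position on the page
    width: float   # size of the paragraph box
    height: float
    text: str


def parse_paragraphs(payload: str) -> List[Paragraph]:
    """Turn a JSON array of paragraph objects into Paragraph instances."""
    return [Paragraph(**item) for item in json.loads(payload)]


# Made-up sample data, standing in for a response from the service.
sample = json.dumps([
    {
        "page_number": 1,
        "left": 72.0,
        "top": 90.5,
        "width": 451.0,
        "height": 36.0,
        "text": "First paragraph.",
    }
])

paragraphs = parse_paragraphs(sample)
print(paragraphs[0].page_number, paragraphs[0].text)
```

If the real response uses different keys, `Paragraph(**item)` raises a `TypeError`, which makes a mismatch easy to spot.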


## Quick Start

Download the service that uses the model:

    git clone https://github.com/huridocs/pdf_paragraphs_extraction.git
    cd pdf_paragraphs_extraction

Start the service:

    ./run start

Get the paragraphs from a PDF:

    curl -X GET -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5051

To stop the server:

    ./run stop


## Performance

Accuracy: 93.9%

Speed: 0.15 seconds per page
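
To get a feel for what the reported speed means for a whole document, the small sketch below multiplies the per-page figure by a page count. It assumes processing time scales linearly with the number of pages and ignores startup and upload time, neither of which is stated here; the function name is hypothetical.

```python
SECONDS_PER_PAGE = 0.15  # speed reported in this section


def estimated_seconds(page_count: int, seconds_per_page: float = SECONDS_PER_PAGE) -> float:
    """Rough processing-time estimate, assuming linear scaling with page count."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * seconds_per_page


# Under these assumptions, a 200-page PDF takes roughly half a minute.
print(round(estimated_seconds(200), 2))
```

Actual timings depend on hardware and on the PDF itself, so treat the result as an order-of-magnitude guide.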