---
license: openrail
---
<h3 align="center">PDF Paragraphs Extraction</h3>
<p align="center">A model for extracting paragraphs from PDFs</p>
This model uses features from a PDF to extract its text and paragraphs. It can be run as a service.
Each extracted paragraph includes its page number, its position on the page, its size, and its text.
## Quick Start
Download the service that uses the model:

    git clone https://github.com/huridocs/pdf_paragraphs_extraction.git
    cd pdf_paragraphs_extraction

Start the service:

    ./run start

Get the paragraphs from a PDF:

    curl -X GET -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5051

To stop the server:

    ./run stop
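
For callers who prefer a script to the command line, the request above can be sketched in Python using only the standard library. This is a minimal sketch, not the project's official client: the helper names `build_multipart` and `extract_paragraphs` are hypothetical, and the assumption that the service replies with JSON is not confirmed by this README. The method, form field name (`file`), and port mirror the curl command.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def extract_paragraphs(pdf_path: str, url: str = "http://localhost:5051"):
    """Send the PDF as the curl command does and decode the reply.

    Assumption: the service answers with a JSON document.
    """
    path = Path(pdf_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="GET",  # same method as the curl example
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # The service must already be running (./run start).
    print(extract_paragraphs("/PATH/TO/PDF/pdf_name.pdf"))
```

Building the multipart body by hand avoids a third-party dependency. If the `requests` package is available, its `files=` argument would do the same encoding in one line.
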
## Performance
Accuracy: 93.9%
Speed: 0.15 seconds per page