HURIDOCS / pdf-segmentation


	---
	license: openrail
	---


	<h3 align="center">PDF Paragraphs Extraction</h3>
	<p align="center">A model for extracting paragraphs from PDFs</p>

	This model uses features from a PDF to extract its text and paragraphs. It can be run as a service.

	Each extracted paragraph includes its page number, its position on the page, its size, and its text.
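
To make the shape of a paragraph record concrete, here is a minimal Python sketch. The field names (`page_number`, `left`, `top`, `width`, `height`, `text`) and the JSON layout are assumptions chosen for illustration, not the service's confirmed schema; check the actual output for the real keys.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    """One extracted paragraph. Field names are hypothetical, not a confirmed schema."""
    page_number: int   # page the paragraph appears on
    left: float        # position on the page (horizontal)
    top: float         # position on the page (vertical)
    width: float       # size of the paragraph box
    height: float
    text: str          # paragraph text


def parse_paragraphs(payload: str) -> List[Paragraph]:
    """Parse a JSON array of paragraph objects into Paragraph records."""
    return [Paragraph(**item) for item in json.loads(payload)]


# Hypothetical sample payload; a real response may use different keys.
sample = '[{"page_number": 1, "left": 72, "top": 90, "width": 450, "height": 36, "text": "Hello"}]'
paragraphs = parse_paragraphs(sample)
print(paragraphs[0].page_number, paragraphs[0].text)
```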


	## Quick Start

	Download the service that uses the model:

	git clone https://github.com/huridocs/pdf_paragraphs_extraction.git
	cd pdf_paragraphs_extraction

	Start the service:

	./run start

	Get the paragraphs from a PDF:

	curl -X GET -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5051
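
For callers who prefer Python over curl, the same request can be sketched with the standard library. This is an illustration under assumptions, not the project's client: the helper names `build_multipart` and `get_paragraphs` are hypothetical, the `application/pdf` part type is an assumption, and the response is returned as raw bytes because its format is not specified here. The URL, the GET method, and the `file` form field mirror the curl command.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

SERVICE_URL = "http://localhost:5051"  # host and port taken from the curl command


def build_multipart(filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body under the form field 'file'.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"  # assumed part type
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def get_paragraphs(pdf_path: str, url: str = SERVICE_URL) -> bytes:
    """Send the PDF to the running service and return the raw response body."""
    path = Path(pdf_path)
    body, content_type = build_multipart(path.name, path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="GET",  # same method as the curl command
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the service running, `get_paragraphs("/PATH/TO/PDF/pdf_name.pdf")` returns the response bytes, which can then be decoded and inspected.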

	Stop the service:

	./run stop


	## Performance

	Accuracy: 93.9%

	Speed: 0.15 seconds per page
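
As a rough worked example of the speed figure, the sketch below estimates processing time for a document of a given length. It assumes the reported 0.15 seconds per page holds uniformly, which may not be true on different hardware or for unusually dense pages; the function name `estimated_seconds` is illustrative.

```python
SECONDS_PER_PAGE = 0.15  # speed figure reported above


def estimated_seconds(page_count: int) -> float:
    """Rough processing time, assuming the reported per-page speed holds."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * SECONDS_PER_PAGE


# A 200-page document at 0.15 s/page takes about half a minute.
print(f"{estimated_seconds(200):.1f} s")
```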