unstructured.io

company

Verified

http://www.unstructured.io

Unstructured-IO


AI & ML interests

ETL for LLMs

Recent Activity

cragwolfe updated a model 17 days ago

unstructuredio/yolo_x_layout

laverdes updated a model about 1 year ago

unstructuredio/donut-base-labelstudio-A1.0

ajimeno updated a Space over 1 year ago

unstructuredio/unstructured-chipper-app


Organization Card


Welcome to our space! 🎊

The Unstructured.io team provides libraries of open-source components for pre-processing text documents such as PDFs, HTML pages, and Word documents. These components are packaged as bricks 🧱, which give users the building blocks they need to assemble pipelines for the documents they care about. Bricks in the library fall into three categories:

🧩 Partitioning bricks that break raw documents down into standard, structured elements.
🧹 Cleaning bricks that remove unwanted text from documents, such as boilerplate and sentence fragments.
🎭 Staging bricks that format data for downstream tasks, such as ML inference and data labeling.
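
The three brick categories above can be pictured as a small pipeline. The following is a minimal, standard-library-only sketch of that idea, not the Unstructured library's API: the `Element` class, the function names, the title heuristic, and the sample document are all hypothetical and chosen only for illustration.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Element:
    """Hypothetical structured element (not the library's own class)."""
    category: str  # "Title" or "NarrativeText"
    text: str


def partition_text(raw):
    """Partitioning: split raw text on blank lines and label each block."""
    elements = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        block = block.strip()
        if not block:
            continue
        # Assumed heuristic: short blocks without end punctuation are titles.
        is_title = len(block.split()) <= 6 and not block.endswith((".", "!", "?"))
        elements.append(Element("Title" if is_title else "NarrativeText", block))
    return elements


def clean_elements(elements, min_words=3):
    """Cleaning: collapse whitespace and drop very short narrative fragments."""
    cleaned = []
    for el in elements:
        text = re.sub(r"\s+", " ", el.text).strip()
        if el.category == "NarrativeText" and len(text.split()) < min_words:
            continue  # treat as an unwanted sentence fragment
        cleaned.append(Element(el.category, text))
    return cleaned


def stage_for_labeling(elements):
    """Staging: serialize elements to JSON for a downstream tool."""
    return json.dumps([asdict(el) for el in elements], indent=2)


# Made-up sample input, for illustration only.
raw_document = """Quarterly Report

Revenue   grew by 12%
over the previous quarter.

See also.

Outlook

We expect continued growth."""

elements = clean_elements(partition_text(raw_document))
staged = stage_for_labeling(elements)
print(staged)
```

Each stage takes and returns plain data, so individual bricks can be swapped or reordered without touching the others. The real library ships its own partitioning, cleaning, and staging functions, so consult its documentation for the actual names and signatures.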

In this space we explore different configurations of deep-learning models, each fine-tuned on datasets that pair a specific document type with its corresponding annotations.

Main GitHub repository link: here

spaces 6

Unstructured Chipper App

Extract structured data from documents

Unstructured Chipper App

Parse and extract information from documents

Irs Manuals

Ask questions about IRS Manuals

Receipt Parser

Chat Your Data ISW

Ask questions about Ukraine's conflict

Invoices Parser

models 7

unstructuredio/yolo_x_layout

Updated 17 days ago • 15

unstructuredio/donut-base-labelstudio-A1.0

Image-Text-to-Text • Updated Apr 2, 2024 • 3 • 6

unstructuredio/detectron2_mask_rcnn_X_101_32x8d_FPN_3x

Updated Jul 12, 2023

unstructuredio/donut-invoices

Image-Text-to-Text • Updated May 23, 2023 • 19 • 3

unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x

Updated May 9, 2023 • 2

unstructuredio/oer-checkbox

Updated Dec 22, 2022

unstructuredio/donut-base-sroie

Image-Text-to-Text • Updated Dec 1, 2022 • 39 • 1

datasets

None public yet