---
title: README
emoji: 💠
colorFrom: yellow
colorTo: indigo
sdk: streamlit
pinned: false
---
Welcome to our space! 🎊
The [Unstructured.io](https://www.unstructured.io/) team provides libraries of open-source components for pre-processing text documents
such as **PDFs**, **HTML** pages and **Word** documents. These components are packaged as *bricks* 🧱, which give
users the building blocks they need to build pipelines targeted at the documents they care
about. Bricks in the library fall into three categories:
- 🧩 ***Partitioning bricks*** that break raw documents down into standard, structured elements.
- 🧹 ***Cleaning bricks*** that remove unwanted text from documents, such as boilerplate and sentence fragments.
- 🎭 ***Staging bricks*** that format data for downstream tasks, such as ML inference and data labeling.
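
The three categories above can be pictured as stages of a small pipeline. The sketch below is only an illustration written with the Python standard library; `partition_text`, `clean_element` and `stage_as_json` are hypothetical helpers invented for this example and are not the API of the `unstructured` library.

```python
import json
import re


def partition_text(raw: str) -> list[dict]:
    """Partitioning (hypothetical): split raw text into typed elements."""
    elements = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        text = block.strip()
        if not text:
            continue
        # Crude heuristic for illustration: short blocks without a final
        # period are treated as titles, everything else as narrative text.
        is_title = len(text) < 40 and not text.endswith(".")
        elements.append({"type": "Title" if is_title else "NarrativeText", "text": text})
    return elements


def clean_element(element: dict) -> dict:
    """Cleaning (hypothetical): collapse runs of whitespace into single spaces."""
    return {**element, "text": re.sub(r"\s+", " ", element["text"]).strip()}


def stage_as_json(elements: list[dict]) -> str:
    """Staging (hypothetical): serialize elements for a downstream task."""
    return json.dumps(elements, indent=2)


raw_document = (
    "Quarterly Report\n\n"
    "Revenue   grew in the\nthird quarter.\n\n"
    "Outlook\n\n"
    "We expect   steady demand."
)

# Chain the three stages: partition, then clean, then stage.
elements = [clean_element(e) for e in partition_text(raw_document)]
staged = stage_as_json(elements)
print(staged)
```

Keeping each stage as a separate function mirrors the idea of bricks: any stage can be swapped for another without touching the rest of the pipeline.
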
In this space we explore different settings of deep-learning models fine-tuned on several datasets, each containing a
specific document type and its corresponding annotations.
Main GitHub repository: [Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured)