---
title: README
emoji: 💠
colorFrom: yellow
colorTo: indigo
sdk: streamlit
pinned: false
---

Welcome to our space! 🎊

The [Unstructured.io](https://www.unstructured.io/) Team provides libraries with open-source components for pre-processing text documents such as **PDFs**, **HTML** and **Word** Documents. These components are packaged as *bricks* 🧱, which provide users the building blocks they need to build pipelines targeted at the documents they care about. Bricks in the library fall into three categories:

- 🧩 ***Partitioning bricks*** that break raw documents down into standard, structured elements.
- 🧹 ***Cleaning bricks*** that remove unwanted text from documents, such as boilerplate and sentence fragments.
- 🎭 ***Staging bricks*** that format data for downstream tasks, such as ML inference and data labeling.

In this space we explore different settings of deep-learning models fine-tuned with several datasets containing a specific document type and corresponding annotations.

Main GitHub repository link: [here](https://github.com/Unstructured-IO/unstructured)
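
To give a feel for how the three brick categories fit together, here is a minimal sketch in plain Python. It does not use the `unstructured` library: `Element`, `partition_text`, `clean_whitespace`, and `stage_as_json` are hypothetical names invented for this illustration, and the title heuristic is a toy assumption. See the GitHub repository for the real bricks.

```python
from __future__ import annotations

import json
import re
from dataclasses import asdict, dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a structured document element."""
    category: str  # "Title" or "NarrativeText" in this toy example
    text: str


def partition_text(raw: str) -> list[Element]:
    """Partitioning step: split raw text into elements on blank lines."""
    elements = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        block = block.strip()
        if not block:
            continue
        # Toy heuristic (an assumption): a short single line that does not
        # end with a period is treated as a title.
        is_title = "\n" not in block and len(block) < 60 and not block.endswith(".")
        elements.append(Element("Title" if is_title else "NarrativeText", block))
    return elements


def clean_whitespace(element: Element) -> Element:
    """Cleaning step: collapse runs of whitespace into single spaces."""
    return Element(element.category, re.sub(r"\s+", " ", element.text).strip())


def stage_as_json(elements: list[Element]) -> str:
    """Staging step: serialize elements for a downstream task."""
    return json.dumps([asdict(e) for e in elements], indent=2)


raw_document = """Quarterly   Report

Revenue grew   in the third quarter.
Costs were flat."""

# Partition, then clean each element, then stage the result.
elements = [clean_whitespace(e) for e in partition_text(raw_document)]
staged = stage_as_json(elements)
print(staged)
```

Each step takes the output of the previous one, so individual bricks can be swapped or reordered to suit the documents you care about.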