---
title: README
emoji: 💠
colorFrom: yellow
colorTo: indigo
sdk: streamlit
pinned: false
---

Welcome to our space! 🎊

The Unstructured.io team provides libraries with open-source components for pre-processing text documents such as PDFs, HTML, and Word documents. These components are packaged as bricks 🧱, which give users the building blocks they need to assemble pipelines targeted at the documents they care about. Bricks in the library fall into three categories:

  • 🧩 Partitioning bricks that break raw documents down into standard, structured elements.
  • 🧹 Cleaning bricks that remove unwanted text from documents, such as boilerplate and sentence fragments.
  • 🎭 Staging bricks that format data for downstream tasks, such as ML inference and data labeling.
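
To make the three categories concrete, here is a minimal, self-contained sketch of a partition → clean → stage pipeline. It does not use the library's actual API: the `Element` type, the function names (`partition_text`, `clean_elements`, `stage_for_labeling`), the title heuristic, and the boilerplate string are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """A structured piece of a document (hypothetical type for this sketch)."""
    category: str  # "Title" or "NarrativeText"
    text: str


def partition_text(raw):
    """Partitioning: split raw text on blank lines and label each block."""
    elements = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        block = block.strip()
        if not block:
            continue
        # Naive assumption: a short block with no closing punctuation is a title.
        is_title = len(block.split()) <= 6 and not block.endswith((".", "!", "?"))
        elements.append(Element("Title" if is_title else "NarrativeText", block))
    return elements


def clean_elements(elements, boilerplate=("Page 1 of 1",)):
    """Cleaning: collapse runs of whitespace and drop known boilerplate."""
    cleaned = []
    for el in elements:
        text = re.sub(r"\s+", " ", el.text).strip()
        if text in boilerplate:
            continue  # unwanted text is removed entirely
        cleaned.append(Element(el.category, text))
    return cleaned


def stage_for_labeling(elements):
    """Staging: convert elements to plain dicts for a downstream tool."""
    return [{"type": el.category, "text": el.text} for el in elements]


raw_document = """Quarterly Report

Revenue   grew this quarter.
Costs stayed flat.

Page 1 of 1"""

# Chain the three stages: partition, then clean, then stage.
records = stage_for_labeling(clean_elements(partition_text(raw_document)))
for record in records:
    print(record)
```

Each stage takes and returns plain data, so bricks of one category can be swapped without touching the others. That composability is the point of the sketch, not the specific heuristics.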

In this space, we explore different configurations of deep-learning models fine-tuned on several datasets, each containing a specific document type and its corresponding annotations.

Main GitHub repository link: here