---
title: README
emoji: 🍫
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---



<p align="center">
    <img src="https://raw.githubusercontent.com/ml6team/fondant/main/docs/art/fondant_banner.svg" alt="Fondant banner" height="200">
    <i>Large-scale data processing made easy and reusable</i>
    <br>
    <a href="https://fondant.readthedocs.io/en/stable/"><strong>Explore the docs »</strong></a>
</p>


<p float="left" align="middle">
  <a href="https://discord.gg/HnTdWhydGp"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white" alt="Discord badge" width="100"></a> <a href="https://www.github.com/ml6team/fondant"><img src="https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white" alt="Github badge" width="100"></a>
</p>


---
🍫 **Fondant is an open-source framework that aims to simplify and speed up large-scale data processing by making 
containerized components reusable across pipelines and execution environments and shareable within the community.**

It offers:
- 🔧 Plug ‘n’ play composable pipelines for creating datasets for
    - AI image generation model fine-tuning (Stable Diffusion, ControlNet)
    - Large language model fine-tuning (LLaMA, Falcon)
    - Code generation model fine-tuning (StarCoder)
- 🧱 Library of off-the-shelf reusable components for
    - Extracting data from public sources such as Common Crawl, LAION, ...
    - Filtering on 
        - Content, e.g. language, visual style, topic, format, aesthetics, etc.
        - Context, e.g. copyright license, origin
        - Metadata
    - Removal of unwanted data such as toxic, NSFW or generated content
    - Removal of unwanted data patterns such as societal bias
    - Transforming data (resizing, cropping, reformatting, …)
    - Tuning the data for model performance (normalization, deduplication, …)
    - Enriching data (captioning, metadata generation, synthetics, …)
    - Transparency, auditability, compliance
- 📖 🖼️ 🎞️ ♾️ Out of the box multimodal capabilities: text, images, video, etc.
- 🐍 Standardized, Python/Pandas-based way of creating custom components
- 🏭 Production-ready, scalable deployment
- ☁️ Multi-cloud integrations
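To make the idea of small, reusable, Pandas-based components that compose into a pipeline more concrete, here is a minimal sketch. It assumes every component is a plain function that takes a `pandas.DataFrame` and returns a new one. The names `filter_language`, `deduplicate_text`, `add_text_length` and `run_pipeline`, and the `text`/`language` columns, are hypothetical and chosen for illustration only. This is **not** Fondant's component API; see the Fondant docs for the real interface.

```python
from typing import Callable, List

import pandas as pd

# Hypothetical interface for illustration only (not Fondant's API):
# a "component" is any function mapping a DataFrame to a new DataFrame.
Component = Callable[[pd.DataFrame], pd.DataFrame]


def filter_language(df: pd.DataFrame, language: str = "en") -> pd.DataFrame:
    """Filtering: keep only rows whose 'language' column matches."""
    return df[df["language"] == language]


def deduplicate_text(df: pd.DataFrame) -> pd.DataFrame:
    """Tuning: drop rows whose 'text' exactly duplicates an earlier row."""
    return df.drop_duplicates(subset="text")


def add_text_length(df: pd.DataFrame) -> pd.DataFrame:
    """Enriching: add the character length of each row's text."""
    return df.assign(text_length=df["text"].str.len())


def run_pipeline(df: pd.DataFrame, components: List[Component]) -> pd.DataFrame:
    """Apply components in order; each one receives the previous output."""
    for component in components:
        df = component(df)
    return df.reset_index(drop=True)


raw = pd.DataFrame(
    {
        "text": ["hello world", "hello world", "bonjour", "data"],
        "language": ["en", "en", "fr", "en"],
    }
)
result = run_pipeline(raw, [filter_language, deduplicate_text, add_text_length])
print(result)
```

Because each step shares the same DataFrame-in, DataFrame-out contract, steps can be reordered, swapped, or reused in another pipeline without changes, which is the kind of reusability the list above describes.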

## 🪤 Why Fondant?

In the age of Foundation Models, control over your data is key, and building pipelines
for large-scale data processing is costly, especially when they require advanced
machine learning-based operations. This need not be the case, however, if processing
components were reusable and exchangeable and pipelines were easily composable.
Realizing this is the main vision behind Fondant.

<p align="right">(<a href="#chocolate_bar-fondant">back to top</a>)</p>