Libraries
The Datasets Hub supports several libraries in the open-source ecosystem. Thanks to the huggingface_hub Python library, it’s easy to share your datasets on the Hub. We’re happy to welcome to the Hub a set of open-source libraries that are pushing machine learning forward.
The table below summarizes the supported libraries and whether each one can download datasets from the Hub and push datasets to it.
Library | Description | Download from Hub | Push to Hub |
---|---|---|---|
Dask | Parallel and distributed computing library that scales the existing Python and PyData ecosystem. | ✅ | ✅ |
Datasets | 🤗 Datasets is a library for accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP). | ✅ | ✅ |
DuckDB | In-process SQL OLAP database management system. | ✅ | ✅ |
Pandas | Python data analysis toolkit. | ✅ | ✅ |
WebDataset | Library to write I/O pipelines for large datasets. | ✅ | ❌ |
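
As an illustration of the "Download from Hub" and "Push to Hub" columns, here is a minimal sketch using Pandas. It assumes that pandas and huggingface_hub are installed (the latter is what lets pandas resolve `hf://` paths), that you are logged in for write access, and that the repository holds a Parquet file. The helper names `hub_dataset_path`, `load_parquet_from_hub`, and `push_parquet_to_hub`, as well as the repository `username/my_dataset` and the file `data.parquet`, are hypothetical placeholders and not part of any library.

```python
def hub_dataset_path(repo_id: str, filename: str) -> str:
    """Build an hf:// path to a file inside a dataset repository.

    Hypothetical helper for this example; repo_id looks like "username/my_dataset".
    """
    return f"hf://datasets/{repo_id}/{filename}"


def load_parquet_from_hub(repo_id: str, filename: str):
    """Download a Parquet file from the Hub into a DataFrame (needs network access)."""
    # Imported lazily so the path helper works without pandas installed.
    import pandas as pd

    return pd.read_parquet(hub_dataset_path(repo_id, filename))


def push_parquet_to_hub(df, repo_id: str, filename: str) -> None:
    """Write a DataFrame to a dataset repository as Parquet (needs write access)."""
    df.to_parquet(hub_dataset_path(repo_id, filename))


# Example usage (placeholder repository, not executed here):
#   df = load_parquet_from_hub("username/my_dataset", "data.parquet")
#   push_parquet_to_hub(df, "username/my_dataset", "data_copy.parquet")
```

The other libraries in the table expose their own entry points for the same two operations, so treat this only as one possible pattern and check each library's documentation for the details.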