Datasets documentation

Datasets

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v2.18.0).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Datasets

🤗 Datasets is a library for easily accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP) tasks.

Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the Hugging Face Hub, allowing you to easily load and share a dataset with the wider machine learning community.

Find your dataset today on the Hugging Face Hub, and take an in-depth look inside of it with the live viewer.