Adding new datasets

Any Hugging Face user can create a dataset! Start by creating a dataset repository, then upload your data files to it.

In many cases you can simply add raw data files to your dataset repo in any supported format (JSON, CSV, Parquet, text, images, audio files, …). For some large datasets, you may instead want to write a loading script, which defines the dataset's configurations and splits and specifies how to download and process the data.
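
As a rough illustration of the raw-data route, the sketch below writes one CSV file per split into a local folder that could then be uploaded to a dataset repo. The helper name write_raw_dataset, the file names train.csv and test.csv, and the text/label columns are assumptions made for this example and are not prescribed by the text above.

```python
import csv
import tempfile
from pathlib import Path


def write_raw_dataset(repo_dir, splits):
    """Write one CSV file per split and return {file name: rows written}.

    Hypothetical helper: the file layout and column names are assumptions.
    """
    repo_dir = Path(repo_dir)
    repo_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for split_name, rows in splits.items():
        file_path = repo_dir / f"{split_name}.csv"
        with open(file_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["text", "label"])
            writer.writeheader()
            writer.writerows(rows)
        written[file_path.name] = len(rows)
    return written


# Toy data standing in for a real dataset.
splits = {
    "train": [{"text": "great movie", "label": 1}, {"text": "too long", "label": 0}],
    "test": [{"text": "loved it", "label": 1}],
}

with tempfile.TemporaryDirectory() as tmp:
    created = write_raw_dataset(tmp, splits)
    print(created)
```

Keeping each split in its own plainly named file is one simple layout; a loading script is only worth considering when the data cannot be described this directly.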

Datasets outside a namespace

Datasets outside a namespace are maintained by the Hugging Face team. Unlike community datasets, which follow the naming convention username/dataset_name or org/dataset_name, datasets outside a namespace can be referenced directly by their name (e.g. glue). To suggest an improvement, open a discussion in the dataset's “Community” tab or submit a PR on the Hub to propose edits.
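
The naming convention can be made concrete with a small sketch that separates a dataset identifier into its namespace and name. The function split_dataset_id is a hypothetical helper written for this example, not part of any Hugging Face library.

```python
def split_dataset_id(dataset_id):
    """Return (namespace, name) for a dataset identifier.

    Hypothetical helper: namespace is None when the identifier has no
    "username/" or "org/" prefix, i.e. the dataset is outside a namespace.
    """
    namespace, separator, name = dataset_id.rpartition("/")
    return (namespace if separator else None, name)


print(split_dataset_id("glue"))
print(split_dataset_id("username/dataset_name"))
```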