Adding new datasets

Any Hugging Face user can create a dataset! Start by creating your dataset repository, then upload your data files to it.

While it’s possible to add raw data to your dataset repo in a number of formats (JSON, CSV, Parquet, text, and images), for large datasets you may want to create a loading script. This script defines the different configurations and splits of your dataset, as well as how to download and process the data.
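As a minimal sketch of the raw-data route, the snippet below writes a tiny dataset as two CSV files using only the Python standard library. The file names `train.csv` and `test.csv`, the column names, the example rows and the helper functions are all illustrative assumptions, not part of any Hugging Face API. Naming each file after its split is a common convention, but check the Hub documentation for how splits are actually inferred from file names.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical example rows; a real dataset would come from your own source.
ROWS = [
    {"text": "great movie", "label": 1},
    {"text": "terrible plot", "label": 0},
    {"text": "loved it", "label": 1},
    {"text": "not for me", "label": 0},
]


def write_split(path, rows):
    """Write rows (a list of dicts with identical keys) to a CSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_dataset(out_dir, rows, n_train):
    """Put the first n_train rows in train.csv and the rest in test.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {
        "train": out_dir / "train.csv",
        "test": out_dir / "test.csv",
    }
    write_split(paths["train"], rows[:n_train])
    write_split(paths["test"], rows[n_train:])
    return paths


if __name__ == "__main__":
    # Write into a temporary directory; in practice this would be your
    # local copy of the dataset repository.
    paths = write_dataset(tempfile.mkdtemp(), ROWS, n_train=3)
    for split, path in paths.items():
        print(split, path)
```

The resulting files can then be added to the dataset repository like any other file. A loading script only becomes worth the extra effort when the data needs several configurations or custom download and processing logic.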

Datasets outside a namespace

Datasets outside a namespace are maintained by the Hugging Face team on GitHub. Unlike community datasets, which are named username/dataset_name or org/dataset_name, datasets outside a namespace can be referenced directly by their name (e.g. glue). If you find that an improvement is needed, refer to the 🤗 Datasets documentation for an explanation of how to propose edits by submitting a PR on GitHub.
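The two naming forms can be told apart mechanically. The helper below is a hypothetical illustration (the function `split_dataset_id` is not part of any Hugging Face library): it treats an identifier with no slash as a dataset outside a namespace and an identifier with exactly one slash as a namespaced community dataset.

```python
def split_dataset_id(dataset_id):
    """Return (namespace, name) for a dataset identifier.

    The namespace is None for datasets outside a namespace, e.g. "glue".
    This is an illustrative helper, not an official API.
    """
    parts = dataset_id.split("/")
    if len(parts) == 1 and parts[0]:
        # No slash: a dataset referenced directly by its name.
        return None, parts[0]
    if len(parts) == 2 and all(parts):
        # One slash: username/dataset_name or org/dataset_name.
        return parts[0], parts[1]
    raise ValueError(f"Unexpected dataset identifier: {dataset_id!r}")


if __name__ == "__main__":
    print(split_dataset_id("glue"))
    print(split_dataset_id("username/dataset_name"))
```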