Any Hugging Face user can create a dataset! You can start by creating your dataset repository and choosing one of the following methods to upload your dataset:
- Add files manually to the repository through the UI
- Push files with the `push_to_hub` method from 🤗 Datasets
- Use Git to commit and push your dataset files
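
As a minimal sketch of the `push_to_hub` route, the snippet below writes a small CSV file and then pushes it with 🤗 Datasets. The file contents, the helper names `write_sample_csv` and `push_csv_to_hub`, and the repository ID are placeholders chosen for illustration. It assumes `datasets` is installed and that you have already authenticated (for example with `huggingface-cli login`).

```python
import csv
from pathlib import Path


def write_sample_csv(path: Path) -> Path:
    """Write a tiny two-column CSV that stands in for real data files."""
    rows = [
        {"text": "I loved this film.", "label": "positive"},
        {"text": "Not worth the ticket.", "label": "negative"},
    ]
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def push_csv_to_hub(csv_path: Path, repo_id: str) -> None:
    """Load the CSV with 🤗 Datasets and push it to the given dataset repo.

    Needs network access and a logged-in Hugging Face account.
    """
    # Imported lazily so the CSV helper above works without the library.
    from datasets import load_dataset

    dataset = load_dataset("csv", data_files=str(csv_path), split="train")
    dataset.push_to_hub(repo_id)
```

Calling `push_csv_to_hub(write_sample_csv(Path("train.csv")), "your-username/my-dataset")` would upload the two sample rows, where `your-username/my-dataset` is a hypothetical repository ID.
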
While it’s possible to add raw data to your dataset repo in a number of formats (JSON, CSV, Parquet, text, and images), for large datasets you may want to create a loading script. This script defines the different configurations and splits of your dataset, as well as how to download and process the data.
Datasets outside a namespace are maintained by the Hugging Face team on GitHub. Unlike the naming convention used for community datasets (`org/dataset_name`), datasets outside a namespace can be referenced directly by their name (e.g. `glue`). If you find that an improvement is needed, refer to the 🤗 Datasets documentation for an explanation of how to submit a PR on GitHub to propose edits.
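
The two naming conventions can be told apart by the presence of a namespace prefix. The sketch below is illustrative only: `split_dataset_id` and `load_by_id` are hypothetical helpers, not part of any library, and `load_by_id` assumes `datasets` is installed and that the Hub is reachable.

```python
from typing import Optional, Tuple


def split_dataset_id(dataset_id: str) -> Tuple[Optional[str], str]:
    """Split a Hub dataset ID into (namespace, name).

    The namespace is None for IDs such as "glue" that have no "org/" prefix.
    """
    namespace, sep, name = dataset_id.rpartition("/")
    return (namespace if sep else None, name)


def load_by_id(dataset_id: str, config: Optional[str] = None):
    """Load a dataset by its ID; needs the `datasets` package and network access."""
    # Imported lazily so split_dataset_id works without the library.
    from datasets import load_dataset

    return load_dataset(dataset_id, config)
```

For example, `split_dataset_id("glue")` yields no namespace, while `split_dataset_id("org/dataset_name")` separates `org` from `dataset_name`. Some datasets, `glue` among them, also need a configuration name, which is what the optional `config` argument is for.
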