The Hugging Face Hub hosts a large number of community-curated datasets for a diverse range of tasks such as translation, automatic speech recognition, and image classification. Alongside the information contained in the dataset card, many datasets, such as GLUE, include a Dataset Preview to showcase the data.
Each dataset is a Git repository, equipped with the necessary scripts to download the data and generate splits for training, evaluation, and testing. For information on how a dataset repository is structured, refer to the Structure your repository guide. Following the supported repo structure ensures that your repository has a preview on its dataset page on the Hub.
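Because a dataset is a Git repository, it can be cloned like any other. The following is a minimal sketch, assuming `git` is installed and that dataset repositories are served under `https://huggingface.co/datasets/`. The repository ID `username/my_dataset` and the helper names `clone_url` and `clone_dataset` are hypothetical, chosen only for illustration.

```python
import subprocess

# Assumed base URL under which dataset repositories are served.
HUB_DATASETS_URL = "https://huggingface.co/datasets"


def clone_url(repo_id):
    """Return the Git URL for a dataset repository such as 'username/my_dataset'."""
    return f"{HUB_DATASETS_URL}/{repo_id}"


def clone_dataset(repo_id, target_dir):
    """Clone the dataset repository into target_dir using the local git client."""
    subprocess.run(["git", "clone", clone_url(repo_id), target_dir], check=True)


if __name__ == "__main__":
    # 'username/my_dataset' is a placeholder; replace it with a real repository ID.
    clone_dataset("username/my_dataset", "my_dataset")
```

Large data files in a repository are typically tracked with Git LFS, so a plain clone may need Git LFS installed to fetch them.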
As with models and Spaces, you can search the Hub for datasets using the search bar in the top navigation or on the main datasets page. You can filter the results by language, task, and license to find a dataset that’s right for you.
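The same kind of search can also be run programmatically. The sketch below uses only the Python standard library and assumes the Hub's public `/api/datasets` endpoint with its `search`, `filter`, and `limit` query parameters. The helper names `build_search_url` and `search_datasets`, and the tag format `language:en`, are assumptions made for illustration rather than an official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint for listing datasets on the Hub.
HUB_API = "https://huggingface.co/api/datasets"


def build_search_url(search=None, filters=(), limit=5):
    """Build a dataset search URL from a free-text query and a list of tags."""
    params = []
    if search:
        params.append(("search", search))
    # Each tag (for example 'language:en') becomes its own `filter` parameter.
    params.extend(("filter", tag) for tag in filters)
    params.append(("limit", str(limit)))
    return f"{HUB_API}?{urlencode(params)}"


def search_datasets(search=None, filters=(), limit=5):
    """Query the Hub (requires network access) and return matching dataset IDs."""
    with urlopen(build_search_url(search, filters, limit), timeout=30) as response:
        return [item["id"] for item in json.load(response)]


if __name__ == "__main__":
    print(search_datasets("glue", filters=["language:en"]))
```

Keeping the URL construction separate from the network call makes it easy to inspect the query before sending it.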
Since datasets are repositories, you can toggle their visibility between private and public through the Settings tab. If a dataset is owned by an organization, the privacy settings apply to all the members of the organization.