Datasets documentation

Create a dataset card

You are viewing version v2.21.0. A newer version, v3.1.0, is available.

Each dataset should have a dataset card to promote responsible usage and inform users of any potential biases within the dataset. This idea was inspired by the Model Cards proposed by Mitchell et al., 2018. Dataset cards help users understand a dataset’s contents, the context for using the dataset, how it was created, and any other considerations a user should be aware of.

Creating a dataset card is easy and can be done in just a few steps:

  1. Go to your dataset repository on the Hub and click on Create Dataset Card to create a new README.md file in your repository.

  2. Use the Metadata UI to select the tags that describe your dataset. You can add a license, language, pretty_name, task_categories, size_categories, and any other tags you think are relevant. These tags help users find your dataset on the Hub.

For a more complete set of optional tags, see the Dataset Card specifications. It lists additional tags, such as multilinguality and language_creators, which are useful but not required.

  3. Click on the Import dataset card template link to automatically create a template with all the relevant fields to complete. Fill out the template sections to the best of your ability. Take a look at the Dataset Card Creation Guide for more detailed information about what to include in each section of the card. For fields you are unable to complete, you can write [More Information Needed].

  4. Once you’re done, commit the changes to the README.md file and you’ll see the completed dataset card on your repository.
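If you would rather draft the card locally before committing it, the steps above can be approximated with a short script. This is a minimal sketch using only the Python standard library, not the Hub's own tooling: the metadata values and section names are hypothetical placeholders, the helper functions are made up for illustration, and the hand-written YAML rendering only handles plain strings and lists of plain strings.

```python
# Hypothetical metadata values; replace them with tags that describe your dataset.
metadata = {
    "license": "mit",
    "language": ["en"],
    "pretty_name": "My Example Dataset",
    "task_categories": ["text-classification"],
    "size_categories": ["1K<n<10K"],
}

# Illustrative section names, not the full official template.
sections = [
    "Dataset Description",
    "Dataset Structure",
    "Considerations for Using the Data",
]


def render_front_matter(meta):
    """Render a flat dict as a YAML block delimited by '---' lines.

    Only plain strings and lists of plain strings are supported; values
    that need YAML quoting (for example ones containing ': ') are not handled.
    """
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


def render_card(meta, section_names):
    """Build README.md text: metadata block, title, then placeholder sections."""
    body = [render_front_matter(meta), "", f"# Dataset Card for {meta['pretty_name']}"]
    for name in section_names:
        # Placeholder for fields you are unable to complete yet.
        body += ["", f"## {name}", "", "[More Information Needed]"]
    return "\n".join(body) + "\n"


card_text = render_card(metadata, sections)
print(card_text)
```

Saving `card_text` as README.md at the root of the dataset repository and committing it should have the same effect as filling in the template through the web interface, assuming the tag values are ones the Hub recognizes.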

The YAML metadata block at the top of README.md also lets you customize how your dataset is loaded by defining splits and/or configurations, without writing any code.
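As an illustration of what such a block might look like, the sketch below prints a `configs` entry that maps two splits to data files. The field names (`configs`, `config_name`, `data_files`, `split`, `path`) follow the Hub's manual configuration format as best understood here, so check them against the current Hub documentation; the file paths and the `render_configs` helper are hypothetical.

```python
# Hypothetical file names; point these at files that exist in your repository.
splits = {"train": "data/train.csv", "test": "data/test.csv"}


def render_configs(config_name, split_to_path):
    """Render a single-configuration YAML block that maps splits to files."""
    lines = ["---", "configs:", f"- config_name: {config_name}", "  data_files:"]
    for split, path in split_to_path.items():
        lines.append(f"  - split: {split}")
        lines.append(f'    path: "{path}"')
    lines.append("---")
    return "\n".join(lines)


front_matter = render_configs("default", splits)
print(front_matter)
```

The printed block is meant to sit at the very top of README.md, merged with any other metadata tags inside the same pair of `---` lines.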

Feel free to take a look at the SNLI, CNN/DailyMail, and Allociné dataset cards as examples to help you get started.
