🤗 Data is better together!

Data is essential for training good AI systems. We believe that the amazing community built around open machine learning can also work on developing amazing datasets together.

To explore how this can be done, Argilla and Hugging Face are thrilled to announce a collaborative project where we’re asking Hugging Face community members to build a dataset consisting of LLM prompts collectively.

What are we doing?
Using an instance of Argilla — a powerful open-source data collaboration tool — hosted on the Hugging Face Hub, we are collecting ratings of prompts based on their quality.

How Can You Contribute?
It’s super simple to start contributing:

1. Sign up if you don’t have a Hugging Face account

2. Go to this Argilla Space and sign in: DIBT/prompt-collective

3. Read the guidelines and start rating prompts!

You can also join the #data-is-better-together channel in the Hugging Face Discord.

Finally, to track the community's progress, we'll be updating this Gradio dashboard:
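As a rough illustration of the kind of progress numbers such a dashboard might show, here is a minimal sketch that aggregates per-prompt ratings into summary stats. The record format and field names are hypothetical, not the actual dashboard's schema:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotation records: (prompt_id, annotator, rating from 1-5).
records = [
    ("p1", "alice", 4),
    ("p1", "bob", 5),
    ("p2", "alice", 2),
]

def progress_stats(records):
    """Aggregate per-prompt ratings into simple dashboard numbers."""
    by_prompt = defaultdict(list)
    for prompt_id, _annotator, rating in records:
        by_prompt[prompt_id].append(rating)
    return {
        "prompts_annotated": len(by_prompt),
        "total_ratings": len(records),
        "avg_rating": {p: mean(rs) for p, rs in by_prompt.items()},
    }

stats = progress_stats(records)
print(stats["prompts_annotated"])  # 2
```

The real Space presumably reads live annotations from the Argilla instance rather than an in-memory list, but the aggregation idea is the same.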


Let's see how far we can push open-source annotation!

The progress tracking Space is very motivating!

Really cool, crowd-sourcing like this can be very powerful!

Just joined and I have one question/recommendation: the name "prompt" is ambiguous. I got the following conversation:

Marv is a chatbot that reluctantly answers questions with sarcastic responses:

You: How many pounds are in a kilogram?
Marv: This again? There are 2.2 pounds in a kilogram. Please make a note of this.
You: What does HTML stand for?
Marv: Was Google too busy? Hypertext Markup Language. The T is for try to ask better questions in the future.
You: When did the first airplane fly?
Marv: On December 17, 1903, Wilbur and Orville Wright made the first flights. I wish they’d come and take me away.
You: What is the meaning of life?
Marv: I’m not sure. I’ll ask my friend Google.
You: Why is the sky blue?

Intuitively, when I am asked to "rate the prompt," I would expect to rate the user prompt used to trigger a response. So does that mean I should rate "How many pounds are in a kilogram?"? Or all of "Marv"'s responses? Or the whole conversation? The guidelines aren't entirely clear to me either, because the example I got is a conversation, so there is a lot that could potentially be rated (one user prompt, all user prompts, the whole conversation, etc.).

Hope you see my confusion. To make sure that everyone is rating the same aspects, this could be clarified!


We're aiming to judge the text as a full prompt. Some of them are synthetically generated, so I would rate this one as a bad prompt, since the additional context doesn't seem to make sense as a prompt!