Datasets based on UltraFeedback
This collection contains datasets built on top of UltraFeedback, using Argilla for dataset exploration and curation, sorted by release date.
argilla/ultrafeedback-binarized-preferences
Note: Curated dataset on top of `openbmb/UltraFeedback`, applying binarization to generate a dataset suitable for DPO fine-tuning. Inspired by HuggingFace H4's previous efforts, but applying a completely different approach to data binarization: based on the mean of the preference ratings instead of on the overall score of the critique. Additionally, extensive data exploration and curation with Argilla was applied to identify potential issues within the original dataset.
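The rating-based binarization described above can be sketched as follows. This is a minimal illustration on a toy record; the field names (`annotations`, `Rating`, `response`) mirror the spirit of the `openbmb/UltraFeedback` schema but are assumptions here, not its exact layout:

```python
# Sketch of rating-based binarization (field names are assumptions, not the
# exact openbmb/UltraFeedback schema): rank each prompt's completions by the
# mean of their per-aspect preference ratings, then take the best-rated one
# as "chosen" and the worst-rated one as "rejected".

def binarize(completions):
    """Return (chosen, rejected) completions by mean preference rating."""
    def mean_rating(completion):
        ratings = [aspect["Rating"] for aspect in completion["annotations"].values()]
        return sum(ratings) / len(ratings)

    ranked = sorted(completions, key=mean_rating, reverse=True)
    return ranked[0], ranked[-1]

completions = [
    {"response": "A", "annotations": {"helpfulness": {"Rating": 5},
                                      "truthfulness": {"Rating": 4}}},
    {"response": "B", "annotations": {"helpfulness": {"Rating": 2},
                                      "truthfulness": {"Rating": 3}}},
    {"response": "C", "annotations": {"helpfulness": {"Rating": 2},
                                      "truthfulness": {"Rating": 1}}},
]
chosen, rejected = binarize(completions)
# "A" has the highest mean rating (4.5), "C" the lowest (1.5)
```

The key difference from the earlier H4 binarization is only the ranking key: the mean of the per-aspect preference ratings rather than the critique's single overall score.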
argilla/ultrafeedback-binarized-preferences-cleaned
Note: Iteration on top of `argilla/ultrafeedback-binarized-preferences`, removing the TruthfulQA prompts that were introducing data contamination, as spotted by AllenAI, while keeping Argilla's approach to data binarization. Formatting: the dataset follows the same formatting as the one defined in the Alignment Handbook from HuggingFace H4.
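For reference, the Alignment Handbook DPO format mentioned here stores each example as a prompt plus full `chosen` and `rejected` chat transcripts. A toy record in that shape (key names as commonly used by H4-style datasets; treat them as an approximation rather than a guarantee):

```python
# Toy example in the Alignment Handbook style DPO format: "chosen" and
# "rejected" are chat transcripts ending with the preferred / dispreferred
# assistant turn. Contents are invented for illustration.
example = {
    "prompt": "What is the capital of France?",
    "chosen": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
    "rejected": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Lyon."},
    ],
}
```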
argilla/ultrafeedback-multi-binarized-preferences-cleaned
Note: Built on top of `openbmb/UltraFeedback` following the same approach as `argilla/ultrafeedback-binarized-preferences-cleaned`, but keeping all the rejected samples, so that we end up with ~3 times more examples to use during DPO fine-tuning. Formatting: the dataset follows the same formatting as the one defined in the Alignment Handbook from HuggingFace H4.
argilla/ultrafeedback-multi-binarized-quality-preferences-cleaned
Note: A simpler iteration on top of `argilla/ultrafeedback-multi-binarized-preferences-cleaned`, removing the low-quality samples, i.e. those whose chosen completion has a mean preference rating lower than 3.0. Formatting: the dataset follows the same formatting as the one defined in the Alignment Handbook from HuggingFace H4.
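The quality filter described above can be sketched with a simple predicate over toy rows. The `chosen-rating` column name is an assumption for illustration, not a confirmed schema:

```python
# Hypothetical quality filter: drop pairs whose chosen completion has a
# mean preference rating below 3.0 ("chosen-rating" is an assumed column
# name, not necessarily the dataset's actual one).

def is_high_quality(row, threshold=3.0):
    return row["chosen-rating"] >= threshold

rows = [
    {"prompt": "p1", "chosen-rating": 4.5},
    {"prompt": "p2", "chosen-rating": 2.0},
    {"prompt": "p3", "chosen-rating": 3.0},
]
kept = [row for row in rows if is_high_quality(row)]
# p2 is dropped; p3 survives because the threshold is inclusive
```

The same predicate would translate directly to a `datasets.Dataset.filter` call when working with the actual dataset.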
argilla/ultrafeedback-curated
Note: Another iteration on top of `openbmb/UltraFeedback`, aiming to solve the issue of critiques with an overall score of 10.0, caused by a bug in the UltraFeedback code where 1.0 ratings were computed as 10.0. Using `distilabel` with GPT-4, we regenerated the critique and ratings for those with a score of 10.0 and corrected the ones with a score of 1.0. Formatting: the dataset follows the same formatting as `openbmb/UltraFeedback`, with an additional column to identify the updated rows.
argilla/ultrafeedback-binarized-preferences-cleaned-kto