🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 20 items • Updated 6 days ago • 122
Qwen2-Audio Collection Audio-language model series based on Qwen2 • 4 items • Updated Nov 28, 2024 • 56
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 68 items • Updated 11 days ago • 117
MS MARCO Mined Triplets Collection These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 14 items • Updated Feb 25 • 11
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27, 2024 • 193
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 68