Indic Datasets
Text and voice datasets for training and fine-tuning Indic LLMs.
- ai4bharat/sangraha
- uonlp/CulturaX
- pary/hind_encorp
- PleIAs/YouTube-Commons
Alignment Datasets
Datasets for model alignment, in English and other languages.
- H-D-T/Buzz-8b-Large-v0.5 (Text Generation)
- allenai/WildChat-1M
- nvidia/ChatQA-Training-Data
- nvidia/ChatRAG-Bench