Releasing Youtube-Commons: a massive open corpus for conversational and multimodal data (23 days ago)
Common Corpus (collection): The largest public domain dataset for training LLMs. 26 items, updated Mar 20.