Introducing CosmoChat, a multiturn chat dataset based on Cosmopedia that I'm working on in the open on the Hub.
🎯 Goals: 💬 Create multi-turn chats seeded from Cosmopedia 🎓 Customize questions for different audience levels 🔍 Evaluate the model's ability to elaborate and clarify 🤓 (I want to learn more about creating valuable synthetic datasets, and I learn best by doing stuff rather than reading stuff).
Cosmochat is created using the excellent distilabel library.