Sebastian Gabarain

Locutusque

AI & ML interests

Pushing performance in small language models

Organizations

Posts 7

view post
Post
657
I created a Twitter account a while back. I finally decided to make it public SebastianG74019. For those of you following @Locutusque on Twitter, that is not me! 😂
view post
Post
2485
Exciting news! 🎉 I've created the OpenCerebrum datasets, open-source alternatives to Aether Research's proprietary Cerebrum dataset.

The first, OpenCerebrum SFT, is a text-generation and question-answering dataset with ~1.2M examples, curated from sources like Open-Orca, glaiveai, camel-ai, and more! 📚

The second, OpenCerebrum DPO, is a smaller dataset with ~21k examples, focusing on data point optimization. It's curated from sources like jondurbin, argilla, grimulkan, and others. 📊

Both datasets are licensed under Apache-2.0 and are available in English. They're ready for use in your projects, and I welcome any feedback for future improvements! 🚀

Locutusque/OpenCerebrum-dpo
Locutusque/OpenCerebrum-SFT
Locutusque/OpenCerebrum-1.0-7b-SFT
Locutusque/OpenCerebrum-1.0-7b-DPO