Hugging Face has announced Cosmo 1B, a fully open-source Phi competitor released together with its open training dataset. The model is licensed under Apache 2.0. The dataset, dubbed "Cosmopedia," is published on the Hugging Face Hub, also under Apache 2.0. It was generated with Mixtral 8x7B, using various sources (AutoMathText, OpenStax, WikiHow, etc.) as "seed data" for the generated text.

Model: HuggingFaceTB/cosmo-1b
Dataset: HuggingFaceTB/cosmopedia
Unofficial demo I created: mrfakename/cosmo-1b-gpu-demo
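
For anyone who wants to try the model and dataset listed above locally, here is a minimal sketch. It assumes the `transformers`, `torch`, and `datasets` packages are installed and that the model loads through the standard causal-LM classes. The helper names `generate` and `peek_dataset` are hypothetical, and the `"stories"` subset name is an assumption, so check the dataset card for the actual configuration names.

```python
# Repo IDs taken from the links in the post.
MODEL_ID = "HuggingFaceTB/cosmo-1b"
DATASET_ID = "HuggingFaceTB/cosmopedia"


def generate(prompt, max_new_tokens=100):
    """Hypothetical helper: plain text completion with Cosmo 1B."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def peek_dataset(subset="stories", n=2):
    """Hypothetical helper: stream the first n rows of one Cosmopedia subset.

    The subset name is an assumption; see the dataset card for valid names.
    Streaming avoids downloading the full dataset.
    """
    from datasets import load_dataset

    stream = load_dataset(DATASET_ID, subset, split="train", streaming=True)
    return list(stream.take(n))
```

Calling `generate("Photosynthesis is")` downloads the model weights on first use, and `peek_dataset()` returns a short list of row dictionaries whose fields can be inspected before committing to a full download.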