An open-source project to build a language-modeling dataset of at least 1 TB, composed of diverse Polish-language texts.