OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated 15 days ago • 117
view article Article Releasing Common Corpus: the largest public domain dataset for training LLMs By Pclanglais • Mar 20 • 17
Saul-7B: A pioneering Large Language Model for Law Collection We introduce SaulLM-7B, a LLM tailored for the legal domain trained on 30 billion tokens of legal data. Released under MIT License. • 4 items • Updated Mar 7 • 18