OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6 • 121
view article Article Releasing Common Corpus: the largest public domain dataset for training LLMs By Pclanglais • Mar 20 • 18
Saul-7B: A pioneering Large Language Model for Law Collection We introduce SaulLM-7B, a LLM tailored for the legal domain trained on 30 billion tokens of legal data. Released under MIT License. • 4 items • Updated Mar 7 • 18