The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models
Submitted by Stefan Schweter
CORAL NLP Research