sapienzanlp/modello-italia-9b


s-conia committed on Jun 7

Commit 6877064 • 1 Parent(s): 52296a8

Update README.md

Files changed (1)

README.md +3 -1


@@ -37,8 +37,10 @@ For more information about this issue, please refer to our survey paper:
 ## Training dataset
 The following information is based on the information we could gather, that is, it is NOT official.
 Please take it with a pinch of salt as we continue to study Modello Italia.
+* **The training data of Modello Italia is unknown;**
 * Modello Italia is probably trained on around 1T tokens of Italian text;
-* **The training data of Modello Italia is unknown.**
+* We know that the training data is mostly Italian text and source code;
+* We know that the training data includes text from Editoria Nazionale.
 ## Tokenizer
 The following information is based on the information we could gather, that is, it is NOT official.