s-conia committed on
Commit c4bac46
Parent: e026fa4

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -37,8 +37,10 @@ For more information about this issue, please refer to our survey paper:
  ## Training dataset
  The following information is based on the information we could gather, that is, it is NOT official.
  Please take it with a pinch of salt as we continue to study Modello Italia.
+ * **The training data of Modello Italia is unknown;**
  * Modello Italia is probably trained on around 1T tokens of Italian text;
- * **The training data of Modello Italia is unknown.**
+ * We know that the training data is mostly Italian text and source code;
+ * We know that the training data includes text from Editoria Nazionale.
 
  ## Tokenizer
  The following information is based on the information we could gather, that is, it is NOT official.