Update README.md
README.md
CHANGED
@@ -37,8 +37,10 @@ For more information about this issue, please refer to our survey paper:
 ## Training dataset
 The following information is based on the information we could gather, that is, it is NOT official.
 Please take it with a pinch of salt as we continue to study Modello Italia.
+* **The training data of Modello Italia is unknown;**
 * Modello Italia is probably trained on around 1T tokens of Italian text;
-*
+* We know that the training data is mostly Italian text and source code;
+* We know that the training data includes text from Editoria Nazionale.
 
 ## Tokenizer
 The following information is based on the information we could gather, that is, it is NOT official.