Update README.md
README.md CHANGED
@@ -74,7 +74,7 @@ Please refer [README.md of llm-ja-tokenizer](https://github.com/llm-jp/llm-jp-to
 
 ## Datasets
 
-
+### Pre-training
 
 The models have been pre-trained on approximately 287.5B tokens, sourced from a blend of the following datasets.
 
@@ -88,7 +88,7 @@ The models have been pre-trained on approximately 287.5B tokens, sourced from a
 
 Pretraining was done in 10 shards, each consisting of approximately 27-28B tokens. We further finalized the pretraining with an additional 27B tokens of cleaned data.
 
-
+### Instruction tuning
 
 The models have been fine-tuned on the following datasets.
 
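The sharded pretraining schedule mentioned in the context lines above can be illustrated with a minimal sketch. This is not the project's actual data pipeline: it assumes the corpus is a flat stream of token IDs and that shards are contiguous, roughly equal slices; real pipelines typically shard at the document level and interleave sources. The function name `make_shards` and the toy corpus are illustrative only.

```python
# Minimal sketch of 10-shard construction for pretraining data (an assumption,
# not the llm-jp pipeline): split one token stream into contiguous,
# near-equal shards.
from typing import List, Sequence

def make_shards(token_ids: Sequence[int], num_shards: int = 10) -> List[Sequence[int]]:
    """Split a token stream into `num_shards` contiguous, near-equal shards."""
    shard_size, remainder = divmod(len(token_ids), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        # The first `remainder` shards absorb one extra token each.
        end = start + shard_size + (1 if i < remainder else 0)
        shards.append(token_ids[start:end])
        start = end
    return shards

# Toy usage: 287 tokens stand in for the ~287.5B real ones.
corpus = list(range(287))
shards = make_shards(corpus)
assert sum(len(s) for s in shards) == len(corpus)
print([len(s) for s in shards])  # [29, 29, 29, 29, 29, 29, 29, 28, 28, 28]
```

Each shard here plays the role of one of the ~27-28B-token folds; training proceeds through the shards in sequence, with the final cleaned 27B tokens used to finish pretraining.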