Update README.md
--- a/README.md
+++ b/README.md
@@ -111,9 +111,9 @@ Please refer to [README.md](https://github.com/llm-jp/llm-jp-tokenizer) of `llm-
 The models have been pre-trained using a blend of the following datasets.
 
 | Language | Dataset | Tokens|
-
+|:---|:---|---:|
 |Japanese|[Wikipedia](https://huggingface.co/datasets/wikipedia)|1.4B
-||[Common Crawl](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus)|130.7B
+||[Common Crawl](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v2)|130.7B
 |English|[Wikipedia](https://huggingface.co/datasets/wikipedia)|4.7B
 ||[The Pile](https://huggingface.co/datasets/EleutherAI/pile)|110.3B
 |Codes|[The Stack](https://huggingface.co/datasets/bigcode/the-stack)|8.7B