dkawahara committed on
Commit
9ae40e1
1 Parent(s): ff09ea6

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -111,9 +111,9 @@ Please refer to [README.md](https://github.com/llm-jp/llm-jp-tokenizer) of `llm-
 The models have been pre-trained using a blend of the following datasets.
 
 | Language | Dataset | Tokens|
-|:---|:---|:---|
+|:---|:---|---:|
 |Japanese|[Wikipedia](https://huggingface.co/datasets/wikipedia)|1.4B
-||[Common Crawl](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus)|130.7B
+||[Common Crawl](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v2)|130.7B
 |English|[Wikipedia](https://huggingface.co/datasets/wikipedia)|4.7B
 ||[The Pile](https://huggingface.co/datasets/EleutherAI/pile)|110.3B
 |Codes|[The Stack](https://huggingface.co/datasets/bigcode/the-stack)|8.7B