Update README.md
README.md CHANGED
@@ -68,7 +68,7 @@ output = model(**encoded_input)
 ```
 
 ## Training data
-We used a Japanese Wikipedia dump (as of 20230101,
+We used a Japanese Wikipedia dump (as of 20230101, 15GB).
 
 ## Training procedure
 We first segmented the texts into words by KyTea and then tokenized the words into subwords using WordPiece with a vocabulary size of 32,000. We pre-trained the BERT model using the [transformers](https://github.com/huggingface/transformers) library. The training took about 8 days using 4 NVIDIA A100-SXM4-80GB GPUs.
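
Below is a minimal sketch of the WordPiece step described in the training procedure above, using the Hugging Face `tokenizers` library. Only the 32,000 vocabulary size comes from the README; the corpus file name, the `lowercase` and `handle_chinese_chars` settings, and the output directory are assumptions for illustration, and the input is assumed to be text already segmented into words by KyTea.

```python
from tokenizers import BertWordPieceTokenizer

# Hypothetical corpus file: Japanese Wikipedia text already word-segmented by
# KyTea, one whitespace-separated sentence per line.
corpus = "wiki_kytea_segmented.txt"

# lowercase=False and handle_chinese_chars=False are assumptions, chosen so
# WordPiece operates on the KyTea words rather than splitting individual kanji.
tokenizer = BertWordPieceTokenizer(lowercase=False, handle_chinese_chars=False)
tokenizer.train(files=[corpus], vocab_size=32000)

# Writes vocab.txt to the current directory for later use with BertTokenizer.
tokenizer.save_model(".")
```

The resulting vocab.txt could then be loaded with `transformers.BertTokenizer` for the masked-language-model pre-training run mentioned in the procedure.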