Commit 6368cbf by kskshr
Parent: fd41cfc

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -68,7 +68,7 @@ output = model(**encoded_input)
  ```

  ## Training data
- We used a Japanese Wikipedia dump (as of 20230101, 15GB).
+ We used a Japanese Wikipedia dump (as of 20230101, 3.3GB).

  ## Training procedure
  We first segmented the texts into words by KyTea and then tokenized the words into subwords using WordPiece with a vocabulary size of 32,000. We pre-trained the BERT model using [transformers](https://github.com/huggingface/transformers) library. The training took about 8 days using 4 NVIDIA A100-SXM4-80GB GPUs.
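
As a rough illustration of the subword step described in the training procedure, the sketch below trains a 32,000-entry WordPiece vocabulary with the Hugging Face `tokenizers` library. It is a minimal, hypothetical example rather than the actual training script: the corpus file name `wiki_ja_segmented.txt` and the special-token list are assumptions, and the input is expected to be text already word-segmented by KyTea (one sentence per line, words separated by spaces).

```python
# Hypothetical sketch: train a WordPiece subword vocabulary (size 32,000) on a
# corpus that has already been word-segmented by KyTea. The file name and the
# special-token set are assumptions, not taken from this repository.
from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from tokenizers.pre_tokenizers import WhitespaceSplit
from tokenizers.trainers import WordPieceTrainer

tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
# The corpus is pre-segmented, so splitting on whitespace recovers the KyTea words.
tokenizer.pre_tokenizer = WhitespaceSplit()

trainer = WordPieceTrainer(
    vocab_size=32000,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train(files=["wiki_ja_segmented.txt"], trainer=trainer)  # assumed file name
tokenizer.save("wordpiece-ja.json")
```

Because KyTea has already produced word boundaries, a plain whitespace pre-tokenizer is sufficient here; WordPiece then learns subword units within those words.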