Update README.md
README.md CHANGED
@@ -68,7 +68,7 @@ output = model(**encoded_input)
 ```
 
 ## Training data
-We used a Japanese Wikipedia dump (as of 20230101,
+We used a Japanese Wikipedia dump (as of 20230101, 15GB).
 
 ## Training procedure
 We first segmented the texts into words by KyTea and then tokenized the words into subwords using WordPiece with a vocabulary size of 32,000. We pre-trained the BERT model using the [transformers](https://github.com/huggingface/transformers) library. The training took about 8 days using 4 NVIDIA A100-SXM4-80GB GPUs.
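
Below is a minimal sketch of the WordPiece step described in the training procedure above, using the Hugging Face `tokenizers` library. Only the 32,000 vocabulary size comes from the README; the corpus file name, the `lowercase` and `handle_chinese_chars` settings, and the output directory are assumptions for illustration, and the input is assumed to be text already segmented into words by KyTea.

```python
from tokenizers import BertWordPieceTokenizer

# Hypothetical corpus file: Japanese Wikipedia text already word-segmented by
# KyTea, one whitespace-separated sentence per line.
corpus = "wiki_kytea_segmented.txt"

# lowercase=False and handle_chinese_chars=False are assumptions, chosen so
# WordPiece operates on the KyTea words rather than splitting individual kanji.
tokenizer = BertWordPieceTokenizer(lowercase=False, handle_chinese_chars=False)
tokenizer.train(files=[corpus], vocab_size=32000)

# Writes vocab.txt to the current directory for later use with BertTokenizer.
tokenizer.save_model(".")
```

The resulting vocab.txt could then be loaded with `transformers.BertTokenizer` for the masked-language-model pre-training run mentioned in the procedure.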