ptaszynski commited on
Commit
ef6848d
1 Parent(s): 258cdec

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +10 -0
README.md CHANGED
@@ -22,9 +22,19 @@ This model uses ELECTRA Small model settings, 12 layers, 128 dimensions of hidde
22
 
23
  Vocabulary size was set to 32,000 tokens.
24
 
 
 
 
 
 
 
 
 
 
25
  ## Licenses
26
 
27
  The pretrained model with all attached files is distributed under the terms of the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en) license.
 
28
  <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
29
 
30
  ## Citations
 
22
 
23
  Vocabulary size was set to 32,000 tokens.
24
 
25
+ ## Training data and libraries
26
+
27
+ YACIS-ELECTRA is trained on the whole of [YACIS](https://github.com/ptaszynski/yacis-corpus) blog corpus, which is a Japanese blog corpus containing 5.6 billion words in 354 million sentences.
28
+
29
+ The corpus was originally split into sentences using custom rules, and each sentence was tokenized using [MeCab](https://taku910.github.io/mecab/). Subword tokenization for pretraining was done with WordPiece.
30
+
31
+ We used original [ELECTRA](https://github.com/google-research/electra) repository for pretraining. The pretrainig process took 7 days and 6 hours under the following environment: CPU: Intel Core i9-7920X, RAM: 132 GB, GPU: GeForce GTX 1080 Ti x1.
32
+
33
+
34
  ## Licenses
35
 
36
  The pretrained model with all attached files is distributed under the terms of the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en) license.
37
+
38
  <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
39
 
40
  ## Citations