izumilab committed on
Commit
6839103
1 Parent(s): b7f462b

Create README.md

Files changed (1)
  1. README.md +68 -0
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+
+ language: ja
+
+ license: cc-by-sa-4.0
+
+ datasets:
+
+ - wikipedia
+
+ widget:
+
+ - text: 東京大学で[MASK]の研究をしています。
+
+ ---
+
+ # ELECTRA small Japanese discriminator
+
+ This is an [ELECTRA](https://github.com/google-research/electra) model pretrained on Japanese text.
+
+ The pretraining code is available at [retarfi/language-pretraining](https://github.com/retarfi/language-pretraining/tree/v1.0).
+
+ ## Model architecture
+
+ The model architecture is the same as ELECTRA small in the [original ELECTRA paper](https://arxiv.org/abs/2003.10555): 12 layers, 256-dimensional hidden states, and 4 attention heads.
+
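As a rough illustration of the shape described above, the stated hyperparameters can be collected in a small config object. This is only a sketch: the class name `ElectraSmallShape` is hypothetical and is not part of the pretraining code, and only the values given in this README are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectraSmallShape:
    """Shape hyperparameters stated in this README (hypothetical helper class)."""

    num_layers: int = 12
    hidden_size: int = 256
    num_attention_heads: int = 4
    vocab_size: int = 32768  # from the Tokenization section

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits the hidden state evenly across heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


shape = ElectraSmallShape()
print(shape.head_dim)  # 64
```

With 256 hidden dimensions and 4 heads, each attention head works on a 64-dimensional slice.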
+ ## Training Data
+
+ The models are trained on the Japanese version of Wikipedia.
+
+ The training corpus is generated from the Japanese Wikipedia dump file as of June 1, 2021.
+
+ The corpus file is 2.9 GB and consists of approximately 20M sentences.
+
+ ## Tokenization
+
+ The text is first tokenized by MeCab with the IPA dictionary and then split into subwords by the WordPiece algorithm.
+
+ The vocabulary size is 32768.
+
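The second stage of this pipeline can be sketched in plain Python. The example below is a minimal greedy longest-match-first WordPiece splitter, not the tokenizer actually used for pretraining. It assumes the word-level segmentation has already been done (the `words` list is hand-written, not real MeCab output), and `toy_vocab` is a hypothetical six-entry vocabulary standing in for the real 32768-entry one.

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into subwords."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches: the whole word maps to the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary for illustration only.
toy_vocab = {"東京", "大学", "研究", "##者", "で", "の"}
# Assumed word-level segmentation (stands in for the MeCab step).
words = ["東京", "大学", "で", "研究者"]
tokens = [piece for word in words for piece in wordpiece(word, toy_vocab)]
print(tokens)  # ['東京', '大学', 'で', '研究', '##者']
```

Splitting into words first means subword boundaries never cross word boundaries, which is why the morphological analysis runs before WordPiece.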
+ ## Training
+
+ The models are trained with the same configuration as ELECTRA small in the [original ELECTRA paper](https://arxiv.org/abs/2003.10555): 128 tokens per instance, 128 instances per batch, and 1M training steps.
+
+ The generator is 1/4 the size of the discriminator.
+
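To put the training configuration above in perspective, the figures can be multiplied out. This is simple arithmetic on the numbers stated in this README; the constant names are illustrative, and the total is an upper bound because it counts every position in every instance, including any padding.

```python
TOKENS_PER_INSTANCE = 128
INSTANCES_PER_BATCH = 128
TRAINING_STEPS = 1_000_000

# Token positions processed in a single optimization step.
tokens_per_batch = TOKENS_PER_INSTANCE * INSTANCES_PER_BATCH

# Token positions processed over the whole run (upper bound, padding included).
total_token_positions = tokens_per_batch * TRAINING_STEPS

print(tokens_per_batch)       # 16384
print(total_token_positions)  # 16384000000
```

That is about 16.4 billion token positions over the full run.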
+ ## Citation
+
+ **There will be another paper for this pretrained model. Be sure to check here again when you cite.**
+
+ ```
+ @inproceedings{bert_electra_japanese,
+ title = {Construction and Validation of a Pre-Trained Language Model
+ Using Financial Documents},
+ author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
+ month = {oct},
+ year = {2021},
+ booktitle = {Proceedings of JSAI Special Interest Group on Financial Informatics (SIG-FIN) 27}
+ }
+ ```
62
+ ## Licenses
63
+
64
+ The pretrained models are distributed under the terms of the [Creative Commons Attribution-ShareAlike 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
65
+
66
+ ## Acknowledgments
67
+
68
+ This work was supported by JSPS KAKENHI Grant Number JP21K12010.