dkawahara committed
Commit 03d5bd1
1 Parent(s): 17a38ab

Updated README.md.

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -32,11 +32,11 @@ You can fine-tune this model on downstream tasks.
 
 ## Tokenization
 
-The input text should be segmented into words by [Juman++](https://github.com/ku-nlp/jumanpp) in advance. Each word is tokenized into subwords by [sentencepiece](https://github.com/google/sentencepiece).
+The input text should be segmented into words by [Juman++](https://github.com/ku-nlp/jumanpp) in advance. Each word is tokenized into tokens by [sentencepiece](https://github.com/google/sentencepiece).
 
 ## Vocabulary
 
-The vocabulary consists of 32000 subwords induced by the unigram language model of [sentencepiece](https://github.com/google/sentencepiece).
+The vocabulary consists of 32000 tokens including words ([JumanDIC](https://github.com/ku-nlp/JumanDIC)) and subwords induced by the unigram language model of [sentencepiece](https://github.com/google/sentencepiece).
 
 ## Training procedure
 
@@ -53,6 +53,7 @@ The following hyperparameters were used during pretraining:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - training_steps: 700000
+- warmup_steps: 10000
 - mixed_precision_training: Native AMP
 
 ## Performance on JGLUE
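
As a usage note on the tokenization step described in the updated README: the sketch below shows one way to pre-segment text with Juman++ (via the pyknp bindings) before handing it to the model's sentencepiece-based tokenizer. It is only an illustration; `MODEL_NAME` is a placeholder for this model's repository id, and pyknp is just one available Juman++ wrapper.

```python
# Minimal sketch of the Juman++ pre-segmentation step; not the authors' official script.
# Assumes Juman++ is installed and reachable through pyknp, and that MODEL_NAME
# is replaced by this model's actual Hub id.
from pyknp import Juman
from transformers import AutoTokenizer

jumanpp = Juman()
tokenizer = AutoTokenizer.from_pretrained("MODEL_NAME")

text = "早稲田大学で自然言語処理を学ぶ。"

# Segment the raw text into words with Juman++ first ...
words = [mrph.midasi for mrph in jumanpp.analysis(text).mrph_list()]

# ... then pass the whitespace-joined words to the sentencepiece-based tokenizer.
encoded = tokenizer(" ".join(words), return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0]))
```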
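For the vocabulary change (32000 tokens mixing JumanDIC words and unigram-induced subwords), the following is a rough sketch of how such a vocabulary could be built with sentencepiece. The corpus path and the word list are placeholders, and this is not the authors' actual training procedure.

```python
# Illustrative only: training a 32000-token unigram sentencepiece model whose
# vocabulary also contains a fixed word list (e.g. entries drawn from JumanDIC).
import sentencepiece as spm

# Hypothetical dictionary words to keep as whole tokens in the vocabulary.
jumandic_words = ["早稲田", "大学", "自然", "言語", "処理"]

spm.SentencePieceTrainer.train(
    input="corpus.txt",              # placeholder path to a pre-segmented corpus
    model_prefix="spiece",
    model_type="unigram",            # unigram language model, as stated in the README
    vocab_size=32000,
    user_defined_symbols=jumandic_words,
)
```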
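Finally, the pretraining hyperparameters listed in the second hunk (including the newly added warmup_steps) can be expressed with `transformers.TrainingArguments` roughly as sketched below. Values not shown in the diff (learning rate, batch size, output directory, etc.) are placeholders or defaults, so this is a mapping of the listed settings, not a reproduction recipe.

```python
# Sketch: the hyperparameters from the README hunk mapped onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",            # placeholder
    max_steps=700_000,           # training_steps: 700000
    warmup_steps=10_000,         # warmup_steps: 10000 (added in this commit)
    lr_scheduler_type="linear",  # lr_scheduler_type: linear
    adam_beta1=0.9,              # optimizer: Adam with betas=(0.9,0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,          # and epsilon=1e-08
    fp16=True,                   # mixed_precision_training: Native AMP
)
```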