metadata

language: ja
license: cc-by-sa-4.0
library_name: transformers
datasets:
  - cc100
  - mc4
  - oscar
  - wikipedia
  - izumi-lab/cc100-ja
  - izumi-lab/mc4-ja-filter-ja-normal
  - izumi-lab/oscar2301-ja-filter-ja-normal
  - izumi-lab/wikipedia-ja-20230720
  - izumi-lab/wikinews-ja-20230728
widget:
  - text: 東京大学で[MASK]の研究をしています。

DeBERTa V2 base Japanese

This is a DeBERTaV2 model pretrained on Japanese texts. The pretraining code is available at retarfi/language-pretraining.

How to use

You can use this model for masked language modeling as follows:

from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-base-japanese")
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-base-japanese")
...
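The snippet above stops after loading the tokenizer and model. As a minimal sketch of how a prediction could then be obtained, the helper below scores the single `[MASK]` position in an input sentence. The function name `predict_mask` and its parameters are illustrative assumptions, not part of the original card, and the sketch assumes `torch` and `transformers` are installed.

```python
def predict_mask(text, top_k=5, model_name="izumi-lab/deberta-v2-base-japanese"):
    """Return the top_k (token, probability) candidates for the one mask in `text`.

    Hypothetical helper for illustration; imports are local so that defining
    the function does not require the libraries or a model download.
    """
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    if tokenizer.mask_token_id is None:
        raise ValueError("tokenizer does not define a mask token")

    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs["input_ids"][0]

    # Locate the mask token; this sketch only handles exactly one mask.
    positions = (input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    if positions.numel() != 1:
        raise ValueError("expected exactly one mask token in the input text")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Turn the vocabulary logits at the mask position into probabilities.
    probs = torch.softmax(logits[0, positions.item()], dim=-1)
    top = torch.topk(probs, top_k)
    tokens = tokenizer.convert_ids_to_tokens(top.indices.tolist())
    return list(zip(tokens, top.values.tolist()))
```

Calling `predict_mask("東京大学で[MASK]の研究をしています。")` with the widget sentence from the metadata would download the model on first use and return candidate tokens with their probabilities.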

Tokenization

The model uses a SentencePiece-based tokenizer. The vocabulary was trained on Japanese Wikipedia.

Training Data

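We used the following corpora for pre-training (the izumi-lab datasets listed in the metadata):

izumi-lab/cc100-ja
izumi-lab/mc4-ja-filter-ja-normal
izumi-lab/oscar2301-ja-filter-ja-normal
izumi-lab/wikipedia-ja-20230720
izumi-lab/wikinews-ja-20230728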

We pretrained with the corpora mentioned above for 900k steps, and additionally pretrained with the following financial corpora for 100k steps:

Summaries of financial results from October 9, 2012, to December 31, 2022
Securities reports from February 8, 2018, to December 31, 2022
News articles

Training Parameters

The learning_rate value in parentheses is the learning rate used for the additional pre-training on the financial corpora.

learning_rate: 2.4e-4 (6e-5)
total_train_batch_size: 2,016
max_seq_length: 512
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
lr_scheduler_type: linear schedule with warmup
training_steps: 1,000,000
warmup_steps: 100,000
precision: FP16
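To make the "linear schedule with warmup" setting concrete, here is a minimal sketch of the usual shape of such a schedule (the same shape as `get_linear_schedule_with_warmup` in transformers), using the peak learning rate, warmup steps, and total steps listed above. It is an assumption that the training code follows exactly this form, and the sketch does not model how the 6e-5 rate is applied during the financial pre-training phase, since that is not stated here.

```python
# Values taken from the parameter list above.
PEAK_LR = 2.4e-4
WARMUP_STEPS = 100_000
TOTAL_STEPS = 1_000_000


def linear_schedule_with_warmup(step, peak_lr=PEAK_LR,
                                warmup_steps=WARMUP_STEPS,
                                total_steps=TOTAL_STEPS):
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: shrink linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 50_000, 100_000, 550_000, 1_000_000):
        print(f"{step:>9,d}  {linear_schedule_with_warmup(step):.2e}")
```

Under these assumptions the rate peaks at the end of warmup (step 100,000) and falls to half the peak at step 550,000, midway through the decay.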

Fine-tuning on General NLU tasks

For our model, we report the average over five seeds.
Scores for the other models are taken from the JGLUE repository.

Model	JSTS (Pearson/Spearman)	JNLI (acc)	JCommonsenseQA (acc)
DeBERTaV2 base	0.890/0.846	0.xxx	0.859
Waseda RoBERTa base	0.913/0.873	0.895	0.840
Tohoku BERT base	0.909/0.868	0.899	0.808
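The JSTS column reports Pearson and Spearman correlations, and our scores are averaged over five seeds. The sketch below shows how such numbers can be computed with the standard library only. All function names and data values are hypothetical and chosen for illustration; they are not results from this model.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical gold similarity labels and model predictions.
    gold = [0.0, 1.2, 2.5, 3.1, 4.8]
    pred = [0.3, 1.0, 2.9, 2.7, 4.5]
    print(f"Pearson: {pearson(gold, pred):.3f}  Spearman: {spearman(gold, pred):.3f}")

    # Hypothetical per-seed scores, averaged as described above.
    per_seed = [0.88, 0.89, 0.90, 0.89, 0.89]
    print(f"Mean over seeds: {mean(per_seed):.3f}")
```

Pearson measures linear agreement between predictions and labels, while Spearman only checks that they are ordered the same way, which is why the two values can differ for the same predictions.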

Citation

TBA

Licenses

The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.

Acknowledgments

This work was supported in part by JSPS KAKENHI Grant Number JP21K12010, and the JST-Mirai Program Grant Number JPMJMI20B1, Japan.