DeBERTa V2 small Japanese

This is a DeBERTaV2 model pretrained on Japanese text. The pretraining code is available at retarfi/language-pretraining.

How to use

You can use this model for masked language modeling as follows:

from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-small-japanese", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-small-japanese")
...
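As a minimal sketch of how the loaded tokenizer and model could be used to fill in a masked token, the function below returns the highest-scoring candidates for the first mask position. The function name `predict_masked_token`, the constant `MODEL_NAME`, and the `top_k` parameter are hypothetical and introduced only for illustration; the imports are placed inside the function so that defining it does not load the libraries or download any weights.

```python
MODEL_NAME = "izumi-lab/deberta-v2-small-japanese"


def predict_masked_token(text, top_k=5):
    """Return the top_k candidate tokens for the first mask token in text.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # Imported lazily so that defining this function has no side effects.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Find where the mask token sits in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if mask_positions.numel() == 0:
        raise ValueError("text must contain the tokenizer's mask token")

    # Take the top_k vocabulary entries at the first mask position.
    top_ids = logits[0, mask_positions[0]].topk(top_k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)
```

Calling `predict_masked_token` on a sentence containing the mask token downloads the weights on first use. The mask string is likely `[MASK]`, but that is an assumption; reading it from `tokenizer.mask_token` is the safer choice.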

Tokenization

The model uses a SentencePiece-based tokenizer. The vocabulary was trained on Japanese Wikipedia using SentencePiece.

Training Data

We used the following corpora for pre-training:

Training Parameters

learning_rate: 6e-4
total_train_batch_size: 2,016
max_seq_length: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
lr_scheduler_type: linear schedule with warmup
training_steps: 1,000,000
warmup_steps: 100,000
precision: BF16
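
To make the schedule concrete, the sketch below computes the learning rate at a given step from the parameters listed above. It assumes the common form of a linear schedule with warmup: a linear ramp from 0 to the peak rate over the warmup steps, followed by a linear decay to 0 at the final training step. The exact decay endpoint is not stated here, so treat that as an assumption. The function name `linear_warmup_lr` is hypothetical. The last constant is an upper bound on tokens per optimizer step, since sequences shorter than max_seq_length contain fewer tokens.

```python
PEAK_LR = 6e-4
WARMUP_STEPS = 100_000
TRAINING_STEPS = 1_000_000
TOTAL_TRAIN_BATCH_SIZE = 2_016
MAX_SEQ_LENGTH = 128


def linear_warmup_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS,
                     total_steps=TRAINING_STEPS):
    """Learning rate at `step` under linear warmup then linear decay.

    Assumption: the rate decays to 0 at `total_steps`.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from the peak rate down to 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Upper bound on tokens processed per optimizer step.
MAX_TOKENS_PER_STEP = TOTAL_TRAIN_BATCH_SIZE * MAX_SEQ_LENGTH

for step in (0, 50_000, 100_000, 550_000, 1_000_000):
    print(step, linear_warmup_lr(step))
```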

Fine-tuning on General NLU tasks

We report each score as the average over five seeds.

Model	JSTS (Pearson/Spearman)	JNLI (acc)	JCommonsenseQA (acc)
DeBERTaV2 small	0.890/0.846	0.880	0.737
UTokyo BERT small	0.889/0.841	0.841	0.715
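
For readers unfamiliar with the JSTS metrics, the following sketch shows how Pearson and Spearman correlations between gold and predicted similarity scores can be computed, and how per-seed values could then be averaged. It is not the evaluation script used for the table; the helper names are hypothetical and the numbers are toy values chosen only to exercise the functions.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy values for illustration only; these are not results from the model.
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
predictions_by_seed = [
    [0.3, 1.0, 3.5, 3.9, 4.8],
    [0.1, 1.8, 2.7, 4.4, 4.9],
]
mean_pearson = mean(pearson(gold, p) for p in predictions_by_seed)
mean_spearman = mean(spearman(gold, p) for p in predictions_by_seed)
print(mean_pearson, mean_spearman)
```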

Citation

The citation will be updated. Please check for the latest version before citing.

@article{Suzuki-etal-2023-ipm,
  title = {Constructing and analyzing domain-specific language model for financial text mining},
  author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
  journal = {Information Processing \& Management},
  volume = {60},
  number = {2},
  pages = {103194},
  year = {2023},
  doi = {10.1016/j.ipm.2022.103194}
}
@article{Suzuki-2024-findebertav2,
  jtitle = {{FinDeBERTaV2: 単語分割フリーな金融事前学習言語モデル}},
  title = {{FinDeBERTaV2: Word-Segmentation-Free Pre-trained Language Model for Finance}},
  jauthor = {鈴木, 雅弘 and 坂地, 泰紀 and 平野, 正徳 and 和泉, 潔},
  author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
  jjournal = {人工知能学会論文誌},
  journal = {Transactions of the Japanese Society for Artificial Intelligence},
  volume = {39},
  number = {4},
  pages={FIN23-G_1-14},
  year = {2024},
  doi = {10.1527/tjsai.39-4_FIN23-G},
}

Licenses

The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0.

Acknowledgments

This work was supported in part by JSPS KAKENHI Grant Number JP21K12010, and the JST-Mirai Program Grant Number JPMJMI20B1, Japan.
