
DeBERTa V2 small Japanese

This is a DeBERTaV2 model pretrained on Japanese texts. The code for pretraining is available at retarfi/language-pretraining.

How to use

You can use this model for masked language modeling as follows:

from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-small-japanese", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-small-japanese")
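Once loaded, the model can be queried for masked-token predictions directly. The snippet below is a minimal sketch: the example sentence is illustrative, and it assumes the tokenizer exposes a `[MASK]` token via `mask_token_id` (standard for masked language models in transformers).

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-small-japanese", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-small-japanese")

# Example sentence with one masked token (illustrative).
text = "東京大学で自然言語処理を[MASK]する。"
inputs = tokenizer(text, return_tensors="pt")

# Locate the position of the [MASK] token in the input ids.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits

# Top-5 candidate tokens for the first masked position.
top_ids = logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```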


The model uses a SentencePiece-based tokenizer; the vocabulary was trained on Japanese Wikipedia with sentencepiece.

Training Data

We used the following corpora for pre-training:

Training Parameters

  • learning_rate: 6e-4
  • total_train_batch_size: 2,016
  • max_seq_length: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
  • lr_scheduler_type: linear schedule with warmup
  • training_steps: 1,000,000
  • warmup_steps: 100,000
  • precision: BF16
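The learning-rate schedule implied by these parameters can be sketched as follows: a linear warmup from 0 to the peak rate over the first 100,000 steps, then a linear decay over the remaining steps (decay to zero at step 1,000,000 is an assumption; the card only states "linear schedule with warmup"):

```python
def lr_at_step(step, peak_lr=6e-4, warmup_steps=100_000, total_steps=1_000_000):
    """Linear warmup to peak_lr, then linear decay.

    A sketch of the schedule implied by the training parameters above;
    decaying to exactly zero at total_steps is an assumption.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(lr_at_step(50_000))    # halfway through warmup: 3e-4
print(lr_at_step(100_000))   # peak: 6e-4
print(lr_at_step(550_000))   # halfway through decay: 3e-4
```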

Fine-tuning on General NLU tasks

We evaluated the model on general NLU tasks and report the average over five random seeds.

Model              JSTS (Pearson/Spearman)  JNLI (acc)  JCommonsenseQA (acc)
DeBERTaV2 small    0.890/0.846              0.880       0.737
UTokyo BERT small  0.889/0.841              0.841       0.715


The citation will be updated; please check for the latest version before citing.

  title = {Constructing and analyzing domain-specific language model for financial text mining},
  author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
  journal = {Information Processing \& Management},
  volume = {60},
  number = {2},
  pages = {103194},
  year = {2023},
  doi = {10.1016/j.ipm.2022.103194}


The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) license.


This work was supported in part by JSPS KAKENHI Grant Number JP21K12010, and the JST-Mirai Program Grant Number JPMJMI20B1, Japan.
