DeBERTa V2 base Japanese

This is a DeBERTaV2 model pretrained on Japanese texts. The code used for pretraining is available at retarfi/language-pretraining.

How to use

You can use this model for masked language modeling as follows:

from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-base-japanese", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-base-japanese")
# For example, predict a masked token:
inputs = tokenizer(f"東京大学は{tokenizer.mask_token}にあります。", return_tensors="pt")
outputs = model(**inputs)

Tokenization

The model uses a SentencePiece-based tokenizer; the vocabulary was trained on Japanese Wikipedia with SentencePiece.

Training Data

We used the following corpora for pre-training:

Training Parameters

Learning rates in parentheses indicate the rate used for additional pre-training on the financial corpus.

  • learning_rate: 2.4e-4 (6e-5)
  • total_train_batch_size: 2,016
  • max_seq_length: 512
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
  • lr_scheduler_type: linear schedule with warmup
  • training_steps: 1,000,000
  • warmup_steps: 100,000
  • precision: FP16
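The warmup behavior implied by the parameters above can be sketched as a small function: the learning rate ramps linearly from 0 to the peak over the first 100,000 steps, then decays linearly over the remaining steps. This is a minimal sketch, not the training code; the function name and the assumption that the rate decays to exactly zero at the final step are illustrative.

```python
def linear_schedule_with_warmup(step, peak_lr=2.4e-4, warmup_steps=100_000, total_steps=1_000_000):
    """Learning rate at a given step under linear warmup + linear decay (a sketch)."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr down to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_schedule_with_warmup(50_000))    # halfway through warmup
print(linear_schedule_with_warmup(100_000))   # peak learning rate
print(linear_schedule_with_warmup(1_000_000)) # end of training
```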

Fine-tuning on General NLU tasks

We evaluated our model as the average over five random seeds.
Scores for the other models are taken from the JGLUE repository.

Model                 JSTS (Pearson/Spearman)  JNLI (acc)  JCommonsenseQA (acc)
DeBERTaV2 base        0.919/0.882              0.912       0.859
Waseda RoBERTa base   0.913/0.873              0.895       0.840
Tohoku BERT base      0.909/0.868              0.899       0.808
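Averaging over seeds can be done with the standard library; the per-seed scores below are hypothetical, for illustration only.

```python
from statistics import mean, stdev

# Hypothetical per-seed accuracies (illustrative only, not the actual runs)
seed_scores = [0.910, 0.913, 0.911, 0.914, 0.912]
print(f"{mean(seed_scores):.3f} ± {stdev(seed_scores):.3f}")
```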

Citation

The citation information may be updated; please check for the latest version before citing.

@article{Suzuki-etal-2023-ipm,
  title = {Constructing and analyzing domain-specific language model for financial text mining},
  author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
  journal = {Information Processing \& Management},
  volume = {60},
  number = {2},
  pages = {103194},
  year = {2023},
  doi = {10.1016/j.ipm.2022.103194}
}
@article{Suzuki-2024-findebertav2,
  jtitle = {{FinDeBERTaV2: 単語分割フリーな金融事前学習言語モデル}},
  title = {{FinDeBERTaV2: Word-Segmentation-Free Pre-trained Language Model for Finance}},
  jauthor = {鈴木, 雅弘 and 坂地, 泰紀 and 平野, 正徳 and 和泉, 潔},
  author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
  jjournal = {人工知能学会論文誌},
  journal = {Transactions of the Japanese Society for Artificial Intelligence},
  volume = {39},
  number = {4},
  pages = {FIN23-G_1-14},
  year = {2024},
  doi = {10.1527/tjsai.39-4_FIN23-G},
}

Licenses

The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Acknowledgments

This work was supported in part by JSPS KAKENHI Grant Number JP21K12010, and the JST-Mirai Program Grant Number JPMJMI20B1, Japan.

