metadata
language: ja
license: cc-by-sa-4.0
library_name: transformers
datasets:
  - cc100
  - mc4
  - oscar
  - wikipedia
  - izumi-lab/cc100-ja
  - izumi-lab/mc4-ja-filter-ja-normal
  - izumi-lab/oscar2301-ja-filter-ja-normal
  - izumi-lab/wikipedia-ja-20230720
  - izumi-lab/wikinews-ja-20230728
widget:
  - text: 東京大学で[MASK]の研究をしています。

DeBERTa V2 base Japanese

This is a DeBERTaV2 model pretrained on Japanese texts. The pretraining code is available at retarfi/language-pretraining.

How to use

You can use this model for masked language modeling as follows:

from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-base-japanese")
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-base-japanese")
...
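As one possible way to continue from the snippet above, the following sketch runs the model through the transformers `fill-mask` pipeline on the widget sentence from the metadata. The helper names `summarize_predictions` and `fill_mask` are hypothetical and introduced only for illustration. Calling `fill_mask` downloads the model on first use, so it is left as a commented-out call.

```python
MODEL_NAME = "izumi-lab/deberta-v2-base-japanese"


def summarize_predictions(predictions, top_k=3):
    """Turn fill-mask pipeline output into (token, score) pairs, best first.

    `predictions` is a list of dicts with at least the keys "token_str"
    and "score", which is the shape the fill-mask pipeline returns for a
    single input containing one [MASK] token.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 4)) for p in ranked[:top_k]]


def fill_mask(text, model_name=MODEL_NAME, top_k=3):
    """Hypothetical helper: predict the [MASK] token in `text`."""
    # Imported lazily so the pure helper above works without transformers.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_name)
    return summarize_predictions(unmasker(text, top_k=top_k), top_k=top_k)


# Example call (downloads the model weights on first use):
# for token, score in fill_mask("東京大学で[MASK]の研究をしています。"):
#     print(token, score)
```

The pipeline route is only a convenience; the tokenizer and model loaded with `AutoTokenizer` and `AutoModelForMaskedLM` above can be used directly instead.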

Tokenization

The model uses a sentencepiece-based tokenizer. The vocabulary was trained on the Japanese Wikipedia using sentencepiece.

Training Data

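We used the following corpora for pre-training (dataset identifiers as listed in the metadata):

  • izumi-lab/cc100-ja
  • izumi-lab/mc4-ja-filter-ja-normal
  • izumi-lab/oscar2301-ja-filter-ja-normal
  • izumi-lab/wikipedia-ja-20230720
  • izumi-lab/wikinews-ja-20230728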

We pretrained on the corpora mentioned above for 900k steps, then continued pretraining on the following financial corpora for a further 100k steps:

  • Summaries of financial results from October 9, 2012, to December 31, 2022
  • Securities reports from February 8, 2018, to December 31, 2022
  • News articles

Training Parameters

The learning_rate value in parentheses indicates the learning rate used for the additional pre-training on the financial corpora.

  • learning_rate: 2.4e-4 (6e-5)
  • total_train_batch_size: 2,016
  • max_seq_length: 512
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
  • lr_scheduler_type: linear schedule with warmup
  • training_steps: 1,000,000
  • warmup_steps: 100,000
  • precision: FP16
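To make the "linear schedule with warmup" entry concrete, here is a minimal sketch of that schedule shape using the listed peak learning rate, warmup steps, and training steps. It assumes a single warmup and linear decay across all 1,000,000 steps; how the 6e-5 learning rate of the financial phase fits into the schedule is not stated here, so that phase is not modeled. The function name `linear_warmup_decay_lr` is hypothetical.

```python
PEAK_LR = 2.4e-4          # learning_rate from the list above
WARMUP_STEPS = 100_000    # warmup_steps
TRAINING_STEPS = 1_000_000  # training_steps


def linear_warmup_decay_lr(
    step,
    peak_lr=PEAK_LR,
    warmup_steps=WARMUP_STEPS,
    training_steps=TRAINING_STEPS,
):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumption: one schedule spans the whole run. The rate rises linearly
    from 0 to `peak_lr` over `warmup_steps`, then falls linearly to 0 at
    `training_steps`.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, training_steps - step)
    return peak_lr * remaining / (training_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 50_000, 100_000, 550_000, 1_000_000):
        print(step, linear_warmup_decay_lr(step))
```

Under these assumptions the rate peaks at step 100,000 and is back to half the peak at step 550,000, the midpoint of the decay phase.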

Fine-tuning on General NLU tasks

We report our model's scores as the average over five seeds.
Scores for the other models are taken from the JGLUE repository.

| Model | JSTS (Pearson/Spearman) | JNLI (acc) | JCommonsenseQA (acc) |
|---|---|---|---|
| DeBERTaV2 base | 0.890/0.846 | 0.xxx | 0.859 |
| Waseda RoBERTa base | 0.913/0.873 | 0.895 | 0.840 |
| Tohoku BERT base | 0.909/0.868 | 0.899 | 0.808 |
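For readers unfamiliar with the JSTS metrics in the table, the sketch below shows how Pearson and Spearman correlations and a mean over seeds can be computed in plain Python. It is an illustration only: the JGLUE evaluation scripts may compute these differently, the function names are hypothetical, and the numbers in the usage lines are toy values, not results.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """1-based ranks of `values`, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mean_over_seeds(scores):
    """Average one metric across runs with different random seeds."""
    return mean(scores)


if __name__ == "__main__":
    gold = [0.0, 1.5, 3.0, 4.5]       # toy similarity labels
    predicted = [0.2, 1.0, 3.5, 4.0]  # toy model outputs
    print(pearson(gold, predicted), spearman(gold, predicted))
```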

Citation

TBA

Licenses

The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.

Acknowledgments

This work was supported in part by JSPS KAKENHI Grant Number JP21K12010, and the JST-Mirai Program Grant Number JPMJMI20B1, Japan.