DeBERTa V2 small Japanese
This is a DeBERTaV2 model pretrained on Japanese texts. The pretraining code is available at retarfi/language-pretraining.
How to use
You can use this model for masked language modeling as follows:
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-small-japanese", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-small-japanese")
...
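The snippet above stops after loading the tokenizer and model. As a minimal sketch of one way to continue, the helper below scores candidates for a masked position. The function name `top_k_mask_predictions`, the parameter `k`, and the example sentence are illustrative assumptions, not part of the original card.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer


def top_k_mask_predictions(text, tokenizer, model, k=5):
    """Return the k most probable tokens for the first mask token in `text`.

    Hypothetical helper for illustration; not taken from the model card.
    """
    inputs = tokenizer(text, return_tensors="pt")
    # Find where the mask token sits in the encoded sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if len(mask_positions) == 0:
        raise ValueError("text must contain tokenizer.mask_token")
    # Inference only, so gradients are not needed.
    with torch.no_grad():
        logits = model(**inputs).logits
    # Turn the vocabulary logits at the masked position into probabilities.
    probs = logits[0, mask_positions[0]].softmax(dim=-1)
    top = torch.topk(probs, k)
    return [
        (tokenizer.convert_ids_to_tokens(int(i)), float(p))
        for p, i in zip(top.values, top.indices)
    ]


# Example usage (downloads the model on first run; the sentence is made up):
# name = "izumi-lab/deberta-v2-small-japanese"
# tokenizer = AutoTokenizer.from_pretrained(name, use_fast=False)
# model = AutoModelForMaskedLM.from_pretrained(name)
# text = f"東京大学で自然言語処理を{tokenizer.mask_token}する。"
# print(top_k_mask_predictions(text, tokenizer, model))
```

Building the input with `tokenizer.mask_token` avoids hard-coding the mask string, which can differ between tokenizers.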
Tokenization
The model uses a SentencePiece-based tokenizer whose vocabulary was trained on Japanese Wikipedia.
Training Data
We used the following corpora for pre-training:
- Japanese portion of CC-100
- Japanese portion of mC4
- Japanese portion of OSCAR2301
- Japanese Wikipedia as of July 20, 2023
- Japanese Wikinews as of July 28, 2023
Training Parameters
- learning_rate: 6e-4
- total_train_batch_size: 2,016
- max_seq_length: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear schedule with warmup
- training_steps: 1,000,000
- warmup_steps: 100,000
- precision: BF16
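To make the schedule concrete, here is a minimal sketch of the learning rate implied by the parameters above. It assumes the usual form of a linear schedule with warmup (as in `get_linear_schedule_with_warmup` from transformers): a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step. The card does not state the final learning rate, so the decay-to-zero endpoint is an assumption, and the function name `linear_warmup_lr` is illustrative.

```python
def linear_warmup_lr(step, base_lr=6e-4, warmup_steps=100_000, total_steps=1_000_000):
    """Learning rate at `step` under linear warmup followed by linear decay.

    Defaults mirror the listed training parameters; decay to zero is assumed.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 50_000, 100_000, 550_000, 1_000_000):
    print(f"step {step:>9,}: lr = {linear_warmup_lr(step):.2e}")
```

Under these assumptions the rate peaks at 6e-4 exactly when warmup ends (step 100,000) and is back at half the peak by step 550,000.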
Fine-tuning on General NLU tasks
We report each score as the average over five random seeds.
Model | JSTS (Pearson/Spearman) | JNLI (acc) | JCommonsenseQA (acc) |
---|---|---|---|
DeBERTaV2 small | 0.890/0.846 | 0.880 | 0.737 |
UTokyo BERT small | 0.889/0.841 | 0.841 | 0.715 |
Citation
The citation information will be updated. Please check for the latest version before citing.
@article{Suzuki-etal-2023-ipm,
title = {Constructing and analyzing domain-specific language model for financial text mining},
author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
journal = {Information Processing \& Management},
volume = {60},
number = {2},
pages = {103194},
year = {2023},
doi = {10.1016/j.ipm.2022.103194}
}
@article{Suzuki-2024-findebertav2,
jtitle = {{FinDeBERTaV2: 単語分割フリーな金融事前学習言語モデル}},
title = {{FinDeBERTaV2: Word-Segmentation-Free Pre-trained Language Model for Finance}},
jauthor = {鈴木, 雅弘 and 坂地, 泰紀 and 平野, 正徳 and 和泉, 潔},
author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
jjournal = {人工知能学会論文誌},
journal = {Transactions of the Japanese Society for Artificial Intelligence},
volume = {39},
number = {4},
pages = {FIN23-G_1-14},
year = {2024},
doi = {10.1527/tjsai.39-4_FIN23-G},
}
Licenses
The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.
Acknowledgments
This work was supported in part by JSPS KAKENHI Grant Number JP21K12010, and the JST-Mirai Program Grant Number JPMJMI20B1, Japan.