---
language: ja
license: cc-by-sa-4.0
library_name: transformers
datasets:
- cc100
- mc4
- oscar
- wikipedia
- izumi-lab/cc100-ja
- izumi-lab/mc4-ja-filter-ja-normal
- izumi-lab/oscar2301-ja-filter-ja-normal
- izumi-lab/wikipedia-ja-20230720
- izumi-lab/wikinews-ja-20230728
---

# DeBERTa V2 base Japanese

This is a [DeBERTaV2](https://github.com/microsoft/DeBERTa) model pretrained on Japanese texts.
The code for pretraining is available at [retarfi/language-pretraining](https://github.com/retarfi/language-pretraining/releases/tag/v2.2.1).

## How to use

You can use this model for masked language modeling as follows:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-base-japanese", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-base-japanese")
...
```

A complete fill-mask example is sketched in the appendix at the end of this card.

## Tokenization

The model uses a sentencepiece-based tokenizer; the vocabulary was trained on Japanese Wikipedia with [sentencepiece](https://github.com/google/sentencepiece).

## Training Data

We used the following corpora for pre-training:

- [Japanese portion of CC-100](https://huggingface.co/datasets/izumi-lab/cc100-ja)
- [Japanese portion of mC4](https://huggingface.co/datasets/izumi-lab/mc4-ja-filter-ja-normal)
- [Japanese portion of OSCAR2301](https://huggingface.co/datasets/izumi-lab/oscar2301-ja-filter-ja-normal)
- [Japanese Wikipedia as of July 20, 2023](https://huggingface.co/datasets/izumi-lab/wikipedia-ja-20230720)
- [Japanese Wikinews as of July 28, 2023](https://huggingface.co/datasets/izumi-lab/wikinews-ja-20230728)

## Training Parameters

The learning rate in parentheses is the one used for additional pre-training with the financial corpus.

- learning_rate: 2.4e-4 (6e-5)
- total_train_batch_size: 2,016
- max_seq_length: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear schedule with warmup
- training_steps: 1,000,000
- warmup_steps: 100,000
- precision: FP16

## Fine-tuning on General NLU tasks

We evaluate our model by averaging the results over five seeds. The results of the other models are taken from the [JGLUE repository](https://github.com/yahoojapan/JGLUE).

| Model               | JSTS             | JNLI      | JCommonsenseQA |
|---------------------|------------------|-----------|----------------|
|                     | Pearson/Spearman | acc       | acc            |
| **DeBERTaV2 base**  | **0.919/0.882**  | **0.912** | **0.859**      |
| Waseda RoBERTa base | 0.913/0.873      | 0.895     | 0.840          |
| Tohoku BERT base    | 0.909/0.868      | 0.899     | 0.808          |

## Citation

The citation information will be updated; please check for the latest version before citing.

```
@article{Suzuki-etal-2023-ipm,
  title = {Constructing and analyzing domain-specific language model for financial text mining},
  author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
  journal = {Information Processing \& Management},
  volume = {60},
  number = {2},
  pages = {103194},
  year = {2023},
  doi = {10.1016/j.ipm.2022.103194}
}
```

## Licenses

The pretrained models are distributed under the terms of the [Creative Commons Attribution-ShareAlike 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license.

## Acknowledgments

This work was supported in part by JSPS KAKENHI Grant Number JP21K12010 and JST-Mirai Program Grant Number JPMJMI20B1, Japan.
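
## Appendix: Fill-Mask Example

The snippet below extends the code in "How to use" into a minimal, self-contained fill-mask sketch. The example sentence and the top-5 cutoff are arbitrary choices for illustration, and the snippet assumes the tokenizer's default mask token is used.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-base-japanese", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-base-japanese")

# Build a Japanese sentence with one masked token:
# "Tokyo is the [MASK] of Japan."
text = f"東京は日本の{tokenizer.mask_token}です。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and take the top-5 predicted tokens for it.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```

The `fill-mask` pipeline in transformers can be used for the same purpose; the manual version above makes the mask handling explicit.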