retarfi committed
Commit 5f961ce
1 Parent(s): df5e5f2
Files changed (1): README.md (+89 −1)
README.md CHANGED
---
language: ja
license: cc-by-sa-4.0
library_name: transformers
datasets:
- cc100
- mc4
- oscar
- wikipedia
- izumi-lab/cc100-ja
- izumi-lab/mc4-ja-filter-ja-normal
- izumi-lab/oscar2301-ja-filter-ja-normal
- izumi-lab/wikipedia-ja-20230720
- izumi-lab/wikinews-ja-20230728

widget:
- text: 東京大学で[MASK]の研究をしています。

---

# DeBERTa V2 small Japanese

This is a [DeBERTaV2](https://github.com/microsoft/DeBERTa) model pretrained on Japanese texts.
The code used for pretraining is available at [retarfi/language-pretraining](https://github.com/retarfi/language-pretraining/releases/tag/v2.2.1).


## How to use

You can use this model for masked language modeling as follows:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the pretrained masked language model
tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-small-japanese")
model = AutoModelForMaskedLM.from_pretrained("izumi-lab/deberta-v2-small-japanese")
...
```

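For a quick check of the masked-language-modeling head, the standard `fill-mask` pipeline from `transformers` can be used with the widget example above. This is a minimal sketch; the model id is assumed to be the repository this card belongs to.

```python
from transformers import pipeline

# Minimal sketch: fill-mask inference with the widget example from this card.
# The model id below is an assumption based on the card title.
fill_mask = pipeline("fill-mask", model="izumi-lab/deberta-v2-small-japanese")

for prediction in fill_mask("東京大学で[MASK]の研究をしています。", top_k=5):
    print(prediction["token_str"], round(prediction["score"], 3))
```

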
## Tokenization

The model uses a SentencePiece-based tokenizer; the vocabulary was trained on Japanese Wikipedia with [sentencepiece](https://github.com/google/sentencepiece).

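Assuming the tokenizer is applied directly to raw text, as is typical for SentencePiece vocabularies, the sketch below shows how a sentence is split into subword pieces and mapped to ids (model id assumed as above).

```python
from transformers import AutoTokenizer

# Assumed model id; the SentencePiece vocabulary splits raw Japanese text into
# subword pieces, which are then mapped to ids with special tokens added.
tokenizer = AutoTokenizer.from_pretrained("izumi-lab/deberta-v2-small-japanese")

text = "東京大学で自然言語処理の研究をしています。"
print(tokenizer.tokenize(text))                         # subword pieces
print(tokenizer(text)["input_ids"])                     # ids with special tokens added
print(tokenizer.decode(tokenizer(text)["input_ids"]))   # round-trip back to text
```

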
## Training Data

We used the following corpora for pre-training:

- [Japanese portion of CC-100](https://huggingface.co/datasets/izumi-lab/cc100-ja)
- [Japanese portion of mC4](https://huggingface.co/datasets/izumi-lab/mc4-ja-filter-ja-normal)
- [Japanese portion of OSCAR2301](https://huggingface.co/datasets/izumi-lab/oscar2301-ja-filter-ja-normal)
- [Japanese Wikipedia as of July 20, 2023](https://huggingface.co/datasets/izumi-lab/wikipedia-ja-20230720)
- [Japanese Wikinews as of July 28, 2023](https://huggingface.co/datasets/izumi-lab/wikinews-ja-20230728)

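The filtered corpora are published as datasets on the Hugging Face Hub, so they can be inspected with the `datasets` library. A sketch only; split and configuration names should be confirmed on each dataset page.

```python
from datasets import load_dataset

# Hypothetical inspection of one of the pretraining corpora; the larger corpora
# (mC4, OSCAR) are best read in streaming mode to avoid a full download.
wiki = load_dataset("izumi-lab/wikipedia-ja-20230720", split="train", streaming=True)
for example in wiki.take(3):
    print(example)
```

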
## Training Parameters

- learning_rate: 6e-4
- total_train_batch_size: 2,016
- max_seq_length: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear schedule with warmup
- training_steps: 1,000,000
- warmup_steps: 100,000
- precision: BF16

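For readers who want to reproduce the optimization setup, the parameters above map onto a standard PyTorch/`transformers` optimizer and scheduler roughly as sketched below; the actual pretraining was run with the retarfi/language-pretraining code base, so that repository is authoritative.

```python
import torch
from transformers import AutoConfig, AutoModelForMaskedLM, get_linear_schedule_with_warmup

# Illustrative mapping of the listed hyperparameters; model id assumed.
config = AutoConfig.from_pretrained("izumi-lab/deberta-v2-small-japanese")
model = AutoModelForMaskedLM.from_config(config)  # randomly initialized for pretraining

optimizer = torch.optim.Adam(model.parameters(), lr=6e-4, betas=(0.9, 0.999), eps=1e-6)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100_000,      # warmup_steps
    num_training_steps=1_000_000,  # training_steps
)
# Each step would see an effective batch of 2,016 sequences of up to 128 tokens,
# with computation in bfloat16.
```

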
## Fine-tuning on General NLU tasks

We fine-tuned and evaluated the model on general NLU tasks; each score below is the average over five random seeds.

| Model                                                                     | JSTS (Pearson/Spearman) | JNLI (acc) | JCommonsenseQA (acc) |
|---------------------------------------------------------------------------|-------------------------|------------|----------------------|
| **DeBERTaV2 small**                                                       | **0.890/0.846**         | **0.880**  | **0.737**            |
| [UTokyo BERT small](https://huggingface.co/izumi-lab/bert-small-japanese) | 0.889/0.841             | 0.841      | 0.715                |

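For fine-tuning on such tasks, the pretrained encoder can be combined with a sequence-classification head in the usual `transformers` way. A minimal sketch, assuming the model id and using a JNLI-style three-way sentence-pair task as the example; the sentence pair is illustrative only.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical setup for a three-way sentence-pair task such as JNLI
# (entailment / contradiction / neutral); model id assumed.
model_id = "izumi-lab/deberta-v2-small-japanese"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

# Illustrative pair: "A man is playing a guitar." / "A person is playing an instrument."
enc = tokenizer(
    "男性がギターを弾いている。",
    "人が楽器を演奏している。",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(model(**enc).logits.shape)  # torch.Size([1, 3]) before any fine-tuning
```

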
## Citation

TBA


## Licenses

The pretrained models are distributed under the terms of the [Creative Commons Attribution-ShareAlike 4.0 License](https://creativecommons.org/licenses/by-sa/4.0/).


## Acknowledgments

This work was supported in part by JSPS KAKENHI Grant Number JP21K12010 and the JST-Mirai Program Grant Number JPMJMI20B1, Japan.