This is a Japanese BigBird base model pretrained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.
You can use this model for masked language modeling as follows:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("nlp-waseda/bigbird-base-japanese")
model = AutoModelForMaskedLM.from_pretrained("nlp-waseda/bigbird-base-japanese")

sentence = '[MASK] 大学 で 自然 言語 処理 を 学ぶ 。'  # input should be segmented into words by Juman++ in advance
encoding = tokenizer(sentence, return_tensors='pt')
...
```
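As a minimal continuation of the snippet above (this decoding step is a sketch and not part of the original example), the masked token can be predicted like this:

```python
import torch

# Run the model and take the most likely token at each [MASK] position.
with torch.no_grad():
    logits = model(**encoding).logits

mask_positions = (encoding["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))  # prints the model's guess for the masked word
```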
You can fine-tune this model on downstream tasks.
This model was trained on Japanese Wikipedia (as of 20221101), the Japanese portion of CC-100, and the Japanese portion of OSCAR. Training took two weeks on 16 NVIDIA A100 GPUs using transformers and DeepSpeed.
The following hyperparameters were used during pretraining:
- learning_rate: 1e-4
- per_device_train_batch_size: 6
- gradient_accumulation_steps: 2
- total_train_batch_size: 192
- max_seq_length: 4096
- training_steps: 600000
- warmup_steps: 6000
- bf16: true
- deepspeed: ds_config.json
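These settings correspond roughly to the following transformers `TrainingArguments`; this is a sketch for orientation only, and the actual training script and `ds_config.json` are not included in this card:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="bigbird-base-japanese-pretraining",  # illustrative path
    learning_rate=1e-4,
    per_device_train_batch_size=6,
    gradient_accumulation_steps=2,   # 6 x 2 x 16 GPUs = 192 total train batch size
    max_steps=600_000,
    warmup_steps=6_000,
    bf16=True,
    deepspeed="ds_config.json",
)
# max_seq_length=4096 would be applied when tokenizing the corpus,
# not through TrainingArguments.
```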
We fine-tuned the following models and evaluated them on the dev set of JGLUE. The learning rate and the number of training epochs were tuned for each model and task, following the JGLUE paper.
For tasks other than MARC-ja, the maximum sequence length is short, so attention_type was set to "original_full" for fine-tuning. For MARC-ja, both "block_sparse" and "original_full" were used (see the loading sketch after the table below).
| Model | MARC-ja/acc | JSTS/Pearson | JSTS/Spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JCommonsenseQA/acc |
|-|-|-|-|-|-|-|-|
| Waseda RoBERTa base | 0.965 | 0.913 | 0.876 | 0.905 | 0.853 | 0.916 | 0.853 |
| Waseda RoBERTa large (seq512) | 0.969 | 0.925 | 0.890 | 0.928 | 0.910 | 0.955 | 0.900 |
| BigBird base (original_full) | 0.959 | 0.888 | 0.846 | 0.896 | 0.884 | 0.933 | 0.787 |
| BigBird base (block_sparse) | 0.959 | - | - | - | - | - | - |
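For reference, attention_type can be overridden when loading the model for fine-tuning. The sketch below assumes a sequence-classification head and an illustrative num_labels; the actual fine-tuning scripts are not part of this card:

```python
from transformers import AutoModelForSequenceClassification

# For short-sequence tasks, full attention can be used instead of
# block sparse attention by overriding the config at load time.
model = AutoModelForSequenceClassification.from_pretrained(
    "nlp-waseda/bigbird-base-japanese",
    attention_type="original_full",  # or "block_sparse" for long inputs such as MARC-ja
    num_labels=2,                    # illustrative; depends on the task
)
```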
This work was supported by AI Bridging Cloud Infrastructure (ABCI) through the "Construction of a Japanese Large-Scale General-Purpose Language Model that Handles Long Sequences" at the 3rd ABCI Grand Challenge 2022.