language: ja
license: mit
Japanese DistilBERT Pretrained Model
A Japanese DistilBERT pretrained model, which was trained on Wikipedia.
See here for a quickstart guide in Japanese.
Table of Contents
- Introduction
- Requirements
- Usage
- License

Introduction
DistilBERT is a small, fast, cheap, and light Transformer model based on the BERT architecture. It has 40% fewer parameters than BERT-base and runs 60% faster, while preserving 97% of BERT's performance as measured on the GLUE language understanding benchmark.
This model was trained with the official Hugging Face implementation from here for 2 weeks on an AWS p3dn.24xlarge instance.
More details about distillation can be found in the following paper: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Sanh et al. (2019).
The teacher model is the pretrained Japanese BERT model from TOHOKU NLP LAB.
Currently, only PyTorch-compatible weights are available. TensorFlow checkpoints can be generated by following the official guide.
Requirements
torch>=1.3.1
torchvision>=0.4.2
transformers>=2.5.0
tensorboard>=1.14.0
tensorboardX==1.8
scikit-learn>=0.21.0
mecab-python3
Usage
Download model
Please download and unzip DistilBERT-base-jp.zip.
Use model
# Read from local path
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("LOCAL_PATH")
LOCAL_PATH is the path where the file above was unzipped. It should contain 3 files:
- pytorch_model.bin
- config.json
- vocab.txt
or
# Download from the Hugging Face model hub
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
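Once the tokenizer and model are loaded (from a local path or from the hub), they can be used for feature extraction. A minimal sketch, assuming `transformers` and the MeCab dependency (`mecab-python3`) from the requirements are installed; the example sentence is illustrative, and on newer `transformers` versions the tokenizer name may need the `cl-tohoku/` prefix:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Tokenizer of the teacher model; on recent transformers versions use
# "cl-tohoku/bert-base-japanese-whole-word-masking" instead
tokenizer = AutoTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("bandainamco-mirai/distilbert-base-japanese")

# Encode an example Japanese sentence (illustrative text) with [CLS]/[SEP] added
input_ids = torch.tensor(
    [tokenizer.encode("こんにちは、世界。", add_special_tokens=True)]
)

# Forward pass without gradient tracking; the first output is the sequence of
# hidden states with shape (batch, seq_len, hidden_size)
model.eval()
with torch.no_grad():
    outputs = model(input_ids)
hidden_states = outputs[0]
print(hidden_states.shape)
```

The last hidden states can then be pooled (for example, taking the `[CLS]` position) to obtain a sentence-level representation for downstream tasks.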
License
Copyright (c) 2020 BANDAI NAMCO Research Inc.
Released under the MIT license