
japanese-gpt2-small


This repository provides a small-sized Japanese GPT-2 model. The model was trained using code from the GitHub repository rinnakk/japanese-pretrained-models by rinna Co., Ltd.

How to use the model

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
tokenizer.do_lower_case = True  # set manually to work around a bug in tokenizer config loading

model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
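
Once the tokenizer and model are loaded, text can be sampled with the standard `generate` method. The following is a minimal sketch, not an official recipe: the helper names `load` and `generate_text`, the prompt, and the sampling settings (`top_k`, `top_p`, `max_new_tokens`) are illustrative assumptions, and `add_special_tokens=False` is assumed so that no end-of-sequence token is appended to the prompt.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "rinna/japanese-gpt2-small"


def load(model_name=MODEL_NAME):
    """Load the tokenizer and model the same way as in the snippet above."""
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
    tokenizer.do_lower_case = True  # same workaround as in the model card
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def generate_text(tokenizer, model, prompt, max_new_tokens=40):
    """Sample a continuation of `prompt` (hypothetical helper, assumed settings)."""
    # Assumption: skip special tokens so the prompt does not end with EOS.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=True,  # sampling settings below are illustrative only
            top_k=50,
            top_p=0.95,
            pad_token_id=tokenizer.pad_token_id,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text(*load(), "こんにちは、")` downloads the weights on first use and returns the prompt followed by a sampled continuation. Because sampling is enabled, the output differs from run to run.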

Model architecture

A 12-layer, 768-hidden-size transformer-based language model.
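
To give a rough sense of what these two numbers imply, the sketch below counts the parameters of a standard GPT-2 style decoder. Only the layer count (12) and hidden size (768) come from this card. The vocabulary size of 32,000, the context length of 1,024, the 4x MLP width, and the tied output head are assumptions made for illustration, and `gpt2_param_count` is a hypothetical helper.

```python
def gpt2_param_count(n_layer, d_model, vocab_size, n_positions):
    """Count trainable parameters of a standard GPT-2 style decoder.

    Assumes the usual GPT-2 layout: learned position embeddings, a 4x MLP,
    biases on every linear layer, and an output head tied to the token
    embedding (so the head adds no extra parameters).
    """
    embeddings = vocab_size * d_model + n_positions * d_model
    layer_norm = 2 * d_model  # scale and bias
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    per_layer = 2 * layer_norm + attention + mlp
    # One final layer norm follows the last transformer block.
    return embeddings + n_layer * per_layer + layer_norm


# 12 layers and hidden size 768 come from the model card.
# vocab_size and n_positions are ASSUMED values, not stated in the card.
total = gpt2_param_count(n_layer=12, d_model=768, vocab_size=32_000, n_positions=1_024)
print(f"{total:,} parameters")
```

As a sanity check on the formula, plugging in the original English GPT-2 vocabulary size of 50,257 gives 124,439,808, which matches the commonly quoted size of GPT-2 small.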

Training

The model was trained on Japanese CC-100 and Japanese Wikipedia to optimize a traditional (autoregressive, next-token prediction) language modelling objective, using 8 V100 GPUs for around 15 days. It reaches a perplexity of around 21 on a validation set chosen from CC-100.
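
For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below illustrates that definition with toy numbers. It assumes the usual token-level definition, since the card does not say exactly how its figure was computed, and `perplexity` is a hypothetical helper.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input (not real model output): every token gets probability 1/21.
toy_log_probs = [math.log(1 / 21)] * 5
print(round(perplexity(toy_log_probs), 6))
```

Under this definition, a perplexity of around 21 corresponds to a mean cross-entropy loss of about ln 21, roughly 3.04 nats per token.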

Tokenization

The model uses a sentencepiece-based tokenizer. The vocabulary was trained on Japanese Wikipedia using the official sentencepiece training script.

License

The MIT license

