
This repository provides a medium-sized Japanese GPT-2 model. The model was trained using code from the GitHub repository rinnakk/japanese-pretrained-models by rinna Co., Ltd.

How to use the model

NOTE: Use T5Tokenizer to initialize the tokenizer.

```python
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True  # due to some bug of tokenizer config loading

model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
```

Model architecture

A 24-layer, 1024-hidden-size transformer-based language model.


The model was trained on Japanese CC-100 and Japanese Wikipedia to optimize a traditional language modelling objective, on 8 V100 GPUs for around 30 days. It reaches around 18 perplexity on a validation set drawn from the same data.
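For context, perplexity is simply the exponential of the mean per-token negative log-likelihood (cross-entropy in nats), so a perplexity of about 18 corresponds to an average loss of about 2.89 nats per token. A minimal sketch with made-up loss values:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Illustrative per-token losses; the mean is 2.89 nats, so exp(2.89) ≈ 18.
losses = [2.9, 2.85, 2.95, 2.86]
print(round(perplexity(losses), 2))
```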


The model uses a sentencepiece-based tokenizer; the vocabulary was trained on Japanese Wikipedia using the official sentencepiece training script.


License

The MIT license.
