japanese-gpt-1b

This repository provides a 1.3B-parameter Japanese GPT model. The model was trained by rinna Co., Ltd.

How to use the model

NOTE: Use T5Tokenizer to instantiate the tokenizer.

import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt-1b")
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-1b")

if torch.cuda.is_available():
    model = model.to("cuda")

text = "西田幾多郎は、"
token_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_length=100,
        min_length=100,
        do_sample=True,
        top_k=500,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        bad_words_ids=[[tokenizer.unk_token_id]]
    )

output = tokenizer.decode(output_ids.tolist()[0])
print(output)  
# sample output: 西田幾多郎は、その主著の「善の研究」などで、人間の内面に自然とその根源があると指摘し、その根源的な性格は、この西田哲学を象徴しているとして、カントの「純粋理性批判」と「判断力批判」を対比して捉えます。それは、「人が理性的存在であるかぎりにおいて、人はその当人に固有な道徳的に自覚された善悪の基準を持っている」とするもので、この理性的な善悪の観念を否定するのがカントの

Model architecture

A 24-layer, 2048-hidden-size transformer-based language model.
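As a rough sanity check on the stated size, the layer count and hidden size can be turned into an approximate parameter count. This is a minimal sketch that assumes a standard GPT-2-style block (four attention projections plus a feed-forward layer with 4x expansion); the card does not confirm those details, and the helper name `approx_transformer_params` is hypothetical. Embeddings, biases, and layer norms are deliberately left out.

```python
def approx_transformer_params(num_layers: int, hidden_size: int) -> int:
    """Approximate weight count of the transformer blocks only.

    Assumes a GPT-2-style block (an assumption, not stated in the card):
    4 * d^2 for the Q, K, V and output projections, and
    8 * d^2 for a feed-forward layer with 4x expansion.
    """
    attention = 4 * hidden_size * hidden_size
    feed_forward = 8 * hidden_size * hidden_size
    return num_layers * (attention + feed_forward)


# 24 layers and hidden size 2048, as given in the card.
print(approx_transformer_params(24, 2048))  # 1207959552
```

Under these assumptions the blocks account for about 1.2B weights, so the rest of the stated 1.3B would come from the terms not counted here, such as the embeddings.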

Training

The model was trained on Japanese C4, Japanese CC-100, and Japanese Wikipedia to optimize a traditional language modelling objective. It reaches a perplexity of around 14 on a validation set drawn from the same data.
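For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean per-token negative log-likelihood, so a perplexity of about 14 corresponds to a mean loss of roughly ln 14 ≈ 2.64 nats per token. The sketch below illustrates that relationship only; the `perplexity` helper and the loss values are hypothetical and are not the evaluation code or data used for this model.

```python
import math


def perplexity(token_nlls):
    """Return exp of the mean per-token negative log-likelihood (in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Hypothetical per-token losses, not measurements from this model.
example_nlls = [2.1, 3.0, 2.8, 2.5]
print(round(perplexity(example_nlls), 2))
```

In practice the per-token losses would come from the model's cross-entropy loss over a held-out set, averaged over tokens rather than over sentences.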

Tokenization

The model uses a sentencepiece-based tokenizer. The vocabulary was first trained on a selected subset from the training data using the official sentencepiece training script, and then augmented with emojis and symbols.

License

The MIT license
