# gpt-j-japanese-6.8b

A pre-trained Japanese language model with approximately 6.87 billion parameters, built on EleutherAI's Mesh Transformer JAX codebase and using a model structure similar to their GPT-J-6B pre-trained model.

* We used T5Tokenizer and SentencePiece instead of the GPT-2/3 tokenizer. The normalization performed by SentencePiece is essential for Japanese tokenization, because common symbols have far more variant forms in Japanese than in Western languages.
* The tokenizer has a vocabulary of 52,500 tokens and was trained on the Japanese Wikipedia dump as of 01 Aug 2021.
* For inference, the model fits on 16 GB VRAM GPUs such as the P100 for context lengths up to 1688 tokens; generating at the full 2048-token context length requires 20 GB of VRAM or more (e.g. RTX 3090/A5000). See the loading sketch after this list.
* The model was trained for about 4 weeks on a TPU v3-128 generously provided by Google TRC.
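
A minimal inference sketch in Python, assuming the weights are published on the Hugging Face Hub as a GPT-J-compatible checkpoint under the hypothetical repository id `naclbit/gpt-j-japanese-6.8b` (the actual id and loading class may differ):

```python
# Minimal inference sketch. The repository id below is an assumption; replace it
# with the actual Hugging Face Hub id of this model.
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

model_id = "naclbit/gpt-j-japanese-6.8b"  # hypothetical id

# T5Tokenizer wraps the SentencePiece vocabulary described above,
# including its Unicode normalization.
tokenizer = T5Tokenizer.from_pretrained(model_id)

# float16 keeps the ~6.9B-parameter model within a 16 GB GPU
# for context lengths up to roughly 1688 tokens.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "日本で一番高い山は"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Keep prompt + generated tokens well under the full 2048 context
# on a 16 GB card (see the VRAM note above).
output = model.generate(
    input_ids, max_new_tokens=64, do_sample=True, temperature=0.8
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```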

## Specifications

| Hyperparameter | Value |
|---|---|
| n_parameters | 6,876,450,080 |
| n_layers | 32 |
| d_model | 4,096 |
| d_ff | 16,384 |
| n_heads | 16 |
| d_head | 256 |
| n_ctx | 2,048 |
| n_vocab | 52,512 |
| position encoding | Rotary position encodings (RoPE) |
| RoPE dimensions | 64 |
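
For illustration, the table above maps onto a Hugging Face `GPTJConfig` roughly as follows. This is a sketch under the assumption that the checkpoint is used with the `transformers` GPT-J implementation, not a statement about how the model was actually configured for training:

```python
from transformers import GPTJConfig

# Sketch only: hyperparameters from the table above expressed as a GPTJConfig.
# (d_head = n_embd / n_head = 4096 / 16 = 256.)
config = GPTJConfig(
    vocab_size=52512,   # n_vocab
    n_positions=2048,   # n_ctx
    n_embd=4096,        # d_model
    n_layer=32,         # n_layers
    n_head=16,          # n_heads
    n_inner=16384,      # d_ff
    rotary_dim=64,      # RoPE dimensions
)
```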