# gpt-j-japanese-6.8b

A pre-trained Japanese language model with approximately 6.87 billion parameters, built on EleutherAI's Mesh Transformer JAX codebase and using a model structure similar to their GPT-J-6B pre-trained model.

* We used T5Tokenizer and SentencePiece instead of the GPT-2/3 tokenizer. The normalization performed by SentencePiece is essential for Japanese tokenization, because common symbols have far more variant forms in Japanese than in Western languages.
* The tokenizer has a vocabulary of 52,500 tokens and was trained on the Japanese Wikipedia dump as of 01 Aug 2021.
* For inference, the model fits on 16 GB VRAM GPUs such as the P100 for context lengths up to 1688 tokens; generating at the full 2048-token context length requires 20 GB of VRAM or more (e.g. RTX 3090/A5000). See the loading sketch after this list.
* The model was trained for about 4 weeks on a TPU v3-128 generously provided by Google TRC.
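
A minimal inference sketch in Python, assuming the weights are published on the Hugging Face Hub as a GPT-J-compatible checkpoint under the hypothetical repository id `naclbit/gpt-j-japanese-6.8b` (the actual id and loading class may differ):

```python
# Minimal inference sketch. The repository id below is an assumption; replace it
# with the actual Hugging Face Hub id of this model.
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

model_id = "naclbit/gpt-j-japanese-6.8b"  # hypothetical id

# T5Tokenizer wraps the SentencePiece vocabulary described above,
# including its Unicode normalization.
tokenizer = T5Tokenizer.from_pretrained(model_id)

# float16 keeps the ~6.9B-parameter model within a 16 GB GPU
# for context lengths up to roughly 1688 tokens.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "日本で一番高い山は"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Keep prompt + generated tokens well under the full 2048 context
# on a 16 GB card (see the VRAM note above).
output = model.generate(
    input_ids, max_new_tokens=64, do_sample=True, temperature=0.8
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```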

## Specifications

| Hyperparameter | Value |
|---|---|
| n_parameters | 6,876,450,080 |
| n_layers | 32 |
| d_model | 4,096 |
| d_ff | 16,384 |
| n_heads | 16 |
| d_head | 256 |
| n_ctx | 2,048 |
| n_vocab | 52,512 |
| position encoding | Rotary position encodings (RoPE) |
| RoPE dimensions | 64 |
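
For illustration, the table above maps onto a Hugging Face `GPTJConfig` roughly as follows. This is a sketch under the assumption that the checkpoint is used with the `transformers` GPT-J implementation, not a statement about how the model was actually configured for training:

```python
from transformers import GPTJConfig

# Sketch only: hyperparameters from the table above expressed as a GPTJConfig.
# (d_head = n_embd / n_head = 4096 / 16 = 256.)
config = GPTJConfig(
    vocab_size=52512,   # n_vocab
    n_positions=2048,   # n_ctx
    n_embd=4096,        # d_model
    n_layer=32,         # n_layers
    n_head=16,          # n_heads
    n_inner=16384,      # d_ff
    rotary_dim=64,      # RoPE dimensions
)
```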