---
language:
- ja
tags:
- japanese
- text-generation
- gptj
- pytorch
- transformers
license: apache-2.0
---
A Japanese pre-trained model with approximately 6.87 billion parameters, built on EleutherAI's Mesh Transformer JAX codebase with a model structure similar to their GPT-J-6B pre-trained model.
- We used T5Tokenizer with SentencePiece instead of the GPT-2/3 tokenizer. The normalization performed by SentencePiece is a must for Japanese tokenization, since common symbols have many more variant forms than in Western languages.
- The tokenizer has a vocabulary of 52,500 tokens and was trained on a Japanese Wikipedia dump as of 1 August 2021.
- The model fits on 16 GB VRAM GPUs such as the P100 for inference with context lengths up to 1,688 tokens. Generating with the full 2,048-token context requires 20 GB of VRAM or more (e.g. RTX 3090 / A5000); see the inference sketch after this list.
- The model was trained for about 4 weeks on a TPU v3-128 pod generously provided by Google TRC.
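
A minimal inference sketch with the transformers library is shown below. The repository id `naclbit/gpt-j-japanese-6.8b` and the use of `AutoModelForCausalLM` are assumptions for illustration, not details stated in this card; loading in float16 reflects the 16 GB VRAM note above.

```python
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

# Hypothetical repository id, assumed for this sketch.
model_id = "naclbit/gpt-j-japanese-6.8b"

# T5Tokenizer wraps the SentencePiece model used for Japanese normalization.
tokenizer = T5Tokenizer.from_pretrained(model_id)

# Load in float16 so inference fits on a 16 GB GPU for shorter contexts.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "日本で一番高い山は"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```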
## Specifications
| Hyperparameter | Value |
|-------------------|--------|
| n_parameters | 6,876,450,080 |
| n_layers | 32 |
| d_model | 4,096 |
| d_ff | 16,384 |
| n_heads | 16 |
| d_head | 256 |
| n_ctx | 2,048 |
| n_vocab | 52,512 |
| position encoding | [Rotary position encodings (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE dimensions | 64 |
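
For reference, the hyperparameters above map roughly onto a `GPTJConfig` as follows. This is a sketch assuming the Hugging Face GPT-J implementation, not the exact training configuration.

```python
from transformers import GPTJConfig

# Rough mapping of the table above onto a GPTJConfig (illustrative only).
config = GPTJConfig(
    vocab_size=52512,   # n_vocab
    n_positions=2048,   # n_ctx
    n_embd=4096,        # d_model (d_head = n_embd / n_head = 256)
    n_layer=32,         # n_layers
    n_head=16,          # n_heads
    n_inner=16384,      # d_ff
    rotary_dim=64,      # RoPE dimensions
)
```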