naclbit committed on
Commit
66fb516
1 Parent(s): d5c9b0d

Create README.md

Files changed (1)
  1. README.md +23 -0
README.md ADDED
@@ -0,0 +1,23 @@
+ A 6.8 billion parameter pre-trained model for the Japanese language, based on EleutherAI's Mesh Transformer JAX, with a model structure similar to their GPT-J-6B pre-trained model.
+
+ A Japanese pre-trained model with approximately 6.87 billion parameters and a structure similar to GPT-J-6B, built on EleutherAI's Mesh Transformer JAX codebase.
+
+ - We used T5Tokenizer and SentencePiece instead of the GPT-2/3 tokenizer. The normalization performed by SentencePiece is a must for Japanese tokenization, since common symbols have many more variant forms than in Western languages. (A minimal loading sketch follows this list.)
+ - The tokenizer has a vocabulary of 52,500 tokens and was trained on a Japanese Wikipedia dump as of 01 Aug 2021.
+ - For inference, the model fits on 16 GB VRAM GPUs such as the P100 at context lengths up to 1,688 tokens. Generating with the full 2,048-token context requires 20 GB of VRAM or more (e.g. RTX 3090/A5000).
+ - The model was trained for about 4 weeks on a TPU v3-128 generously provided by Google TRC.
+
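+ The sketch below shows one plausible way to load and run the model with the Hugging Face transformers library. It is not part of the original README: the repository id is a placeholder, and the use of AutoModelForCausalLM and float16 weights is an assumption about how the checkpoint is packaged.
+
+ ```python
+ # Minimal loading/generation sketch. The repository id below is a placeholder,
+ # and AutoModelForCausalLM is an assumption about how this checkpoint is
+ # packaged for transformers; adjust both to the actual model.
+ import torch
+ from transformers import T5Tokenizer, AutoModelForCausalLM
+
+ repo_id = "user/japanese-gpt-j-6.8b"  # placeholder id, replace with the real checkpoint
+
+ # SentencePiece-based tokenizer wrapped in T5Tokenizer (52,500-token vocabulary).
+ tokenizer = T5Tokenizer.from_pretrained(repo_id)
+
+ # float16 weights keep memory low enough for a 16 GB GPU at shorter context lengths.
+ model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16).to("cuda")
+
+ inputs = tokenizer("日本で一番高い山は", return_tensors="pt").to("cuda")
+ outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+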
+ ## Specifications
+
+ | Hyperparameter | Value |
+ |-------------------|--------|
+ | n_parameters | 6,876,450,080 |
+ | n_layers | 32 |
+ | d_model | 4,096 |
+ | d_ff | 16,384 |
+ | n_heads | 16 |
+ | d_head | 256 |
+ | n_ctx | 2,048 |
+ | n_vocab | 52,512 |
+ | position encoding | [Rotary position encodings (RoPE)](https://arxiv.org/abs/2104.09864) |
+ | RoPE dimensions | 64 |
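+
+ As a quick sanity check on these figures (not part of the original README), a back-of-the-envelope estimate of the parameter count from the hyperparameters above, assuming a GPT-J-style layout with an untied output head and ignoring biases, layer norms and other small terms, lands just below the exact n_parameters value:
+
+ ```python
+ # Rough parameter-count estimate from the table above. Biases, layer norms and
+ # other small terms are ignored, so the result is slightly below the exact
+ # 6,876,450,080 reported as n_parameters.
+ d_model, d_ff, n_layers, n_vocab = 4096, 16384, 32, 52512
+
+ embed_in  = n_vocab * d_model      # input token embeddings
+ embed_out = n_vocab * d_model      # untied output projection (GPT-J-style lm_head)
+ attn      = 4 * d_model * d_model  # q, k, v and output projections per layer
+ mlp       = 2 * d_model * d_ff     # feed-forward in and out projections per layer
+
+ total = embed_in + embed_out + n_layers * (attn + mlp)
+ print(f"{total:,}")                # 6,872,629,248  (~6.87 billion)
+ ```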