Qi Wang committed on
Commit b7ac396
1 Parent(s): 983603d

Update readme_en.md

Files changed (1)
  1. readme_en.md +9 -9
readme_en.md CHANGED
@@ -22,10 +22,10 @@ The tokenizer for the model was also retrained, without relying on any existing
 
  Training Parameters:
 
- 1. Maximum Sentence Length: 4096
- 2. Vocabulary Size: 65534
- 3. Normalization Rule: nfkc
- 4. Character Coverage: ... (and so on)
+ 1. Maximum Sentence Length: 2657
+ 2. Vocabulary Size: 32000
+ 3. Normalization Rule: identity
+ 4. Character Coverage: 0.9995
 
  | | Llama2 | Baby Llama2 |
  | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
@@ -48,11 +48,11 @@ Before full training, the corpus is processed for vectorization. Using the recen
 
  Pre-training is done on a single 3090 machine. The model uses the architecture of llama2, and the training parameters are as follows:
 
- 1. max_seq_len = 512
- 2. dim = 512
- 3. n_headers = 8
- 4. n_layers = 8
- 5. n_kv_headers = 8
+ 1. max_seq_len = 1024
+ 2. dim = 768
+ 3. n_headers = 12
+ 4. n_layers = 12
+ 5. n_kv_headers = 12
 
  ## Demonstration
 
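
For context on the updated tokenizer hunk: the listed options (vocabulary size, normalization rule, character coverage, maximum sentence length) match SentencePiece trainer settings, so the retraining step presumably looked roughly like the sketch below. This is a minimal sketch assuming SentencePiece; the corpus path, model prefix, and model type are placeholders, not taken from the repository.

```python
# Minimal sketch, assuming SentencePiece; file names and model_type are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",                   # placeholder path to the training corpus
    model_prefix="baby_llama2_tokenizer", # placeholder output prefix
    model_type="bpe",                     # assumption; the diff does not state the model type
    vocab_size=32000,                     # "Vocabulary Size: 32000"
    normalization_rule_name="identity",   # "Normalization Rule: identity" (previously nfkc)
    character_coverage=0.9995,            # "Character Coverage: 0.9995"
    max_sentence_length=2657,             # "Maximum Sentence Length: 2657" (previously 4096)
)
```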
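The updated pre-training hunk only lists hyperparameters. Below is a minimal sketch of how they might be gathered into a config object, assuming a llama2.c-style dataclass; the class name and the derived head size are illustrative, not the repository's actual code.

```python
# Illustrative config only; the dataclass is not the repository's code,
# just the five values from the updated hunk collected in one place.
from dataclasses import dataclass

@dataclass
class ModelArgs:
    max_seq_len: int = 1024   # was 512
    dim: int = 768            # was 512
    n_headers: int = 12       # attention heads, was 8
    n_layers: int = 12        # transformer blocks, was 8
    n_kv_headers: int = 12    # key/value heads (no grouped-query sharing), was 8

args = ModelArgs()
# Head size stays at dim / n_headers = 768 / 12 = 64, the same as 512 / 8 in the old config.
assert args.dim % args.n_headers == 0
print(args.dim // args.n_headers)  # -> 64
```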