Qi Wang committed on
Commit b7ac396
1 Parent(s): 983603d

Update readme_en.md

Files changed (1)
  1. readme_en.md +9 -9
readme_en.md CHANGED
@@ -22,10 +22,10 @@ The tokenizer for the model was also retrained, without relying on any existing
 
  Training Parameters:
 
- 1. Maximum Sentence Length: 4096
- 2. Vocabulary Size: 65534
- 3. Normalization Rule: nfkc
- 4. Character Coverage: ... (and so on)
+ 1. Maximum Sentence Length: 2657
+ 2. Vocabulary Size: 32000
+ 3. Normalization Rule: identity
+ 4. Character Coverage: 0.9995
 
  | | Llama2 | Baby Llama2 |
  | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
@@ -48,11 +48,11 @@ Before full training, the corpus is processed for vectorization. Using the recen
 
  Pre-training is done on a single 3090 machine. The model uses the architecture of llama2, and the training parameters are as follows:
 
- 1. max_seq_len = 512
- 2. dim = 512
- 3. n_headers = 8
- 4. n_layers = 8
- 5. n_kv_headers = 8
+ 1. max_seq_len = 1024
+ 2. dim = 768
+ 3. n_headers = 12
+ 4. n_layers = 12
+ 5. n_kv_headers = 12
 
  ## Demonstration
 
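
For context on the updated tokenizer hunk: the listed options (vocabulary size, normalization rule, character coverage, maximum sentence length) match SentencePiece trainer settings, so the retraining step presumably looked roughly like the sketch below. This is a minimal sketch assuming SentencePiece; the corpus path, model prefix, and model type are placeholders, not taken from the repository.

```python
# Minimal sketch, assuming SentencePiece; file names and model_type are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",                   # placeholder path to the training corpus
    model_prefix="baby_llama2_tokenizer", # placeholder output prefix
    model_type="bpe",                     # assumption; the diff does not state the model type
    vocab_size=32000,                     # "Vocabulary Size: 32000"
    normalization_rule_name="identity",   # "Normalization Rule: identity" (previously nfkc)
    character_coverage=0.9995,            # "Character Coverage: 0.9995"
    max_sentence_length=2657,             # "Maximum Sentence Length: 2657" (previously 4096)
)
```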
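The updated pre-training hunk only lists hyperparameters. Below is a minimal sketch of how they might be gathered into a config object, assuming a llama2.c-style dataclass; the class name and the derived head size are illustrative, not the repository's actual code.

```python
# Illustrative config only; the dataclass is not the repository's code,
# just the five values from the updated hunk collected in one place.
from dataclasses import dataclass

@dataclass
class ModelArgs:
    max_seq_len: int = 1024   # was 512
    dim: int = 768            # was 512
    n_headers: int = 12       # attention heads, was 8
    n_layers: int = 12        # transformer blocks, was 8
    n_kv_headers: int = 12    # key/value heads (no grouped-query sharing), was 8

args = ModelArgs()
# Head size stays at dim / n_headers = 768 / 12 = 64, the same as 512 / 8 in the old config.
assert args.dim % args.n_headers == 0
print(args.dim // args.n_headers)  # -> 64
```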