Update README.md
README.md (changed):

```diff
@@ -2,6 +2,7 @@
 language:
 - it
 pipeline_tag: text-generation
+max_length: 100
 widget:
 - text: Alessandro è un ragazzo che progetta Infissi
 - text: Melissa è una ragazza che adora
@@ -19,21 +20,17 @@ More precise versions will be published shortly.
 Train on my server, i have studied and adapted the model starting from the repository https://github.com/karpathy/llama2.c
 
 # max_seq_len: 7b = 2048: It represents the maximum sequence length for input data.
-max_seq_len = 1024 #7b=2048
-
 # dim 7b= 4096: This attribute represents the dimensionality of the model
-dim = 768
-
 # n_layers: 7b = 32: It specifies the number of layers in the model
-n_layers = 32
-
 # n_heads: 7b = 32: This attribute determines the number of attention heads in the model
-n_heads = 32
-
 # n_kv_heads: 7b = 32: It represents the number of key and value heads,
-n_kv_heads = 32
-
 # multiple_of: 7b = 256: It specifies a value used to make the SwiGLU hidden layer size a multiple of a large power of 2
+
+max_seq_len = 1024
+dim = 768
+n_layers = 32
+n_heads = 32
+n_kv_heads = 32
 multiple_of = 32
 
 num decayed parameter tensors: 225, with 251,068,416 parameters
```
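The hyperparameters quoted in the diff pin the architecture down, so the figure "num decayed parameter tensors: 225, with 251,068,416 parameters" can be checked by arithmetic. Below is a minimal sketch following llama2.c's sizing conventions (SwiGLU hidden size rounded up to `multiple_of`, a weight-tied output head, and weight decay applied only to 2-D weight matrices); `vocab_size = 32000` is an assumption (the Llama 2 tokenizer size), as the diff does not state it.

```python
# Sanity check for the quoted "225 tensors, 251,068,416 parameters"
# using the hyperparameters from the README diff above.
# Assumption: vocab_size = 32000 (Llama 2 tokenizer), tied output head.

dim = 768
n_layers = 32
n_heads = 32
n_kv_heads = 32
multiple_of = 32
vocab_size = 32000  # assumed, not stated in the README

# SwiGLU hidden size, computed the way llama2.c's FeedForward does:
# 4*dim, scaled by 2/3, then rounded up to a multiple of `multiple_of`.
hidden_dim = int(2 * (4 * dim) / 3)
hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

head_dim = dim // n_heads
attn = (dim * n_heads * head_dim            # wq
        + 2 * dim * n_kv_heads * head_dim   # wk, wv
        + n_heads * head_dim * dim)         # wo
ffn = 3 * dim * hidden_dim                  # w1, w2, w3

# Only 2-D weight matrices get weight decay; RMSNorm vectors do not.
decayed_params = n_layers * (attn + ffn) + vocab_size * dim
decayed_tensors = n_layers * 7 + 1          # 7 matrices/layer + tied embedding

print(decayed_tensors, f"{decayed_params:,}")  # 225 251,068,416
```

The exact match suggests the vocab-size assumption is right: 32 layers contribute 226,492,416 attention/FFN parameters, and the 32000 x 768 embedding (shared with the output head) adds the remaining 24,576,000.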