update model to step 124484

Changed files:
- .ipynb_checkpoints/README-checkpoint.md (+25 -28)
- .ipynb_checkpoints/config-checkpoint.json (+26 -0)
- README.md (+24 -24)
- pytorch_model.bin (+1 -1)
.ipynb_checkpoints/README-checkpoint.md
CHANGED

@@ -2,6 +2,7 @@
 language:
 - it
 pipeline_tag: text-generation
+max_length: 100
 widget:
 - text: Alessandro è un ragazzo che progetta Infissi
 - text: Melissa è una ragazza che adora
@@ -10,31 +11,27 @@ tags:
 - italiano
 - llama
 ---
-This is a train starting from an empty model based exclusively on Italian language datasets (currently redpajama 2023-14 it)
-
-the train is ongoing and will extend to new datasets
-
-More precise versions will be published shortly
-
-Train on my server, i have studied and adapted the model starting from the repository https://github.com/karpathy/llama2.c
-
-# max_seq_len: 7b = 2048
-
-
-#
-
-
-
-
-
-
-n_heads = 32
-
-
-
-
-
-multiple_of = 32
-
-num decayed parameter tensors: 225, with 251,068,416 parameters
-num non-decayed parameter tensors: 65, with 49,920 parameters
+This is a train starting from an empty model based exclusively on Italian language datasets (currently redpajama 2023-14 it)<br/>
+<br/>
+the train is ongoing and will extend to new datasets.<br/>
+<br/>
+More precise versions will be published shortly.<br/>
+<br/>
+Train on my server, i have studied and adapted the model starting from the repository https://github.com/karpathy/llama2.c<br/>
+<br/>
+# max_seq_len: (7b = 2048) The maximum sequence length for input data.<br/>
+# dim (7b= 4096) Represents the dimensionality of the model<br/>
+# n_layers: (7b = 32) The number of layers in the model<br/>
+# n_heads: (7b = 32) Determines the number of attention heads in the model<br/>
+# n_kv_heads: (7b = 32) The number of key and value heads<br/>
+# multiple_of: (7b = 256) A value used to make the SwiGLU hidden layer size a multiple of a large power of 2<br/>
+<br/>
+max_seq_len = 1024<br/>
+dim = 768<br/>
+n_layers = 32<br/>
+n_heads = 32<br/>
+n_kv_heads = 32<br/>
+multiple_of = 32<br/>
+<br/>
+num decayed parameter tensors: 225, with 251,068,416 parameters<br/>
+num non-decayed parameter tensors: 65, with 49,920 parameters<br/>
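For reference, here is how the hyperparameters listed in the card map onto a llama2.c-style config. This is a minimal sketch, not the author's training script: the dataclass mirrors the shape of `ModelArgs` in karpathy/llama2.c, and the hidden-size computation is the standard Llama SwiGLU sizing.

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    # Values for this checkpoint; Llama-2-7B reference values in comments.
    dim: int = 768           # 7B: 4096
    n_layers: int = 32       # 7B: 32
    n_heads: int = 32        # 7B: 32
    n_kv_heads: int = 32     # 7B: 32
    multiple_of: int = 32    # 7B: 256; rounds the SwiGLU hidden size
    max_seq_len: int = 1024  # 7B: 2048
    vocab_size: int = 32000

args = ModelArgs()
assert args.dim % args.n_heads == 0  # per-head dim = 768 / 32 = 24

# Standard Llama FFN sizing: 2/3 of 4*dim, rounded up to a multiple of multiple_of.
hidden = int(2 * 4 * args.dim / 3)
hidden = args.multiple_of * ((hidden + args.multiple_of - 1) // args.multiple_of)
print(hidden)  # 2048, matching "intermediate_size" in the config.json added below
```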
.ipynb_checkpoints/config-checkpoint.json
ADDED

@@ -0,0 +1,26 @@
+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 2048,
+  "max_position_embeddings": 1024,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 32,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 10000.0,
+  "tie_word_embeddings": true,
+  "transformers_version": "4.37.1",
+  "use_cache": true,
+  "vocab_size": 32000
+}
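The config declares `LlamaForCausalLM` with tied embeddings, so the checkpoint loads through the standard transformers API. A sketch using one of the card's widget prompts; `repo_id` is a placeholder, since the actual Hub repository id is not shown in this commit view:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/model"  # placeholder: substitute the real Hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# One of the widget prompts from the model card.
inputs = tokenizer("Melissa è una ragazza che adora", return_tensors="pt")
# max_length=100 mirrors the "max_length: 100" added to the card metadata.
outputs = model.generate(**inputs, max_length=100, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```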
README.md
CHANGED

@@ -11,27 +11,27 @@ tags:
 - italiano
 - llama
 ---
-This is a train starting from an empty model based exclusively on Italian language datasets (currently redpajama 2023-14 it)
-
-the train is ongoing and will extend to new datasets
-
-More precise versions will be published shortly
-
-Train on my server, i have studied and adapted the model starting from the repository https://github.com/karpathy/llama2.c
-
-# max_seq_len: 7b = 2048
-# dim 7b= 4096
-# n_layers: 7b = 32
-# n_heads: 7b = 32
-# n_kv_heads: 7b = 32
-# multiple_of: 7b = 256
-
-max_seq_len = 1024
-dim = 768
-n_layers = 32
-n_heads = 32
-n_kv_heads = 32
-multiple_of = 32
-
-num decayed parameter tensors: 225, with 251,068,416 parameters
-num non-decayed parameter tensors: 65, with 49,920 parameters
+This is a train starting from an empty model based exclusively on Italian language datasets (currently redpajama 2023-14 it)<br/>
+<br/>
+the train is ongoing and will extend to new datasets.<br/>
+<br/>
+More precise versions will be published shortly.<br/>
+<br/>
+Train on my server, i have studied and adapted the model starting from the repository https://github.com/karpathy/llama2.c<br/>
+<br/>
+# max_seq_len: (7b = 2048) The maximum sequence length for input data.<br/>
+# dim (7b= 4096) Represents the dimensionality of the model<br/>
+# n_layers: (7b = 32) The number of layers in the model<br/>
+# n_heads: (7b = 32) Determines the number of attention heads in the model<br/>
+# n_kv_heads: (7b = 32) The number of key and value heads<br/>
+# multiple_of: (7b = 256) A value used to make the SwiGLU hidden layer size a multiple of a large power of 2<br/>
+<br/>
+max_seq_len = 1024<br/>
+dim = 768<br/>
+n_layers = 32<br/>
+n_heads = 32<br/>
+n_kv_heads = 32<br/>
+multiple_of = 32<br/>
+<br/>
+num decayed parameter tensors: 225, with 251,068,416 parameters<br/>
+num non-decayed parameter tensors: 65, with 49,920 parameters<br/>
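The two parameter-tensor lines at the end of the card are the optimizer-setup log that llama2.c inherits from nanoGPT: tensors with two or more dimensions (matmul weights, embeddings) receive weight decay, while 1D tensors (RMSNorm gains) do not. A sketch of that split, assuming the upstream convention:

```python
import torch

def param_groups(model: torch.nn.Module, weight_decay: float):
    # Split parameters the way llama2.c / nanoGPT do before building AdamW:
    # tensors with >= 2 dims are decayed, 1D tensors are not.
    params = [p for p in model.parameters() if p.requires_grad]
    decay = [p for p in params if p.dim() >= 2]
    no_decay = [p for p in params if p.dim() < 2]
    print(f"num decayed parameter tensors: {len(decay)}, "
          f"with {sum(p.numel() for p in decay):,} parameters")
    print(f"num non-decayed parameter tensors: {len(no_decay)}, "
          f"with {sum(p.numel() for p in no_decay):,} parameters")
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]
```

The non-decayed count is consistent with the shapes above: two RMSNorm vectors per layer plus the final norm give 2 × 32 + 1 = 65 tensors of size 768, i.e. 65 × 768 = 49,920 parameters.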
pytorch_model.bin
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ab62b69b46b7f795f22d07447f33fa985864f7fdd281df9a3d26834a1750744f
 size 1004567442
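The pytorch_model.bin change is a plain Git LFS pointer update: the byte size is unchanged and only the sha256 oid moves to the step-124484 weights. A small sketch for checking a downloaded copy against the new pointer:

```python
import hashlib
import os

EXPECTED_SHA256 = "ab62b69b46b7f795f22d07447f33fa985864f7fdd281df9a3d26834a1750744f"
EXPECTED_SIZE = 1004567442  # bytes, from the LFS pointer

def matches_pointer(path: str) -> bool:
    # Cheap size check first, then hash the file in 1 MiB chunks.
    if os.path.getsize(path) != EXPECTED_SIZE:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == EXPECTED_SHA256

print(matches_pointer("pytorch_model.bin"))
```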