End of training
README.md CHANGED
@@ -107,28 +107,6 @@ LlamaForCausalLM(
         (self_attn): LlamaSdpaAttention(
           (q_proj): Linear(in_features=576, out_features=576, bias=False)
           (k_proj): Linear(in_features=576, out_features=192, bias=False)
           (v_proj): Linear(in_features=576, out_features=192, bias=False)
           (o_proj): Linear(in_features=576, out_features=576, bias=False)
           (rotary_emb): LlamaRotaryEmbedding()
         )
-        (mlp): LlamaMLP(
+        (mlp): LigerSwiGLUMLP(
           (gate_proj): Linear(in_features=576, out_features=1536, bias=False)
           (up_proj): Linear(in_features=576, out_features=1536, bias=False)
           (down_proj): Linear(in_features=1536, out_features=576, bias=False)
-          (act_fn): SiLU()
         )
-        (input_layernorm): LlamaRMSNorm((576,), eps=1e-05)
-        (post_attention_layernorm): LlamaRMSNorm((576,), eps=1e-05)
+        (input_layernorm): LigerRMSNorm((576,), eps=1e-05, offset=0.0)
+        (post_attention_layernorm): LigerRMSNorm((576,), eps=1e-05, offset=0.0)
       )
     )
-    (norm): LlamaRMSNorm((576,), eps=1e-05)
+    (norm): LigerRMSNorm((576,), eps=1e-05, offset=0.0)
     (rotary_emb): LlamaRotaryEmbedding()
   )
   (lm_head): Linear(in_features=576, out_features=49152, bias=False)
 )
 ```
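The swap from `LlamaMLP`/`LlamaRMSNorm` to `LigerSwiGLUMLP`/`LigerRMSNorm` in the printout above is what Liger-Kernel's model patching produces. The training script is not part of this commit, so the following is only a hedged sketch of how such a swap is typically applied; the base-model id is a placeholder (the 576-dim, 49,152-vocab shapes match SmolLM-135M, but the exact checkpoint is an assumption):

```python
# Hedged sketch: applying Liger-Kernel module swaps like the ones in the
# diff above. Not taken from this repo's training script.
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Patch the transformers Llama classes in place, before the model is built.
apply_liger_kernel_to_llama(
    swiglu=True,    # LlamaMLP -> LigerSwiGLUMLP (SiLU gating is fused, so no separate act_fn)
    rms_norm=True,  # LlamaRMSNorm -> LigerRMSNorm
)

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")  # placeholder id
print(model)  # the repr now shows LigerSwiGLUMLP / LigerRMSNorm, as in the diff
```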
@@ -136,7 +114,7 @@ LlamaForCausalLM(
 <br/>
 
 # Train Dataset
-Trained on 44,
+Trained on 44,061,015 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
 
 - Num Samples: `49,900`
 - Subset: `20231101.en`
@@ -185,7 +163,7 @@ The following hyperparameters were used during training:
     weight=0
   )
 )`
-- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at
+- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7c6117e3aad0>`
 - student_model_name_or_path: `None`
 - student_config_name_or_path: `None`
 - student_model_config: `{'num_hidden_layers': 15}`
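The `lr_scheduler` entry above is just the object's repr leaking into the hyperparameter dump, so the actual schedule is not recoverable from the README. For orientation only, a generic `LambdaLR` of that kind is built like this; the warmup lambda is illustrative, not the one used in training:

```python
# Hedged sketch: a generic LambdaLR like the one whose repr was logged above.
import torch
from torch.optim.lr_scheduler import LambdaLR

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1e-3)
# Illustrative linear warmup over 100 steps, then constant; the real lr_lambda is unknown.
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / 100))
print(scheduler)  # <torch.optim.lr_scheduler.LambdaLR object at 0x...>
```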
logs/attn_weight=0, bf16=True, per_device_train_batch_size=4, run_name=bf16/events.out.tfevents.1726170813.1c1a426a2fee ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:89b005923b155722294ecdd76a75adae3712e0b8944137742eb912fba4f3e226
+size 249
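The three `+` lines are a Git LFS pointer, not the TensorBoard data itself: only the SHA-256 and size (249 bytes) live in git, and the events file is stored out-of-band. A hedged sketch of resolving it with `huggingface_hub`; the repo id is a placeholder, since the commit view does not name it:

```python
# Hedged sketch: downloading the real events file behind the LFS pointer above.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="<user>/<repo>",  # placeholder; not shown in this commit view
    filename=(
        "logs/attn_weight=0, bf16=True, per_device_train_batch_size=4, "
        "run_name=bf16/events.out.tfevents.1726170813.1c1a426a2fee"
    ),
)
print(path)  # local path to the downloaded events file
```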