Update README.md

## comparison to palmer-004

| Component | palmer-004 | Llama 3.2 1B | How to make Llama 3.2 1B match palmer-004 |
|-----------|------------|--------------|-------------------------------------------|
| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to the existing layers |
| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Keep the same structure when adding new layers |
| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
| Position Embeddings | Not explicitly listed (might be part of embed_tokens) | Not explicitly listed (might be part of embed_tokens) | Ensure position embeddings support the maximum sequence length of the first model |
| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
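
The "add 6 more sets of weights" recipe above can be sketched as plain state-dict surgery. This is an illustrative sketch, not mergekit's implementation, and the choice of which source layers to duplicate (10..15 copied to 16..21) is my assumption; the key names follow the table.

```python
# Grow a 16-layer Llama-style state dict to 22 layers by duplicating the
# top 6 layers (a passthrough-style depth-up merge sketch).
def grow_layers(state_dict, old_layers=16, new_layers=22):
    copied = dict(state_dict)
    extra = new_layers - old_layers
    for i in range(extra):
        # Assumption: reuse the last `extra` layers as sources.
        src, dst = old_layers - extra + i, old_layers + i
        prefix = f"model.layers.{src}."
        for name, tensor in state_dict.items():
            if name.startswith(prefix):
                copied[name.replace(prefix, f"model.layers.{dst}.", 1)] = tensor
    return copied

# Toy 16-layer state dict with the per-layer weights listed in the table
# (values stand in for real tensors).
per_layer = [
    "self_attn.q_proj.weight", "self_attn.k_proj.weight",
    "self_attn.v_proj.weight", "self_attn.o_proj.weight",
    "mlp.gate_proj.weight", "mlp.up_proj.weight", "mlp.down_proj.weight",
    "input_layernorm.weight", "post_attention_layernorm.weight",
]
sd = {"model.embed_tokens.weight": 0, "model.norm.weight": 0, "lm_head.weight": 0}
for i in range(16):
    for w in per_layer:
        sd[f"model.layers.{i}.{w}"] = 0

grown = grow_layers(sd)
print(len([k for k in grown if k.startswith("model.layers.21.")]))  # -> 9
```

Each new layer ends up with the same 9 weight sets as the originals, which is exactly the structural match the table calls for.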

## further investigation

there are no differences between these models, but for some reason I'm constantly running into this error when doing a passthrough merge:

```
Traceback (most recent call last):
  File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
    raise RuntimeError(
RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
```
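
A likely culprit (my assumption; the log doesn't confirm it): Llama-3.2-1B is published with tied word embeddings (`tie_word_embeddings: true` in its config), so the checkpoint stores no separate `lm_head.weight` tensor at all, which is exactly the tensor mergekit reports missing. A minimal sketch of a workaround is to materialize the head from the embedding matrix before merging; `untie_lm_head` is a hypothetical helper, not a mergekit API.

```python
# Sketch: if the checkpoint has no lm_head.weight (tied embeddings),
# reuse the input embedding matrix as the output head.
def untie_lm_head(state_dict):
    if "lm_head.weight" not in state_dict:
        # With tied embeddings, the output head is the embedding matrix.
        state_dict["lm_head.weight"] = state_dict["model.embed_tokens.weight"]
    return state_dict

# Toy state dict standing in for the Llama-3.2-1B checkpoint contents.
sd = {"model.embed_tokens.weight": [[1.0, 2.0]], "model.norm.weight": [1.0]}
print("lm_head.weight" in untie_lm_head(sd))  # -> True
```

If this is the cause, palmer-004 (which does ship an explicit `lm_head.weight`) would merge fine on its own, while any recipe that reads the head from Llama-3.2-1B would trip the presence check.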