Update README.md

## comparison to palmer-004

| Component | palmer-004 | Llama 3.2 1B | How to make Llama 3.2 1B match palmer-004 |
|-----------|------------|--------------|-------------------------------------------|
| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to the existing layers |
| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Keep the same structure when adding new layers |
| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
| Position Embeddings | Not explicitly listed (might be part of embed_tokens) | Not explicitly listed (might be part of embed_tokens) | Ensure position embeddings support the maximum sequence length of the first model |
| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
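
The "add 6 more sets of weights" recipe above can be sketched as plain state-dict surgery. This is an illustrative sketch, not mergekit's implementation, and the choice of which source layers to duplicate (10..15 copied to 16..21) is my assumption; the key names follow the table.

```python
# Grow a 16-layer Llama-style state dict to 22 layers by duplicating the
# top 6 layers (a passthrough-style depth-up merge sketch).
def grow_layers(state_dict, old_layers=16, new_layers=22):
    copied = dict(state_dict)
    extra = new_layers - old_layers
    for i in range(extra):
        # Assumption: reuse the last `extra` layers as sources.
        src, dst = old_layers - extra + i, old_layers + i
        prefix = f"model.layers.{src}."
        for name, tensor in state_dict.items():
            if name.startswith(prefix):
                copied[name.replace(prefix, f"model.layers.{dst}.", 1)] = tensor
    return copied

# Toy 16-layer state dict with the per-layer weights listed in the table
# (values stand in for real tensors).
per_layer = [
    "self_attn.q_proj.weight", "self_attn.k_proj.weight",
    "self_attn.v_proj.weight", "self_attn.o_proj.weight",
    "mlp.gate_proj.weight", "mlp.up_proj.weight", "mlp.down_proj.weight",
    "input_layernorm.weight", "post_attention_layernorm.weight",
]
sd = {"model.embed_tokens.weight": 0, "model.norm.weight": 0, "lm_head.weight": 0}
for i in range(16):
    for w in per_layer:
        sd[f"model.layers.{i}.{w}"] = 0

grown = grow_layers(sd)
print(len([k for k in grown if k.startswith("model.layers.21.")]))  # -> 9
```

Each new layer ends up with the same 9 weight sets as the originals, which is exactly the structural match the table calls for.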

## further investigation

there are no differences between these models, but for some reason I'm constantly running into this error when doing a passthrough merge:

```
Traceback (most recent call last):
  File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
    raise RuntimeError(
RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
```
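
A likely culprit (my assumption; the log doesn't confirm it): Llama-3.2-1B is published with tied word embeddings (`tie_word_embeddings: true` in its config), so the checkpoint stores no separate `lm_head.weight` tensor at all, which is exactly the tensor mergekit reports missing. A minimal sketch of a workaround is to materialize the head from the embedding matrix before merging; `untie_lm_head` is a hypothetical helper, not a mergekit API.

```python
# Sketch: if the checkpoint has no lm_head.weight (tied embeddings),
# reuse the input embedding matrix as the output head.
def untie_lm_head(state_dict):
    if "lm_head.weight" not in state_dict:
        # With tied embeddings, the output head is the embedding matrix.
        state_dict["lm_head.weight"] = state_dict["model.embed_tokens.weight"]
    return state_dict

# Toy state dict standing in for the Llama-3.2-1B checkpoint contents.
sd = {"model.embed_tokens.weight": [[1.0, 2.0]], "model.norm.weight": [1.0]}
print("lm_head.weight" in untie_lm_head(sd))  # -> True
```

If this is the cause, palmer-004 (which does ship an explicit `lm_head.weight`) would merge fine on its own, while any recipe that reads the head from Llama-3.2-1B would trip the presence check.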