appvoid committed on
Commit 8511d9d
1 Parent(s): 22cdfc4

Update README.md

Files changed (1)
  1. README.md +19 -17
README.md CHANGED
````diff
@@ -225,6 +225,24 @@ wip effort to make merging compatible llama model
 
 ## comparison to palmer-004
 
+| Component | palmer-004 | llama 3 1b | How to Make Second Similar to First |
+|-----------|-------------|--------------|--------------------------------------|
+| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to existing layers |
+| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
+| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
+| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
+| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
+| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
+| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
+| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Maintain the same structure when adding new layers |
+| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
+| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
+| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
+| Position Embeddings | Not explicitly mentioned (might be part of embed_tokens) | Not explicitly mentioned (might be part of embed_tokens) | Ensure position embeddings support the maximum sequence length of the first model |
+| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
+
+## further investigation
+
 there are no differences between these models, but for some reason i'm constantly facing this error when doing passthrough:
 
 ```
@@ -250,20 +268,4 @@ Traceback (most recent call last):
   File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
     raise RuntimeError(
 RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
-```
-
-| Component | palmer-004 | llama 3 1b | How to Make Second Similar to First |
-|-----------|-------------|--------------|--------------------------------------|
-| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to existing layers |
-| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
-| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
-| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
-| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
-| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
-| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
-| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Maintain the same structure when adding new layers |
-| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
-| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
-| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
-| Position Embeddings | Not explicitly mentioned (might be part of embed_tokens) | Not explicitly mentioned (might be part of embed_tokens) | Ensure position embeddings support the maximum sequence length of the first model |
-| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
+```
````
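Several rows of the comparison table in the diff above are only "likely consistent (inferred from weight names)". A quick way to confirm them is to read both configs from the hub instead of guessing from tensor names. This is a rough sketch, not part of the commit: `appvoid/palmer-004` is an assumed repo id (adjust it to the actual one), and `meta-llama/Llama-3.2-1B` is gated, so it assumes you already have access.

```python
# Hedged sketch: compare the architecture fields the table only infers indirectly.
# Assumes `transformers` is installed, both repos are reachable, and that
# "appvoid/palmer-004" is the right hub id for palmer-004 (unverified).
from transformers import AutoConfig

REPOS = ["appvoid/palmer-004", "meta-llama/Llama-3.2-1B"]
FIELDS = [
    "num_hidden_layers",        # total layers (22 vs 16 in the table)
    "hidden_size",              # hidden size
    "num_attention_heads",      # attention heads
    "num_key_value_heads",      # grouped-query heads, must also match for a clean merge
    "intermediate_size",        # intermediate MLP size
    "vocab_size",               # vocabulary size
    "max_position_embeddings",  # position embeddings / max sequence length
    "tie_word_embeddings",      # if True, no separate lm_head.weight is saved to disk
]

configs = {repo: AutoConfig.from_pretrained(repo) for repo in REPOS}
for field in FIELDS:
    values = {repo: getattr(configs[repo], field, None) for repo in REPOS}
    flag = "same" if len(set(values.values())) == 1 else "DIFFERS"
    print(f"[{flag}] {field}: {values}")
```

If anything other than `num_hidden_layers` comes back as DIFFERS, the "already identical" rows in the table need revisiting before adding the 6 extra layers.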
 
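About the `lm_head.weight` error in the traceback: one thing worth ruling out (my guess, not something stated in the commit) is weight tying. When a checkpoint is saved with `tie_word_embeddings=True`, only `model.embed_tokens.weight` is written out and the LM head is re-tied at load time, so a tool that reads the raw tensors can report `lm_head.weight` as missing even though the loaded model has one. Below is a minimal sketch to see what is actually stored on disk; it assumes the checkpoint is a single `model.safetensors` shard (adjust the filename for sharded repos) and that you have gated access to the model.

```python
# Hedged sketch: list the tensors stored in the Llama-3.2-1B checkpoint and
# check whether lm_head.weight is physically present. Downloads the full
# weight shard (a few GB); the single-shard filename is an assumption.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download("meta-llama/Llama-3.2-1B", filename="model.safetensors")
with safe_open(path, framework="pt") as f:
    keys = set(f.keys())

print("lm_head.weight on disk:", "lm_head.weight" in keys)
print("model.embed_tokens.weight on disk:", "model.embed_tokens.weight" in keys)
print("total tensors:", len(keys))
```

If `lm_head.weight` is absent here but present in palmer-004's checkpoint, the two models are not quite "no different" at the tensor level, which could be what the passthrough merge is tripping over.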