- Takeaways
  - These slices of layers are more connected to each other than to the rest of the model.
  - [Question] Does this mean that the **original layer** before the slice is the one holding the whole duplicated slice together?
  - [Question] What if we interleave original and duplicate layers? Will that result in a more balanced, responsive upscale? (See Proposed Upscale Technique at the bottom.)
  - Saturating these duplicated layers MIGHT be a good goal to pursue; a rough way to measure saturation is sketched below.
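
A minimal sketch of one way to quantify "saturation": compare each duplicated layer's weights against the original layer it was copied from after finetuning. The checkpoint path and (original, duplicate) index pairs below are placeholder assumptions, not values from our runs:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint path; substitute a real finetuned upscale.
model = AutoModelForCausalLM.from_pretrained(
    "./upscaled-finetuned", torch_dtype=torch.bfloat16
)
layers = model.model.layers

def cos_sim(a: torch.Tensor, b: torch.Tensor) -> float:
    # Flatten weights to vectors and compare in fp32 for numerical stability.
    return torch.nn.functional.cosine_similarity(
        a.flatten().float(), b.flatten().float(), dim=0
    ).item()

# Assumed (original, duplicate) index pairs in the merged stack.
pairs = [(19, 20), (19, 21)]
for orig, dupe in pairs:
    p_orig = dict(layers[orig].named_parameters())
    p_dupe = dict(layers[dupe].named_parameters())
    for name in ("self_attn.o_proj.weight", "mlp.down_proj.weight"):
        # ~1.0 means the duplicate is still a near-copy of its source;
        # lower means it has diverged, i.e. it is getting "filled".
        print(f"L{orig} vs L{dupe} {name}: {cos_sim(p_orig[name], p_dupe[name]):.4f}")
```
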
# Further Experimentation

Given how the duplicated layers seem to have a stabilizing effect, it begs the question:

### We've so far hypothesized that training 'slowly fills' the duplicated layers. If we intentionally undercook, will the duplicated layers look *underfilled*, or can we fill them up with a few steps? In other words, can a single update (or a few) to the model reconnect the duplicated layers?
- Are we really repairing the 'neurons' step-by-step, or have they been significantly rearranged by the first (few?) steps? A checkpoint-by-checkpoint probe is sketched below.
  - Or maybe this is false, given the top-bottom gradient observation.
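
A minimal sketch of how we might test this, assuming checkpoints saved at a handful of early steps. Every path, step count, and duplicate position here is an assumption for illustration:

```python
import torch
from transformers import AutoModelForCausalLM

def flat_layer(model, i):
    # All weights of decoder layer i, flattened into one vector.
    return torch.cat(
        [p.detach().flatten().float() for p in model.model.layers[i].parameters()]
    )

# Hypothetical paths and duplicate positions; adjust to the actual run.
base = AutoModelForCausalLM.from_pretrained("./merged-step0", torch_dtype=torch.bfloat16)
dupe_indices = [20, 21]
base_vecs = {i: flat_layer(base, i) for i in dupe_indices}

for ckpt in ("./ckpt-step1", "./ckpt-step10", "./ckpt-step100"):
    step = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16)
    for i in dupe_indices:
        v = flat_layer(step, i)
        # Relative drift of the duplicated layer from its merged (step-0) state.
        drift = ((v - base_vecs[i]).norm() / base_vecs[i].norm()).item()
        print(f"{ckpt} L{i}: drift={drift:.4f}")
```

If most of the drift lands in the very first checkpoint, that would favor the 'rearranged early' reading over gradual filling.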

# Proposed Upscale Technique
```yaml
merge_method: passthrough
slices:
  # Layers 0-18 untouched (NOTE: mergekit layer_range is end-exclusive).
  - sources:
      - layer_range: [0, 19]
        model: unsloth/Mistral-Small-Instruct-2409
  # Original L19
  - sources:
      - layer_range: [19, 20]
        model: unsloth/Mistral-Small-Instruct-2409
  # Dupe A of L19
  - sources:
      - layer_range: [19, 20]
        model: unsloth/Mistral-Small-Instruct-2409
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  # Dupe B of L19
  - sources:
      - layer_range: [19, 20]
        model: unsloth/Mistral-Small-Instruct-2409
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  # Original L20
  - sources:
      - layer_range: [20, 21]
        model: unsloth/Mistral-Small-Instruct-2409
  # Dupe A of L20
  - sources:
      - layer_range: [20, 21]
        model: unsloth/Mistral-Small-Instruct-2409
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  # Dupe B of L20
  - sources:
      - layer_range: [20, 21]
        model: unsloth/Mistral-Small-Instruct-2409
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  # ... REPEAT UNTIL 41
  # Original tail: layers 41 through the end of the model.
  - sources:
      - layer_range: [41, 56]
        model: unsloth/Mistral-Small-Instruct-2409
```
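
Zeroing `o_proj` and `down_proj` makes each duplicated layer's attention and MLP blocks contribute nothing to the residual stream at first, so every dupe starts out as an identity layer and only becomes active as training fills those projections back in. Assuming mergekit is installed, a config like this would be run with something like `mergekit-yaml proposed-upscale.yaml ./out-model` (the file and output names here are placeholders).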

```
O = original
X = duplicate

Previous Technique
OOOOOOOOOOOOOOOOOOXXXXXXXXXXXXXXXXXXXOOOOOOOOOO
Proposed Technique
OOOOOOOOOOOXXOXXOXXOXXOXXOXXOXXOXXOXXOOOOOOOOOO
```