TheDrummer committed cdaac82 (parent: 7f48392): Update README.md
README.md CHANGED
```diff
@@ -116,11 +116,11 @@ WIP
 - Take note of a few things
 - Top layers = Ending layers (nearer to output)
 - Bottom layers = Starting layers (nearer to input)
-- Training a non-upscaled model affects the top layers first and slowly descends to the bottom layers over time.
-- Training an upscaled model with
--
-- There's a 'ceiling value' for
-- Even when Tunguska's duplicated
+- Training a normal, non-upscaled model affects the top layers first and slowly descends to the bottom layers over time.
+- Training an upscaled model with two slices of duplicate layers does two things:
+- Each slice of duplicated layers has its own gradient.
+- There's a 'ceiling value' for the duplicated layers in these slices.
+- Even when Tunguska's slices of duplicated layers are nearly saturated, the resulting model remains coherent and even performant.
 - Takeaways
 - These slices of layers are more connected to each other than to the model's entirety.
 - [Question] Does this mean that the **original layer** before the slice is the one holding that whole duplicated slice together?
```
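The new bullets assert that each duplicated slice in a depth-upscaled model accumulates its own gradient. A minimal PyTorch sketch of that mechanism (the four-layer stack, the sizes, and the two-slice layout are illustrative assumptions, not Tunguska's actual architecture):

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a 4-layer transformer stack.
base = [nn.Linear(8, 8) for _ in range(4)]

# Depth-upscale by inserting two slices of duplicated layers.
# copy.deepcopy gives each duplicate independent parameters, so every
# slice receives and accumulates its own gradient during training.
upscaled = nn.Sequential(
    base[0],
    base[1],
    copy.deepcopy(base[0]),  # slice 1: duplicates of layers 0-1
    copy.deepcopy(base[1]),
    base[2],
    base[3],
    copy.deepcopy(base[2]),  # slice 2: duplicates of layers 2-3
    copy.deepcopy(base[3]),
)

x = torch.randn(16, 8)
loss = upscaled(x).pow(2).mean()
loss.backward()

# The duplicates start as exact clones of their originals, but each copy
# sits at a different depth, so backprop hands it a different error signal.
for i, layer in enumerate(upscaled):
    print(f"layer {i:>2}: grad norm = {layer.weight.grad.norm().item():.4f}")
```

Running it prints a distinct gradient norm for each copy, which is consistent with the "own gradient" observation: once the duplicates are independent parameters, nothing forces a slice to update in lockstep with the layers it was copied from.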
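The takeaway that a slice's layers are "more connected to each other than to the model's entirety" can be probed by comparing hidden states across consecutive layers. A sketch, assuming a transformers-style causal LM (the model id is a hypothetical placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-upscaled-model"  # placeholder, not an actual repo id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, output_hidden_states=True)
model.eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).hidden_states  # tuple: embeddings + one per layer

# Cosine similarity of the last token's hidden state across adjacent layers:
# layers that barely transform the residual stream score near 1.0.
for i in range(1, len(hidden)):
    a, b = hidden[i - 1][0, -1], hidden[i][0, -1]
    cos = torch.nn.functional.cosine_similarity(a, b, dim=0)
    print(f"layer {i - 1:>2} -> {i:>2}: cos = {cos.item():.3f}")
```

If the duplicated layers mostly pass the residual stream through, the scores inside a slice should cluster near 1.0 while the original layer just before the slice shows a sharper drop, which would bear on the closing question about what holds the duplicated slice together.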