TheDrummer committed cdaac82 (parent: 7f48392): Update README.md
README.md CHANGED
```diff
@@ -116,11 +116,11 @@ WIP
 - Take note of a few things
 - Top layers = Ending layers (nearer to output)
 - Bottom layers = Starting layers (nearer to input)
-- Training a non-upscaled model affects the top layers first and slowly descends to the bottom layers over time.
-- Training an upscaled model with
--
-- There's a 'ceiling value' for
-- Even when Tunguska's duplicated
+- Training a normal, non-upscaled model affects the top layers first and slowly descends to the bottom layers over time.
+- Training an upscaled model with two slices of duplicate layers does two things:
+- Each slice of duplicated layers has its own gradient.
+- There's a 'ceiling value' for the duplicated layers in these slices.
+- Even when Tunguska's slices of duplicated layers are nearly saturated, the resulting model remains coherent and even performant.
 - Takeaways
 - These slices of layers are more connected to each other than to the model's entirety.
 - [Question] Does this mean that the **original layer** before the slice is the one holding that whole duplicated slice together?
```
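The new bullets assert that each duplicated slice in a depth-upscaled model accumulates its own gradient. A minimal PyTorch sketch of that mechanism (the four-layer stack, the sizes, and the two-slice layout are illustrative assumptions, not Tunguska's actual architecture):

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a 4-layer transformer stack.
base = [nn.Linear(8, 8) for _ in range(4)]

# Depth-upscale by inserting two slices of duplicated layers.
# copy.deepcopy gives each duplicate independent parameters, so every
# slice receives and accumulates its own gradient during training.
upscaled = nn.Sequential(
    base[0],
    base[1],
    copy.deepcopy(base[0]),  # slice 1: duplicates of layers 0-1
    copy.deepcopy(base[1]),
    base[2],
    base[3],
    copy.deepcopy(base[2]),  # slice 2: duplicates of layers 2-3
    copy.deepcopy(base[3]),
)

x = torch.randn(16, 8)
loss = upscaled(x).pow(2).mean()
loss.backward()

# The duplicates start as exact clones of their originals, but each copy
# sits at a different depth, so backprop hands it a different error signal.
for i, layer in enumerate(upscaled):
    print(f"layer {i:>2}: grad norm = {layer.weight.grad.norm().item():.4f}")
```

Running it prints a distinct gradient norm for each copy, which is consistent with the "own gradient" observation: once the duplicates are independent parameters, nothing forces a slice to update in lockstep with the layers it was copied from.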
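The takeaway that a slice's layers are "more connected to each other than to the model's entirety" can be probed by comparing hidden states across consecutive layers. A sketch, assuming a transformers-style causal LM (the model id is a hypothetical placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-upscaled-model"  # placeholder, not an actual repo id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, output_hidden_states=True)
model.eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).hidden_states  # tuple: embeddings + one per layer

# Cosine similarity of the last token's hidden state across adjacent layers:
# layers that barely transform the residual stream score near 1.0.
for i in range(1, len(hidden)):
    a, b = hidden[i - 1][0, -1], hidden[i][0, -1]
    cos = torch.nn.functional.cosine_similarity(a, b, dim=0)
    print(f"layer {i - 1:>2} -> {i:>2}: cos = {cos.item():.3f}")
```

If the duplicated layers mostly pass the residual stream through, the scores inside a slice should cluster near 1.0 while the original layer just before the slice shows a sharper drop, which would bear on the closing question about what holds the duplicated slice together.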