Commit e1e3a7d
Author: TheDrummer
Parent(s): 6d8441e

Update README.md

README.md CHANGED
@@ -8,7 +8,7 @@ My cute attempt at being a siyantis :3 uwu ~
 ## Conclusions (WIP)
 - Upscaling can 'provide room' for further training.
 - Training upscaled models will result in retaining more of the original model's performance & behavior.
-- A 600MB dataset
+- A 600MB dataset was nowhere near enough to stabilize the empty/duplicated layers. (The perturbed rate of change remained the same for epochs 1 & 2.)
 - (Not related to upscaling) The first two layers are sus - their weights are wildly different from the original. I wonder if we could recover smarts by merging that back in with base, or if those layers contain the most influence and must be preserved.
 
 ## What is the 39B Upscale?
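
The README's notes about layers whose weights are "wildly different from the original" and about a rate of change that stays the same across epochs both come down to comparing weights layer by layer. The README does not say how that was measured, so the following is only a minimal sketch of one possible metric: the relative L2 distance between a base layer and its trained counterpart. It assumes each layer's weights have already been flattened into a plain list of floats. `relative_change` and `per_layer_change` are hypothetical helper names, and the numbers are toy values, not taken from the 39B model.

```python
import math


def relative_change(base, tuned):
    """Relative L2 distance ||tuned - base|| / ||base|| for two flat weight lists."""
    if len(base) != len(tuned):
        raise ValueError("weight vectors must have the same length")
    diff = math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))
    norm = math.sqrt(sum(b * b for b in base))
    if norm == 0.0:
        # An all-zero ("empty") base layer has no scale to normalize by.
        return 0.0 if diff == 0.0 else float("inf")
    return diff / norm


def per_layer_change(base_layers, tuned_layers):
    """Map each layer name to its relative change (inputs: dict of name -> flat list)."""
    return {
        name: relative_change(weights, tuned_layers[name])
        for name, weights in base_layers.items()
    }


# Toy, made-up weights (assumption: not from any real checkpoint).
base = {"layer.0": [1.0, 0.0], "layer.1": [0.0, 2.0]}
tuned = {"layer.0": [0.0, 1.0], "layer.1": [0.0, 2.2]}

changes = per_layer_change(base, tuned)
for name, value in sorted(changes.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Running the same comparison between consecutive epoch checkpoints, rather than against the base model, would give a per-epoch rate of change for each layer. With real checkpoints the flat lists would be replaced by whatever tensor type the training framework provides.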