TheDrummer committed on
Commit
e1e3a7d
1 Parent(s): 6d8441e

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -8,7 +8,7 @@ My cute attempt at being a siyantis :3 uwu ~
  ## Conclusions (WIP)
  - Upscaling can 'provide room' for further training.
  - Training upscaled models will result in retaining more of the original model's performance & behavior.
- - A 600MB dataset did not seem to completely fill the empty/duplicated layers. (Rate of change remained the same for epoch 1 & 2)
+ - A 600MB dataset came nowhere near stabilizing the empty/duplicated layers. (Perturbed rate of change remained the same for epochs 1 & 2)
  - (Not related to upscaling) The first two layers are sus - their weights are wildly different from the original. I wonder if we could recover smarts by merging that back in with base, or if those layers contain the most influence and must be preserved.
 
  ## What is the 39B Upscale?