TheDrummer committed on
Commit 5767fe7
1 Parent(s): a7d5987

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -6,7 +6,7 @@
 ## Conclusions (WIP)
 - Upscaling can 'provide room' for further training.
 - Training upscaled models will result in retaining more of the original model's performance & behavior.
- - (Not related to upscaling) The first two layers are sus - their weights are wildly different from the original. I wonder if we could recover smarts by merging that back in with base, or if those layers contain the most influence.
+ - (Not related to upscaling) The first two layers are sus - their weights are wildly different from the original. I wonder if we could recover smarts by merging that back in with base, or if those layers contain the most influence and must be preserved.
 
 ## What is the 39B Upscale?
 
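The idea in the changed line (measure how far the first two layers drifted from base, then try merging them back in with base) can be sketched as a per-layer relative difference plus a linear interpolation of the selected layers. This is a minimal, hypothetical sketch and not the author's workflow: the state dicts are plain Python dicts of flat float lists instead of real tensors, the Llama-style `model.layers.N.` parameter naming is an assumption, the tuned and base checkpoints are assumed to share parameter names for the layers being merged, and the helpers `relative_diff`, `layer_index`, and `merge_layers_with_base` are illustrative names, not part of any library.

```python
import math

# Hypothetical sketch: a "state dict" here is {param_name: list[float]}.
# Real checkpoints hold tensors; this only illustrates the arithmetic.

def relative_diff(a, b):
    """Return ||a - b|| / ||b||: how far tuned weights `a` drifted from base `b`."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(y * y for y in b))
    return num / den if den else float("inf")

def layer_index(name):
    """Parse N from names like 'model.layers.N.mlp.up_proj.weight'.

    Assumes Llama-style naming; returns None for non-layer params.
    """
    parts = name.split(".")
    if len(parts) > 2 and parts[0] == "model" and parts[1] == "layers" and parts[2].isdigit():
        return int(parts[2])
    return None

def merge_layers_with_base(tuned, base, layers, alpha=0.5):
    """Interpolate the chosen layers as alpha * tuned + (1 - alpha) * base.

    Params outside `layers`, or missing from `base`, are copied from `tuned`.
    """
    merged = {}
    for name, w in tuned.items():
        if layer_index(name) in layers and name in base:
            merged[name] = [alpha * t + (1 - alpha) * b for t, b in zip(w, base[name])]
        else:
            merged[name] = list(w)
    return merged

if __name__ == "__main__":
    tuned = {"model.layers.0.w": [2.0, 2.0], "model.layers.5.w": [1.0, 1.0]}
    base = {"model.layers.0.w": [1.0, 1.0], "model.layers.5.w": [0.0, 0.0]}
    for name in tuned:
        print(name, relative_diff(tuned[name], base[name]))
    print(merge_layers_with_base(tuned, base, {0, 1}, alpha=0.5))
```

Only layers 0 and 1 are blended here, matching the "first two layers" in the note; `alpha=1.0` keeps the tuned weights untouched, which corresponds to the alternative that those layers "must be preserved".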