TheDrummer
committed on
Commit • 5767fe7
1 Parent(s): a7d5987
Update README.md
README.md CHANGED
@@ -6,7 +6,7 @@
 ## Conclusions (WIP)
 - Upscaling can 'provide room' for further training.
 - Training upscaled models will result in retaining more of the original model's performance & behavior.
-- (Not related to upscaling) The first two layers are sus - their weights are wildly different from the original. I wonder if we could recover smarts by merging that back in with base, or if those layers contain the most influence.
+- (Not related to upscaling) The first two layers are sus - their weights are wildly different from the original. I wonder if we could recover smarts by merging that back in with base, or if those layers contain the most influence and must be preserved.

 ## What is the 39B Upscale?

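The last conclusion (drifted early layers, and the idea of merging them back with base) could be explored with something like the sketch below. This is not the author's method. It assumes both checkpoints share parameter names and shapes, that layer numbers appear in keys as `layers.<n>.` (a hypothetical naming pattern), and it uses flat Python lists in place of real tensors. `relative_drift`, `merge_layers`, and `alpha` are illustrative names, and plain linear interpolation is only one possible merge.

```python
import math
import re

# Hypothetical key pattern; real checkpoints may name their layers differently.
LAYER_RE = re.compile(r"layers\.(\d+)\.")


def layer_index(name):
    """Return the layer number embedded in a parameter name, or None."""
    match = LAYER_RE.search(name)
    return int(match.group(1)) if match else None


def relative_drift(base, tuned):
    """Per-layer relative L2 distance between tuned and base weights."""
    diff_sq, base_sq = {}, {}
    for name, w_base in base.items():
        idx = layer_index(name)
        if idx is None:
            continue  # skip embeddings, final norm, and other non-layer weights
        w_tuned = tuned[name]
        diff_sq[idx] = diff_sq.get(idx, 0.0) + sum(
            (t - b) ** 2 for b, t in zip(w_base, w_tuned)
        )
        base_sq[idx] = base_sq.get(idx, 0.0) + sum(b * b for b in w_base)
    return {
        i: math.sqrt(diff_sq[i] / base_sq[i])
        for i in sorted(diff_sq)
        if base_sq[i] > 0
    }


def merge_layers(base, tuned, layers, alpha=0.5):
    """Linearly interpolate the chosen layers back toward base.

    alpha=1.0 restores the base weights; alpha=0.0 keeps the tuned weights.
    """
    merged = {}
    for name, w_tuned in tuned.items():
        if layer_index(name) in layers and name in base:
            merged[name] = [
                alpha * b + (1 - alpha) * t for b, t in zip(base[name], w_tuned)
            ]
        else:
            merged[name] = list(w_tuned)
    return merged


# Toy weights standing in for real checkpoints (assumption for illustration).
base = {
    "model.layers.0.w": [1.0, 0.0],
    "model.layers.1.w": [0.0, 2.0],
    "model.layers.2.w": [3.0, 4.0],
}
tuned = {
    "model.layers.0.w": [2.0, 0.0],
    "model.layers.1.w": [0.0, 4.0],
    "model.layers.2.w": [3.0, 4.0],
}

drift = relative_drift(base, tuned)
merged = merge_layers(base, tuned, layers={0, 1}, alpha=0.5)
```

Sweeping `alpha` from 0 to 1 for layers 0 and 1 and evaluating each merge would be one way to test whether those layers can be pulled back toward base or must be preserved as trained.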