Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ I am currently in the process of cleaning up the code before publishing it, much
 
 ## Final merge composition
 
-After processing 12 models my algorithm ended up with the following (approximated) final composition
+After processing 12 models my algorithm ended up with the following (approximated) final composition:
 
 | Model                    | Contribution |
 |--------------------------|--------------|
@@ -28,6 +28,8 @@ After processing 12 models my algorithm ended up with the following (approximate
 | Mistral-7B-v0.1          | 2%           |
 | Openchat_3.5             | 2%           |
 
+There is no real logic in how these models were divided throughout the merge - small bits and pieces were taken from each and then mixed in with other models on a layer-by-layer basis, using a pattern similar to my MythoMax recipe, in which underlying tensors are mixed in a criss-cross manner.
+
 This new process only decides on the model's layers, not the singular lm_head and embed_tokens layers, which influence much of the model's output. I ran a separate script for that, picking the singular tensors that create the longest responses, which settled on Toppy-M-7B.
 
 ## Prompt Format
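For illustration, the layer-wise "criss-cross" mixing mentioned in the added paragraph could look roughly like the sketch below. This is a minimal sketch only: the donor list, the per-tensor rotation pattern, and the output path are assumptions, not the actual recipe or its contribution percentages.

```python
# Minimal sketch of layer-wise "criss-cross" tensor mixing (illustrative only;
# the donor list, rotation pattern, and paths are assumptions, not the real recipe).
import torch
from transformers import AutoModelForCausalLM

donor_names = [
    "mistralai/Mistral-7B-v0.1",
    "openchat/openchat_3.5",
    # ...the remaining source models would be listed here
]

# Load all donors plus a base copy that receives the mixed tensors.
donors = [AutoModelForCausalLM.from_pretrained(n, torch_dtype=torch.bfloat16) for n in donor_names]
merged = AutoModelForCausalLM.from_pretrained(donor_names[0], torch_dtype=torch.bfloat16)

donor_states = [d.state_dict() for d in donors]
state = merged.state_dict()

for layer in range(merged.config.num_hidden_layers):
    prefix = f"model.layers.{layer}."
    layer_keys = sorted(k for k in state if k.startswith(prefix))
    for i, key in enumerate(layer_keys):
        # "Criss-cross": neighbouring tensors within a layer, and the same tensor
        # across neighbouring layers, come from different donor models.
        donor = donor_states[(layer + i) % len(donor_states)]
        state[key] = donor[key].clone()

merged.load_state_dict(state)
merged.save_pretrained("criss-cross-merge")
```

Note that this sketch only touches the `model.layers.*` tensors; embeddings and the output head are left untouched, matching the point made above about the merge process not deciding on those.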
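The separate lm_head / embed_tokens selection step could be approximated along these lines: swap each candidate's embedding and output-head tensors into the merged model, generate on a few prompts, and keep the candidate whose responses are longest. The candidate list, prompts, and generation settings below are illustrative assumptions, not the script actually used.

```python
# Rough sketch of the separate head/embedding pick (candidate list, prompts, and
# generation settings are placeholders; tensor shapes are assumed compatible).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

candidates = ["Undi95/Toppy-M-7B", "mistralai/Mistral-7B-v0.1"]  # example donors
prompts = [
    "Write a short story about a lighthouse keeper.",
    "Describe your ideal holiday.",
]

merged = AutoModelForCausalLM.from_pretrained("criss-cross-merge", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

best_name, best_total = None, -1
for name in candidates:
    donor = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    # Copy only the two "singular" tensors: token embeddings and output head.
    merged.get_input_embeddings().weight.data.copy_(donor.get_input_embeddings().weight.data)
    merged.get_output_embeddings().weight.data.copy_(donor.get_output_embeddings().weight.data)

    total_new_tokens = 0
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        output = merged.generate(**inputs, max_new_tokens=512, do_sample=False)
        total_new_tokens += output.shape[1] - inputs["input_ids"].shape[1]

    if total_new_tokens > best_total:
        best_name, best_total = name, total_new_tokens

print(f"Longest responses with lm_head/embed_tokens from: {best_name}")
```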