SanjiWatsuki committed c443391 (parent: 9be1efd)
Update README.md

README.md CHANGED
@@ -10,4 +10,29 @@

DARE TIES merges are [very strong at transferring strengths](https://medium.com/
For 7B models, we can't drop as many of the parameters and still retain the model's strengths. In the original paper, the WizardMath model showed transferable skills when 90% of the parameters were dropped, but it showed more strength when only 70% were dropped. Experimentally, [even lower drop rates like 40%](https://github.com/cg123/mergekit/issues/26) appear to have performed best, even for larger 34B models. In some instances, [even densities as high as 80% produce an unstable merge](https://huggingface.co/jan-hq/supermario-v1), making DARE TIES unsuitable for some model merges.
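To make the drop-rate/density numbers concrete, here is a minimal sketch of the DARE idea on a single weight tensor: randomly drop a fraction of the finetune's delta from the base model and rescale the survivors so the expected delta is unchanged. This is only an illustration, not mergekit's implementation; the `dare_delta` function, tensor shapes, and the 0.3 density are assumptions for the example.

```python
import torch

def dare_delta(finetuned: torch.Tensor, base: torch.Tensor, density: float) -> torch.Tensor:
    """Illustrative DARE step: keep a random `density` fraction of the delta
    (finetuned - base) and rescale the kept entries by 1/density, preserving
    the delta's expected value."""
    delta = finetuned - base
    keep_mask = torch.bernoulli(torch.full_like(delta, density))  # 1 = keep, 0 = drop
    return delta * keep_mask / density

# Toy example: density 0.3 means roughly 70% of the delta parameters are dropped.
base_weight = torch.randn(1024, 1024)
finetuned_weight = base_weight + 0.01 * torch.randn(1024, 1024)
sparse_delta = dare_delta(finetuned_weight, base_weight, density=0.3)
merged_weight = base_weight + sparse_delta  # DARE delta applied back onto the base
```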
This is an experiment that combines two merge techniques to try to transfer skills between finetuned models. If we first DARE TIES merge a low-density delta onto the base Mistral model, and then task-arithmetic merge those low-density delta weights onto a finetune, could we still achieve skill transfer? The two mergekit configs below implement those two steps:
```yaml
# Step 1: low-density DARE TIES merge of WizardMath onto the Mistral base
models:
  - model: mistralai/Mistral-7B-v0.1
    # no parameters necessary for base model
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 1
      density: 0.3
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
```

```yaml
# Step 2: task arithmetic merge of the low-density delta weights onto neural-chat
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: C:\Users\sanji\Documents\Apps\text-generation-webui-main\models\mistral-wizardmath-dare-0.7
  - model: Intel/neural-chat-7b-v3-3
parameters:
  weight: 1.0
dtype: bfloat16
```
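Putting the two configs together, the intended computation per weight tensor is roughly "neural-chat plus a sparse WizardMath delta". The sketch below spells that out with task arithmetic on toy tensors; it ignores TIES sign election, normalization, and tokenizer handling, and all tensor names and values are illustrative rather than taken from the actual merge.

```python
import torch

def task_arithmetic(base: torch.Tensor, deltas, weights) -> torch.Tensor:
    """Task arithmetic: add weighted task vectors (model - base) back onto the base."""
    merged = base.clone()
    for delta, w in zip(deltas, weights):
        merged = merged + w * delta
    return merged

# Stand-ins for one weight tensor from each model (illustrative only).
base = torch.randn(1024, 1024)                       # mistralai/Mistral-7B-v0.1
wizardmath = base + 0.01 * torch.randn(1024, 1024)   # WizardLM/WizardMath-7B-V1.1
neural_chat = base + 0.01 * torch.randn(1024, 1024)  # Intel/neural-chat-7b-v3-3

# Step 1: low-density DARE of the WizardMath delta (density 0.3, as in the first config).
density = 0.3
keep_mask = torch.bernoulli(torch.full_like(base, density))
wizardmath_dare_delta = (wizardmath - base) * keep_mask / density

# Step 2: task arithmetic onto the base with both task vectors at weight 1.0,
# i.e. approximately neural-chat plus the sparse WizardMath delta.
final = task_arithmetic(
    base,
    deltas=[wizardmath_dare_delta, neural_chat - base],
    weights=[1.0, 1.0],
)
```

In practice each config is run as its own mergekit pass, with the output directory of the first merge supplied as the local model path in the second config.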