SanjiWatsuki committed
Commit dda7081
1 Parent(s): 1d86275

Update README.md

Files changed (1)
1. README.md +2 -0
README.md CHANGED
@@ -6,6 +6,8 @@ tags:
 - merge
 ---
 
+**Update: Yeah, this strategy doesn't work. This ended up really devastating the model's performance.**
+
 This model is an experiment involving mixing DARE TIE merger with a task arithmetic merger to attempt to merge models with less loss.
 
 DARE TIE mergers are [very strong at transferring strengths](https://medium.com/@minh.hoque/paper-explained-language-models-are-super-mario-2ebce6c2cf35) while merging a minimal part of the model. For larger models, 90-99% of delta parameters from SFT models can be dropped while retaining most of the benefits if they are rescaled and consensus merged back into the model.
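For context on the technique the README describes: DARE sparsifies each fine-tune's delta (its difference from the base model) by randomly dropping most of the delta parameters, then rescales the survivors so the expected update is preserved, before the deltas are combined task-arithmetic style. A minimal PyTorch sketch of that drop-and-rescale step follows; the tensor shapes, drop rate, and merge weights are illustrative only, and the TIES sign-consensus step is omitted. This is not the actual merge configuration used for this model.

```python
import torch

def dare_delta(finetuned: torch.Tensor, base: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Drop-And-REscale: randomly drop most of the fine-tuning delta, then
    rescale the surviving entries so the expected update magnitude is kept."""
    delta = finetuned - base
    keep_prob = 1.0 - drop_rate
    mask = torch.bernoulli(torch.full_like(delta, keep_prob))  # 1 = keep, 0 = drop
    return delta * mask / keep_prob

# Toy usage: task-arithmetic style merge of two fine-tunes onto a shared base,
# using DARE-sparsified deltas instead of the raw deltas (illustrative weights).
base = torch.randn(4, 4)
model_a = base + 0.01 * torch.randn(4, 4)
model_b = base + 0.01 * torch.randn(4, 4)
merged = base + 0.5 * dare_delta(model_a, base) + 0.5 * dare_delta(model_b, base)
```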