SanjiWatsuki committed
Commit dda7081
1 Parent(s): 1d86275

Update README.md

Files changed (1)
1. README.md +2 -0
README.md CHANGED
@@ -6,6 +6,8 @@ tags:
 - merge
 ---
 
+**Update: Yeah, this strategy doesn't work. This ended up really devastating the model's performance.**
+
 This model is an experiment involving mixing DARE TIE merger with a task arithmetic merger to attempt to merge models with less loss.
 
 DARE TIE mergers are [very strong at transferring strengths](https://medium.com/@minh.hoque/paper-explained-language-models-are-super-mario-2ebce6c2cf35) while merging a minimal part of the model. For larger models, 90-99% of delta parameters from SFT models can be dropped while retaining most of the benefits if they are rescaled and consensus merged back into the model.
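For context on the technique the README describes: DARE sparsifies each fine-tune's delta (its difference from the base model) by randomly dropping most of the delta parameters, then rescales the survivors so the expected update is preserved, before the deltas are combined task-arithmetic style. A minimal PyTorch sketch of that drop-and-rescale step follows; the tensor shapes, drop rate, and merge weights are illustrative only, and the TIES sign-consensus step is omitted. This is not the actual merge configuration used for this model.

```python
import torch

def dare_delta(finetuned: torch.Tensor, base: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Drop-And-REscale: randomly drop most of the fine-tuning delta, then
    rescale the surviving entries so the expected update magnitude is kept."""
    delta = finetuned - base
    keep_prob = 1.0 - drop_rate
    mask = torch.bernoulli(torch.full_like(delta, keep_prob))  # 1 = keep, 0 = drop
    return delta * mask / keep_prob

# Toy usage: task-arithmetic style merge of two fine-tunes onto a shared base,
# using DARE-sparsified deltas instead of the raw deltas (illustrative weights).
base = torch.randn(4, 4)
model_a = base + 0.01 * torch.randn(4, 4)
model_b = base + 0.01 * torch.randn(4, 4)
merged = base + 0.5 * dare_delta(model_a, base) + 0.5 * dare_delta(model_b, base)
```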