What exactly is the 0.85?

#1
by bartowski - opened

From reading the DARE paper, I'm not sure I understand what these models actually are.

Are these a merge of the base model and something else? Or is it SFT? And with what dataset?

I am trying to use the DARE method to mitigate or eliminate the mutual interference between models when merging multiple homologous PEFT models. The 0.85 is the drop rate in DARE. https://github.com/uukuguy/multi_loras#mixture-of-multi-loras
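As I understand it, with a drop rate of 0.85 DARE randomly zeroes 85% of each delta (fine-tuned weights minus base weights) and rescales the surviving entries by 1 / (1 - 0.85). A minimal PyTorch sketch of that step (the function name and signature are just for illustration, not the repo's API):

```python
import torch

def dare_drop_and_rescale(base_param: torch.Tensor,
                          finetuned_param: torch.Tensor,
                          drop_rate: float = 0.85) -> torch.Tensor:
    """Randomly drop a fraction `drop_rate` of the delta parameters
    and rescale the survivors by 1 / (1 - drop_rate), as in DARE."""
    delta = finetuned_param - base_param                      # delta parameters
    keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    return delta * keep_mask / (1.0 - drop_rate)
```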

Are these meant to be used as-is or do they still need to be merged? Are you basically re-tuning these models with DARE?

My confusion is that I thought DARE was for merging or fine-tuning, but I don't know what this is a merge/tune of.

As the paper said, we can "obtain new capabilities by assimilating the parameters of homologous models without the need for retraining or GPUs".
By dropping the redundant delta parameters, it's possible to mitigate the mutual interference between the models being merged. What I want to do is verify this point. If the verification is successful, I may be able to merge multiple homologous models while keeping the prominent advantages of each one. And none of this requires retraining the model, which is the most appealing aspect to me.
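For the merging step itself, my understanding is that each fine-tune's sparsified, rescaled delta is simply added back onto the shared base. A rough sketch under that assumption (names like `merge_with_dare` are hypothetical, not from multi_loras):

```python
import torch

def merge_with_dare(base_param: torch.Tensor,
                    finetuned_params: list,
                    drop_rate: float = 0.85) -> torch.Tensor:
    """Merge several homologous fine-tunes onto one base parameter tensor
    by summing their DARE-sparsified deltas (no retraining involved)."""
    merged = base_param.clone()
    for ft in finetuned_params:
        delta = ft - base_param
        keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
        merged += delta * keep_mask / (1.0 - drop_rate)
    return merged
```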
