Rishabh Bhardwaj

RishabhBhardwaj

Posts 1

🎉 We are thrilled to share our work on model merging. We propose a new approach, DELLA-Merging (Della), which combines expert models from different domains into a single, versatile model. Della uses magnitude-based sampling to drop redundant delta parameters (the differences between each fine-tuned model's weights and those of the shared backbone), which reduces interference when merging homologous models (those fine-tuned from the same backbone).
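To make the idea of magnitude-based sampling concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the implementation from the paper or the repository: the function names `magnitude_based_drop` and `merge_experts`, the parameters `mean_drop` and `spread`, and the linear rank-to-probability mapping are all hypothetical choices made for this example. The sketch also fuses the pruned deltas by plain averaging, which is a simplification.

```python
import numpy as np


def magnitude_based_drop(delta, mean_drop=0.5, spread=0.2, rng=None):
    """Randomly drop delta parameters, dropping low-magnitude entries more often.

    Hypothetical sketch: drop probabilities fall linearly with magnitude rank,
    from mean_drop + spread/2 (smallest) to mean_drop - spread/2 (largest).
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = np.asarray(delta, dtype=float).ravel()
    n = flat.size

    # Rank 0 is the smallest magnitude, rank n-1 the largest.
    ranks = np.argsort(np.argsort(np.abs(flat)))
    frac = ranks / (n - 1) if n > 1 else np.zeros(n)

    # Higher rank (larger magnitude) means lower drop probability.
    drop_prob = mean_drop + spread / 2 - spread * frac
    drop_prob = np.clip(drop_prob, 0.0, 0.99)  # keep probability stays > 0

    keep = rng.random(n) >= drop_prob
    # Rescale survivors by 1 / (1 - p) so each entry is preserved in expectation.
    pruned = np.where(keep, flat / (1.0 - drop_prob), 0.0)
    return pruned.reshape(np.shape(delta))


def merge_experts(base, experts, **drop_kwargs):
    """Add the average of the pruned deltas back onto the shared backbone.

    Assumption: simple averaging stands in for the fusion step.
    """
    deltas = [magnitude_based_drop(w - base, **drop_kwargs) for w in experts]
    return base + np.mean(deltas, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(4, 4))
    experts = [base + 0.1 * rng.normal(size=(4, 4)) for _ in range(3)]
    merged = merge_experts(base, experts, mean_drop=0.5, spread=0.4, rng=rng)
    print(merged.shape)
```

Rescaling each surviving entry by the inverse of its keep probability keeps the expected value of every delta unchanged, so the random dropping removes redundancy without systematically shrinking the update.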

Della outperforms existing homologous model merging techniques such as DARE and TIES. When merging three expert models (LM, Math, Code) and evaluating on their corresponding benchmarks (AlpacaEval, GSM8K, MBPP), Della improves on TIES by 3.6 points and on DARE by 1.2 points on average.

Paper: DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling (arXiv:2406.11617)
GitHub: https://github.com/declare-lab/della

@soujanyaporia @Tej3
