athirdpath/CleverGirl-20b-Blended-v1.1-DARE

metadata

license: cc-by-nc-4.0

Description

After I put down the joint and RTFM'd, I have a much better idea of exactly what's going on. I considered doing something similar with WANDA or SparseGPT a while back, but stopped when I ran into issues, so I'm fascinated by how this new method pulls it off.

Hypothesis

Lowering the density should land me closer to the sweet spot shown in the paper. I'm also using my fixed base model, which will hopefully help too.
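For anyone wondering what the density knob actually does, here is a rough toy sketch of the drop-and-rescale idea behind DARE as I understand it: each delta (fine-tuned weight minus base weight) is kept with probability equal to the density, and the survivors are scaled up by 1/density so the expected total stays the same. The function name `dare_sparsify` and the toy delta values are made up for illustration. This is not the merge tool's actual code and it leaves out the TIES sign step entirely.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each delta with probability `density`; rescale survivors by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]


rng = random.Random(0)

# Hypothetical toy deltas (all nonzero), standing in for fine-tuned minus base weights.
delta = [0.01 * ((i % 7) + 1) for i in range(10_000)]

sparse = dare_sparsify(delta, density=0.35, rng=rng)

# Fraction of entries that survived the random drop (should be near the density).
kept_fraction = sum(1 for s in sparse if s != 0.0) / len(sparse)

# Rescaling keeps the total roughly unchanged in expectation.
original_sum = sum(delta)
sparse_sum = sum(sparse)

print(f"kept fraction: {kept_fraction:.3f}")
print(f"sum before: {original_sum:.2f}, sum after: {sparse_sum:.2f}")
```

Lower density means more deltas get zeroed and the ones that remain get boosted harder, which is the trade-off the hypothesis above is poking at.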

Recipe

merge_method: dare_ties

base_model: athirdpath/BigLlama-20b

model: athirdpath/CleverGirl-20b
weight: 0.60 / density: 0.35

model: athirdpath/CleverGirl-20b-Inverted
weight: 0.40 / density: 0.30

int8_mask: true

dtype: bfloat16