---
license: cc-by-nc-4.0
---
|
### Description
|
|
|
After I put down the joint and [RTFM](https://arxiv.org/pdf/2311.03099.pdf), I have a better idea of exactly what's going on. I considered doing something similar with WANDA or SparseGPT a while back, but stopped when I ran into issues. So I'm fascinated by how this new method pulls it off.
|
|
|
### Hypothesis |
|
|
|
By lowering the density, I should land closer to the sweet spot shown in the paper. Also, I'm using my fixed base model, so hopefully that helps too. Weights are adjusted to make the later layers more aligned with ORCA 2.
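For context (this is my reading of the DARE paper linked above, so take it with a grain of salt): the density knob is the fraction of each model's delta weights that survive the random drop, and the survivors get rescaled to compensate. Roughly:

$$
\Delta = \theta_{\text{ft}} - \theta_{\text{base}}, \qquad m_i \sim \mathrm{Bernoulli}(d), \qquad \hat{\Delta} = \frac{m \odot \Delta}{d}
$$

So density 0.35 keeps about 35% of each model's deltas and scales the survivors up by roughly 1 / 0.35 ≈ 2.86× before the TIES-style merge step.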
|
|
|
### Recipe |
|
merge_method: dare_ties

- base_model: athirdpath/BigLlama-20b

- model: athirdpath/CleverGirl-20b

  weight: 0.60 / density: 0.35

- model: athirdpath/CleverGirl-20b-Inverted

  weight: 0.40 / density: 0.30

int8_mask: true

dtype: bfloat16
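
For anyone who wants to reproduce this: expressed as an actual mergekit YAML, the recipe above would look roughly like the sketch below. The `models:` / `parameters:` nesting follows mergekit's usual dare_ties layout and is a reconstruction, not the exact file that was run.

```yaml
# Sketch of the recipe above as a mergekit config (reconstruction, not the exact file used)
merge_method: dare_ties
base_model: athirdpath/BigLlama-20b

models:
  - model: athirdpath/CleverGirl-20b
    parameters:
      weight: 0.60
      density: 0.35
  - model: athirdpath/CleverGirl-20b-Inverted
    parameters:
      weight: 0.40
      density: 0.30

parameters:
  int8_mask: true

dtype: bfloat16
```

Running it with mergekit's `mergekit-yaml config.yml ./merged` should produce the blend.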