---
license: cc-by-nc-4.0
---
### Description
After I put down the joint and [RTFM](https://arxiv.org/pdf/2311.03099.pdf), I have a better idea of exactly what's going on. I considered doing something similar with WANDA or SparseGPT a while back, but stopped when I ran into issues. Thus, I'm fascinated by this new method's execution.
### Hypothesis
By lowering the density, I land closer to the sweet spot shown in the paper. I'm also using my fixed base model, so hopefully that helps too. The weights are adjusted to make the later layers more aligned with ORCA 2.
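For context, the DARE step from the linked paper drops each delta parameter (fine-tune minus base) with probability $p$ and rescales the survivors, so the density values in the recipe below correspond to $1 - p$:

$$
\hat{\delta} = \frac{m \odot \delta}{1 - p}, \qquad m \sim \mathrm{Bernoulli}(1 - p)
$$

At density 0.30–0.35, roughly two-thirds of each model's delta gets dropped before the TIES-style sign election and merge.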
### Recipe
merge_method: dare_ties
- base_model: athirdpath/BigLlama-20b
- model: athirdpath/CleverGirl-20b
  - weight: 0.60 / density: 0.35
- model: athirdpath/CleverGirl-20b-Inverted
  - weight: 0.40 / density: 0.30
- int8_mask: true
- dtype: bfloat16
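
For anyone reproducing this, here's a sketch of the recipe as a mergekit YAML config. It's my reconstruction of the list above; the `models:`/`parameters:` nesting is assumed from standard mergekit `dare_ties` examples, not copied from the original config:

```yaml
merge_method: dare_ties
base_model: athirdpath/BigLlama-20b  # deltas are taken against this model
models:
  - model: athirdpath/CleverGirl-20b
    parameters:
      weight: 0.60   # contribution to the merged deltas
      density: 0.35  # fraction of delta parameters DARE keeps
  - model: athirdpath/CleverGirl-20b-Inverted
    parameters:
      weight: 0.40
      density: 0.30
parameters:
  int8_mask: true  # store intermediate masks in int8 to save memory
dtype: bfloat16
```

Something like `mergekit-yaml config.yml ./merged` should run it.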