athirdpath/CleverGirl-20b-Blended-v1.1-DARE


athirdpath committed on Nov 30, 2023

Commit 41141ef • 1 Parent(s): fcc197c

Create README.md

Files changed (1)

README.md +27 -0

README.md ADDED

	@@ -0,0 +1,27 @@

+---
+license: cc-by-nc-4.0
+---
+### Description
+After I [RTFM](https://arxiv.org/pdf/2311.03099.pdf), I have a better idea of exactly what's going on. I considered doing something similar with WANDA or SparseGPT a while back, but stopped when I ran into issues. Thus, I'm fascinated by how this new method is executed.
+### Hypothesis
+By lowering the density, I should land closer to the sweet spot shown in the paper. I'm also using my fixed base model, so hopefully that helps too.
+### Recipe
+merge_method: dare_ties
+  - base_model: athirdpath/BigLlama-20b
+  - model: athirdpath/CleverGirl-20b
+      weight: 0.60 / density: 0.35
+  - model: athirdpath/CleverGirl-20b-Inverted
+      weight: 0.40 / density: 0.30
+int8_mask: true
+dtype: bfloat16
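
For readers wondering what the `density` values in the recipe control, here is a minimal sketch of the drop-and-rescale step as the linked paper describes it: each delta parameter (fine-tuned weight minus base weight) is kept with probability `density`, and the survivors are scaled by `1 / density` so the expected value is unchanged. This is not the merge tool's actual code. The function name `dare_drop_and_rescale` and the toy values are hypothetical, and the sign-election step that `dare_ties` adds on top is left out.

```python
import random


def dare_drop_and_rescale(delta, density, rng):
    """Keep each delta entry with probability `density`; rescale survivors by 1/density.

    `delta` is a flat list of (fine-tuned - base) values. Illustrative only.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Toy delta vector (hypothetical values), pruned at the 0.35 density from the recipe.
rng = random.Random(0)
delta = [0.01] * 100_000
sparse = dare_drop_and_rescale(delta, density=0.35, rng=rng)

# Roughly 35% of entries survive, and the mean stays close to the original 0.01.
kept_fraction = sum(1 for x in sparse if x != 0.0) / len(sparse)
mean_after = sum(sparse) / len(sparse)
```

Lowering `density` drops more of each delta while the rescaling keeps the average contribution roughly constant, which is the knob the hypothesis above is turning.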