Commit 41141ef by athirdpath (1 parent: fcc197c)

Create README.md

Files changed (1): README.md (added, +27 -0)
 
---
license: cc-by-nc-4.0
---
### Description

After I [RTFM](https://arxiv.org/pdf/2311.03099.pdf), I have a better idea of exactly what's going on. I considered doing something similar with WANDA or SparseGPT a while back, but stopped when I ran into issues, so I'm fascinated by how this new method (DARE) pulls it off.

### Hypothesis

By lowering the density (the fraction of each model's delta parameters that DARE keeps, i.e. 1 - p, where p is the paper's drop rate), I expect to land closer to the sweet spot shown in the paper. I'm also using my fixed base model, so hopefully that helps too.
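To make the density knob concrete, here is a minimal pure-Python sketch of DARE's drop-and-rescale step. It assumes, as in the paper, that each delta parameter (fine-tuned minus base) is dropped with probability p = 1 - density and that survivors are rescaled by 1/density so each element keeps its expected value. The `dare` helper and the random delta vector are hypothetical stand-ins for one flattened weight tensor, not mergekit's code; the densities 0.35 and 0.30 come from the recipe.

```python
import random


def dare(delta, density, rng):
    """DARE (Drop And REscale) on a flat list of delta parameters.

    Each value is kept with probability `density` (drop rate p = 1 - density);
    kept values are divided by `density` so every element keeps its expected value.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Hypothetical delta (fine-tuned minus base) for one flattened tensor.
seed_rng = random.Random(0)
delta = [seed_rng.gauss(0.0, 0.01) for _ in range(100_000)]

for density in (1.0, 0.35, 0.30):
    sparse = dare(delta, density, random.Random(1))
    kept = sum(1 for x in sparse if x != 0.0) / len(sparse)
    print(f"density={density:.2f}  kept={kept:.3f}  survivors scaled by {1.0 / density:.2f}x")
```

At density 0.35, roughly 65% of the delta is zeroed and the surviving 35% is scaled up by about 2.86x.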
### Recipe

merge_method: dare_ties

- base_model: athirdpath/BigLlama-20b
- model: athirdpath/CleverGirl-20b
  - weight: 0.60 / density: 0.35
- model: athirdpath/CleverGirl-20b-Inverted
  - weight: 0.40 / density: 0.30

int8_mask: true

dtype: bfloat16
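The recipe's keys (`merge_method`, `weight`, `density`, `int8_mask`, `dtype`) match mergekit's merge-config format. The following is a simplified pure-Python sketch of roughly what `dare_ties` does with these numbers on flat lists of parameters. DARE sparsifies each model's delta from the base. Each delta is scaled by its weight. A TIES-style sign election picks a sign per parameter. Only the deltas that agree with that sign are summed, and the sum is normalized by the weights that contributed. This is an illustration under those assumptions, not mergekit's implementation. It leaves out `int8_mask` (mask storage) and `dtype` (tensor precision). The `dare`/`dare_ties` helpers and the toy parameter values are hypothetical.

```python
import random

# (model, weight, density) from the recipe; names are labels only, nothing is downloaded.
RECIPE = [
    ("athirdpath/CleverGirl-20b", 0.60, 0.35),
    ("athirdpath/CleverGirl-20b-Inverted", 0.40, 0.30),
]


def dare(delta, density, rng):
    """Keep each delta with probability `density`; rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties(base, finetuned, recipe, seed=0):
    """Simplified DARE-TIES merge of flat parameter lists (illustrative only)."""
    rng = random.Random(seed)
    weighted, weights = [], []
    for params, (_name, weight, density) in zip(finetuned, recipe):
        delta = [p - b for p, b in zip(params, base)]  # task vector vs. the base model
        weighted.append([weight * d for d in dare(delta, density, rng)])
        weights.append(weight)

    merged = []
    for j, b in enumerate(base):
        column = [w_delta[j] for w_delta in weighted]
        total = sum(column)
        sign = (total > 0) - (total < 0)  # elected sign per parameter: +1, -1 or 0
        # Keep only nonzero deltas that agree with the elected sign.
        kept = [(v, weights[i]) for i, v in enumerate(column) if v * sign > 0]
        divisor = sum(w for _v, w in kept)  # total weight of contributing models
        mixed = sum(v for v, _w in kept) / divisor if divisor else 0.0
        merged.append(b + mixed)
    return merged


if __name__ == "__main__":
    # Toy stand-ins for one tiny tensor of each model (hypothetical values).
    base = [0.10, -0.20, 0.30, 0.00]
    clever = [0.15, -0.25, 0.28, 0.05]
    inverted = [0.05, -0.10, 0.35, -0.05]
    print(dare_ties(base, [clever, inverted], RECIPE))
```

With any density below 1, the merged values depend on `seed`, because DARE's drop mask is random.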