athirdpath committed 12e63cd (1 parent: 59441ce)

Create README.md

---
license: cc-by-nc-4.0
---
### Description

After I put down the joint and [RTFM](https://arxiv.org/pdf/2311.03099.pdf), I have a much better idea of exactly what's going on. I considered doing something similar with WANDA or SparseGPT a while back, but stopped when I ran into issues, so I'm fascinated by this new method's execution.

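For context, here's a minimal PyTorch sketch of the drop-and-rescale step that DARE describes (toy tensors and illustrative names, not code from this repo): each entry of the task vector is kept with probability equal to the density, and the survivors are scaled by 1/density so the expected delta is preserved.

```python
import torch

def dare_prune(delta: torch.Tensor, density: float) -> torch.Tensor:
    # Keep each entry of the task vector with probability `density`,
    # then rescale the survivors by 1/density (Drop-And-REscale).
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

# Toy example: the delta is the difference between a fine-tune and its base.
base = torch.randn(4, 4)
finetuned = base + 0.01 * torch.randn(4, 4)
pruned_delta = dare_prune(finetuned - base, density=0.35)
merged = base + pruned_delta
```
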
### Hypothesis

By lowering the density, I should land closer to the sweet spot shown in the paper. I'm also using my fixed base model, so hopefully that helps too. The weights are adjusted to make the later layers more aligned with ORCA 2.

### Recipe

merge_method: dare_ties

- base_model: athirdpath/BigLlama-20b

- model: athirdpath/CleverGirl-20b
  - weight: 0.60 / density: 0.35

- model: athirdpath/CleverGirl-20b-Inverted
  - weight: 0.40 / density: 0.30

int8_mask: true

dtype: bfloat16
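
For readability, here is the same recipe written out as a mergekit-style YAML config. It is reconstructed from the list above using mergekit's usual dare_ties layout, so treat the exact field arrangement as a sketch rather than the literal file used for the merge:

```yaml
merge_method: dare_ties
base_model: athirdpath/BigLlama-20b
models:
  - model: athirdpath/CleverGirl-20b
    parameters:
      weight: 0.60
      density: 0.35
  - model: athirdpath/CleverGirl-20b-Inverted
    parameters:
      weight: 0.40
      density: 0.30
parameters:
  int8_mask: true
dtype: bfloat16
```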