mergesloppa123123 committed
Commit 81b441f
1 Parent(s): bd35e29

Update README.md

Files changed (1): README.md (+62, -3)

README.md CHANGED
@@ -1,3 +1,62 @@
- ---
- license: llama3.1
- ---

---
license: cc-by-nc-4.0
base_model:
- NousResearch/Hermes-3-Llama-3.1-70B
- Sao10K/L3.1-70B-Euryale-v2.2
library_name: transformers
tags:
- mergekit
- merge
---
# Hermyale-stack-90B
It's a stack merge meme model made from Hermes 3 and Euryale 2.2. You should use it for roleplay and creative writing AND PROBABLY NOTHING ELSE (but hey, it's your funeral).

## STACK MERGE DISCLAIMER
Yes, it's just a stack merge; no, I didn't do any additional pretraining; no, stack merges don't make the model smarter; yes, they harm its ability to do complex logical tasks; yes, they introduce some weird behaviors and unexpected mistakes; no, they don't make the model sentient; no, you shouldn't post on Twitter about how adding a few layers turned it into AGI; etc. etc.

That said, it does feel unique and fun to use. If you're the type of person who's drowning in VRAM and would rather have some more variety at the expense of needing to make a few manual edits to clean up mistakes, give it a try.

## Format
ChatML
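
For reference, a bare ChatML turn looks like the sketch below; the system prompt and message text here are just placeholders, not anything shipped with the model:

```
<|im_start|>system
You are {character}. Stay in character and write vivid prose.<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
```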

## Samplers
Because stack merges introduce some unexpected noise into the model, I recommend a higher min_p than normal. I've been getting good results with min_p 0.1 -> temp 1.1 (I usually prefer something like min_p 0.03-0.05 -> temp 0.7-0.9, so adjust according to taste). Add your favorite anti-repetition sampler as needed.
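
A minimal sketch of those settings with plain transformers, assuming a build recent enough to expose `min_p` as a generation argument; the model path, prompts, and the repetition_penalty stand-in for "your favorite anti-repetition sampler" are placeholders, and exact sampler ordering differs between backends:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./Hermyale-stack-90B"  # hypothetical local path to the merged model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Hermes 3's tokenizer ships a ChatML chat template, so apply_chat_template builds the prompt format above.
messages = [
    {"role": "system", "content": "You are a creative roleplay partner."},  # placeholder system prompt
    {"role": "user", "content": "Set the opening scene."},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=1.1,          # temp 1.1 from the recommendation above
    min_p=0.1,                # min_p 0.1 from the recommendation above
    repetition_penalty=1.05,  # stand-in anti-repetition sampler; swap or tune to taste
    max_new_tokens=300,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```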

## Merge Rationale
The config is below:
```
slices:
  - sources:
      - model: ../Hermes-3-Llama-3.1-70B
        layer_range: [0, 21]
  - sources:
      - model: ../L3.1-70B-Euryale-v2.2
        layer_range: [16, 36]
  - sources:
      - model: ../Hermes-3-Llama-3.1-70B
        layer_range: [30, 50]
  - sources:
      - model: ../L3.1-70B-Euryale-v2.2
        layer_range: [40, 64]
  - sources:
      - model: ../Hermes-3-Llama-3.1-70B
        layer_range: [60, 80]

tokenizer_source: ../Hermes-3-Llama-3.1-70B
merge_method: passthrough
dtype: bfloat16
```
You will notice these layer ranges are less uniform and less logically spaced than in most stack merges. That's because I tested a bunch of versions of this with different slices to see which ones felt best before eventually arriving at this config, which seems to strike the best balance between fun creativity, coherence, and size.
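
Back-of-the-envelope layer math for the config above (rough numbers, ignoring embeddings and other non-repeated weights): the slices stack 105 decoder layers versus the 80 in a single Llama-3.1-70B, which is roughly where the "90B" in the name comes from.

```
# Rough layer/parameter arithmetic for the config above (approximate on purpose).
slices = [(0, 21), (16, 36), (30, 50), (40, 64), (60, 80)]
total_layers = sum(end - start for start, end in slices)
print(total_layers)                   # 105 layers, vs. 80 in the 70B base
print(round(70 * total_layers / 80))  # ~92B parameters if every layer were equal-sized
```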

Random theorizing:
- Including a huge number of layers from both models is not necessary to achieve the beneficial aspects of stack merges (e.g. I don't think a 120B variant of this would be any better).
- I avoid taking tons of smaller slices, as some amount of continuity is beneficial; but with too much continuity you aren't getting that stack merge je ne sais quoi.
- I try to focus most of the layer overlap on the middle layers, as these seem to be the most composable and give the best bang for your buck when stack merging (see the sketch after this list).
- Messing too much with the later layers seems to be almost entirely pointless.
- Messing too much with the early layers can be very detrimental to coherence (but also, letting a single model get too far through its early layers uninterrupted can reduce the Fun Factor of the resulting merge).
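
To make the middle-layer point concrete, here is a quick sketch of which original layer indices end up covered by slices from both parents in the config above:

```
# Which Llama-3.1-70B layer indices appear in slices from both parents? (illustrative only)
hermes_slices = [(0, 21), (30, 50), (60, 80)]
euryale_slices = [(16, 36), (40, 64)]

hermes_layers = {l for start, end in hermes_slices for l in range(start, end)}
euryale_layers = {l for start, end in euryale_slices for l in range(start, end)}

print(sorted(hermes_layers & euryale_layers))
# Overlap falls at layers 16-20, 30-35, 40-49, and 60-63: mostly the middle of the stack,
# while the first 16 and last 16 layers are each left to a single parent.
```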

---

All credit goes to the original finetuners; I'm just some dummy who can write mergekit configs.

:*