tags:
- merge
---
![cute](https://huggingface.co/matchaaaaa/Chaifighter-Latte-14B/resolve/main/chaifighter-latte-cute.png)

# Chaifighter Latte 14B

Finally here, Chaifighter Latte is the successor to the Chaifighter 20B models. Like its predecessors, it is Mistral-based, but it is now dramatically smaller. Chaifighter Latte is formulated for creative, rich, verbose writing without sacrificing intelligence, awareness, or context-following ability. It retains the great taste of the original, and despite being significantly lighter at 14 billion parameters, it performs even better. Try it for yourself!

## Prompt Template: Alpaca

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```

## Recommended Settings: Universal-Light

Here are some settings ranges that tend to work for me. They aren't strict values, and there's a bit of leeway in them. Feel free to experiment a bit!

* Temperature: **1.0** *to* **1.25** (adjust to taste, but keep it low; Chaifighter is creative enough on its own)
* Min-P: **0.1** (increasing it might help if the model goes cuckoo, but I suggest keeping it there)
* Repetition Penalty: **1.05** *to* **1.1** (high values aren't needed and usually degrade output)
* Rep. Penalty Range: **256** *or* **512**
* *(all other samplers disabled)*
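
For reference, here's what the two main samplers above actually do — a minimal sketch in plain Python, not tied to any particular inference backend, with made-up logits for a toy 5-token vocabulary:

```python
import math

def apply_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then softmax. Lower values sharpen the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(probs, min_p=0.1):
    """Keep only tokens whose probability is at least min_p times the top token's."""
    threshold = min_p * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]

# Toy logits standing in for a real model's output
logits = [2.0, 1.5, 0.5, -1.0, -3.0]
probs = apply_temperature(logits, temperature=1.0)
filtered = min_p_filter(probs, min_p=0.1)
# Tokens far below 10% of the top token's probability are zeroed out,
# which trims the unhinged tail without flattening the distribution.
```

With Min-P at 0.1, only candidates at least a tenth as likely as the most likely token survive, which is why nudging it up reins the model in when it "goes cuckoo".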

## The Deets

### Mergekit

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

This model was merged using the passthrough merge method.

### Models Merged

* [SanjiWatsuki/Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B)
* [Sao10K/Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2)
* [Sao10K/Frostwind-v2.1-m7](https://huggingface.co/Sao10K/Frostwind-v2.1-m7)
* [Gryphe/MythoMist-7b](https://huggingface.co/Gryphe/MythoMist-7b)

### The Sauce

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: SanjiWatsuki/Kunoichi-7B
        layer_range: [16, 24]
merge_method: passthrough
dtype: float32
name: Kuno-splice
---
slices:
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: Fimbul-splice
---
models:
  - model: Kuno-splice
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: Fimbul-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: Kuno-splice
dtype: float32
name: Kuno-Fimbul-splice
---
models:
  - model: Sao10K/Frostwind-v2.1-m7
  - model: Gryphe/MythoMist-7b
    parameters:
      weight: 0.37
      density: 0.8
merge_method: dare_ties
base_model: Sao10K/Frostwind-v2.1-m7
dtype: float32
name: Frosty-Mytho
---
slices:
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [32, 40]
merge_method: passthrough
dtype: float32
name: Fimbul-splice-2
---
slices:
  - sources:
      - model: Frosty-Mytho
        layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: Frosty-Mytho-splice
---
models:
  - model: Fimbul-splice-2
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: Frosty-Mytho-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: Fimbul-splice-2
dtype: float32
name: Fimbul-Frosty-Mytho-splice
---
slices:
  - sources: # kunoichi
      - model: SanjiWatsuki/Kunoichi-7B
        layer_range: [0, 16]
  - sources: # kunoichi gradient fimbul splice
      - model: Kuno-Fimbul-splice
        layer_range: [0, 8]
  - sources: # fimbulvetr
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [16, 32]
  # insert splice here
  - sources: # fimbulvetr gradient fwmm splice
      - model: Fimbul-Frosty-Mytho-splice
        layer_range: [0, 8]
  - sources: # frostwind + mythomist
      - model: Frosty-Mytho
        layer_range: [16, 32]
merge_method: passthrough
dtype: float32
name: Chaifighter-Latte-14B
```
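
To illustrate what those opposing weight gradients do, here's a toy sketch of a per-layer crossfade between two splices. The "layers" are stand-in scalars rather than real weight tensors, and the actual `dare_linear` method also randomly drops and rescales delta parameters before combining, which is omitted here:

```python
# gradient weights copied from the splice configs above (one value per anchor point)
weight_a = [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0]
weight_b = [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1]

def blend(layers_a, layers_b, grad_a, grad_b):
    """Linearly combine corresponding per-layer values with the gradient weights.
    mergekit interpolates the gradient across the real layer count; for
    simplicity we pretend the stack has exactly len(grad_a) layers."""
    assert len(grad_a) == len(grad_b) == len(layers_a) == len(layers_b)
    return [wa * a + wb * b
            for a, b, wa, wb in zip(layers_a, layers_b, grad_a, grad_b)]

# stand-in "layers": scalars instead of weight tensors
layers_a = [1.0] * 9  # splice A contributes the value 1.0 everywhere
layers_b = [3.0] * 9  # splice B contributes the value 3.0 everywhere

mixed = blend(layers_a, layers_b, weight_a, weight_b)
# the result starts as pure A, crossfades through the middle, and ends as pure B
```

Note that each pair of weights sums to 1, so every blended layer is a convex combination of its two donors — that's the "smoothing" the splices aim for.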

### The Thought Process

So, I wanted the first layers to be Kunoichi. Kunoichi was chosen for its strong context- and instruction-following abilities, as well as being a really smart model overall. Plus, it's no slouch at RP. I think this is partly what gave previous Chaifighter models the awareness that many people liked. To best harness its stellar prompt processing performance, I put Kunoichi at the head of the stack.

Next, I applied a gradient merge that I call a "splice". Splicing models like this addresses what I believe significantly hurt the earlier Chaifighter models and many other frankenmerges: layer dissimilarity. Splicing the end of one stack from model A with the beginning of another stack from model B in theory helps smooth over those differences and bring everything together.

The second model I introduced is Fimbulvetr-v2. This should be no surprise, as it's also a well-established ingredient of the Chaifighter recipe. Boasting incredibly strong coherence, it is the glue that can hold a story together, even with multiple characters and over longer contexts. I felt like the best place for Fimbulvetr was right after Kunoichi.

Then, another splice.

Lastly, I picked Frostwind and MythoMist for the final layers of this merge. I wanted to introduce MythoMist, as I felt it was what gave Chaifighter its flavorful writing. I paired it with Frostwind, as it's a very creative writer as well, and I felt like the two (with more emphasis on Frostwind for consistency) produced high-quality outputs up to my standards.
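
As a sanity check on the stack described above, the final passthrough config can be tallied up from its layer ranges (Mistral-7B-class models have 32 decoder layers, so each donor contributes a slice of those):

```python
# layer ranges copied from the final passthrough merge in "The Sauce"
slices = [
    ("SanjiWatsuki/Kunoichi-7B",    (0, 16)),   # head of the stack
    ("Kuno-Fimbul-splice",          (0, 8)),    # gradient splice
    ("Sao10K/Fimbulvetr-11B-v2",    (16, 32)),
    ("Fimbul-Frosty-Mytho-splice",  (0, 8)),    # gradient splice
    ("Frosty-Mytho",                (16, 32)),
]

total_layers = sum(end - start for _, (start, end) in slices)
# 16 + 8 + 16 + 8 + 16 = 64 layers, double a stock 32-layer Mistral,
# which is roughly where the ~14B parameter count comes from
```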

Thanks for looking at my model, and have a fantastic day! :)