jukofyork committed on
Commit
e9ed5e8
1 Parent(s): c98a357

Create README.md

  1. README.md +162 -0
README.md ADDED
@@ -0,0 +1,162 @@
+ ---
+ base_model: []
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ license: other
+ ---
+
+ ![Dual-Miqu-103B.png](Dual-Miqu-103B.png)
+
+ A creative writing `103b` parameter "self-merge" model with 32k context.
+
+ # Model background
+
+ Created using [Mergekit](https://github.com/arcee-ai/mergekit) from my [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B) model.
+
+ - To fix problems with "backwards time skips" in the generated stories, the "standard" interleave pattern was replaced by repeated blocks (see [here](https://github.com/arcee-ai/mergekit/issues/198#issuecomment-2081174251)).
+ - To help maintain cohesion, the '`q_proj`', '`k_proj`' and '`down_proj`' tensors were all scaled to hypothesised upper-bound values (see [here](https://github.com/arcee-ai/mergekit/issues/198#issuecomment-2063716974)).
+
+ # Prompting format
+
+ Vicuna format is preferred:
+
+ ```
+ USER: {prompt} ASSISTANT:
+ ```
+
+ Mistral and Alpaca formats are also supported:
+
+ ```
+ [INST] {prompt} [/INST]
+ ```
+
+ ```
+ ### Instruction:
+ {prompt}
+
+ ### Response:
+ ```
+
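As a rough illustration of the three templates above, the following Python sketch fills in `{prompt}` for a chosen format. The `build_prompt` helper and the `TEMPLATES` table are hypothetical names introduced here and are not part of this model's tooling. The exact whitespace after `ASSISTANT:` or `### Response:` is an assumption and may need adjusting for a given front end.

```python
# Hypothetical helper: the templates are copied from the formats listed above.
TEMPLATES = {
    "vicuna": "USER: {prompt} ASSISTANT:",
    "mistral": "[INST] {prompt} [/INST]",
    "alpaca": "### Instruction:\n{prompt}\n\n### Response:",
}


def build_prompt(user_message: str, style: str = "vicuna") -> str:
    """Return the user message wrapped in the requested prompt format."""
    if style not in TEMPLATES:
        raise ValueError(f"unknown prompt style: {style!r}")
    return TEMPLATES[style].format(prompt=user_message)


if __name__ == "__main__":
    for name in TEMPLATES:
        print(f"--- {name} ---")
        print(build_prompt("Write a short dark fantasy story.", name))
```

Vicuna is the default here only because it is listed as the preferred format.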
+ # Licence and usage restrictions
+
+ [miqu-1-70b-sf](https://huggingface.co/152334H/miqu-1-70b-sf) is a dequantized version of the [miqu-1-70b](https://huggingface.co/miqudev/miqu-1-70b) model leaked from MistralAI. All miqu-derived models, including this merge, are suitable for non-commercial, personal use only.
+
+ # Mergekit configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ const_tag: &MODEL jukofyork/dark-miqu-70b
+
+ const_tag: &QK_ATTENUATION_FACTOR 0.8408964153 # sqrt(sqrt(1/2))
+ const_tag: &MLP_DOWN_SCALE_FACTOR 0.7071067812 # sqrt(1/2)
+
+ scale-filter-env: &scale_filter_env
+   parameters:
+     scale:
+       - filter: q_proj
+         value: *QK_ATTENUATION_FACTOR
+       - filter: k_proj
+         value: *QK_ATTENUATION_FACTOR
+       - filter: down_proj
+         value: *MLP_DOWN_SCALE_FACTOR
+       - value: 1.0
+
+
+ slices:
+   - sources:
+     - model: *MODEL
+       layer_range: [0, 20]
+   - sources:
+     - model: *MODEL
+       layer_range: [20, 40]
+       <<: *scale_filter_env
+   - sources:
+     - model: *MODEL
+       layer_range: [20, 40]
+       <<: *scale_filter_env
+   - sources:
+     - model: *MODEL
+       layer_range: [40, 60]
+       <<: *scale_filter_env
+   - sources:
+     - model: *MODEL
+       layer_range: [40, 60]
+       <<: *scale_filter_env
+   - sources:
+     - model: *MODEL
+       layer_range: [60, 80]
+
+ merge_method: passthrough
+ dtype: float16
+ ```
+
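The arithmetic behind this configuration can be checked with a short Python sketch. It counts the layers produced by the six slices and recomputes the two scale constants from the comments in the YAML. The variable names are illustrative only. Because the same factor is applied to both `q_proj` and `k_proj`, the query-key dot product ends up scaled by the factor squared, which is `sqrt(1/2)` (ignoring any bias terms).

```python
import math

# Layer ranges copied from the `slices` list above (end index exclusive).
LAYER_RANGES = [(0, 20), (20, 40), (20, 40), (40, 60), (40, 60), (60, 80)]

# Total number of layers in the merged model.
total_layers = sum(end - start for start, end in LAYER_RANGES)

# Scale constants, recomputed from the comments in the config.
qk_attenuation_factor = math.sqrt(math.sqrt(1 / 2))
mlp_down_scale_factor = math.sqrt(1 / 2)

# Scaling both q_proj and k_proj multiplies q.k by the factor squared.
attention_score_scale = qk_attenuation_factor ** 2

if __name__ == "__main__":
    print("total layers:", total_layers)
    print("q/k factor:", round(qk_attenuation_factor, 10))
    print("down_proj factor:", round(mlp_down_scale_factor, 10))
    print("combined q.k scale:", round(attention_score_scale, 10))
```

The merged stack has 120 layers against the 80 covered by the source slices, which is in line with the growth from 70B to roughly 103B parameters.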
+ ## Key configuration details
+
+ - '`merge_method: passthrough`' passes input tensors through unmodified.
+ - '`filter`' selects the required tensor(s) based on their name(s).
+ - '`scale`' scales the weights in the selected tensors.
+
+ See the [Mergekit documentation](https://github.com/arcee-ai/mergekit) for more on these settings.
+
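One way to picture how the `filter` and `scale` entries interact is the simplified Python model below. It assumes that a filter matches when it appears as a substring of the tensor name, that the first matching rule wins, and that a rule without a `filter` acts as the default. These are assumptions made for illustration, and Mergekit's actual matching logic may differ. The `resolve_scale` function is hypothetical, and the tensor names follow the usual Llama-style naming in `transformers`.

```python
# Rules mirroring the `scale` list in the config above.
SCALE_RULES = [
    {"filter": "q_proj", "value": 0.8408964153},
    {"filter": "k_proj", "value": 0.8408964153},
    {"filter": "down_proj", "value": 0.7071067812},
    {"value": 1.0},  # no filter: assumed to apply to every other tensor
]


def resolve_scale(tensor_name: str, rules=SCALE_RULES) -> float:
    """Return the scale for a tensor, assuming first-match-wins substring filters."""
    for rule in rules:
        name_filter = rule.get("filter")
        if name_filter is None or name_filter in tensor_name:
            return rule["value"]
    return 1.0  # nothing matched and no default rule: leave the tensor unscaled


if __name__ == "__main__":
    for name in (
        "model.layers.20.self_attn.q_proj.weight",
        "model.layers.20.self_attn.v_proj.weight",
        "model.layers.20.mlp.down_proj.weight",
    ):
        print(name, "->", resolve_scale(name))
```

Under these assumptions only the `q_proj`, `k_proj` and `down_proj` weights in the duplicated blocks are changed, and every other tensor keeps a scale of 1.0.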
+ # Example stories
+
+ The following mix of "dark" stories were generated using the Vicuna prompt format with no system message and temperature=0:
+
+ ## Dark fantasy stories
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ ## Dark sci-fi stories
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ ## Miscellaneous stories
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ <details> <summary>Click to see spoiler</summary>
+
+ </details>
+
+ Big thanks to @chargoddard for creating [Mergekit](https://github.com/arcee-ai/mergekit)!