jukofyork/Dark-Miqu-103B

+---
+base_model: []
+library_name: transformers
+tags:
+- mergekit
+- merge
+license: other
+---
+![Dual-Miqu-103B.png](Dual-Miqu-103B.png)
+A `103b`-parameter creative-writing "self-merge" model with 32k context.
+# Model background
+Created using [Mergekit](https://github.com/arcee-ai/mergekit) from my [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B) model.
+- To fix problems with "backwards time skips" in the generated stories, the "standard" interleave pattern was replaced by repeated blocks (see [here](https://github.com/arcee-ai/mergekit/issues/198#issuecomment-2081174251)).
+- To help maintain cohesion, the '`q_proj`', '`k_proj`' and '`down_proj`' tensors were all scaled to hypothesised upper-bound values (see [here](https://github.com/arcee-ai/mergekit/issues/198#issuecomment-2063716974)).
+# Prompting format
+Vicuna format is preferred:
+```
+USER: {prompt} ASSISTANT:
+```
+Mistral and Alpaca formats are also supported:
+```
+[INST] {prompt} [/INST]
+```
+```
+### Instruction:
+{prompt}
+### Response:
+```
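The three templates above can be applied programmatically. The following is a minimal sketch, not part of this model's tooling: `PROMPT_TEMPLATES` and `build_prompt` are hypothetical names, and the exact whitespace around each template (in particular, whether a trailing newline follows `### Response:`) is an assumption.

```python
# Hypothetical helper: the template strings mirror the formats listed above.
# The names used here are illustrative and do not come from any library.
PROMPT_TEMPLATES = {
    "vicuna": "USER: {prompt} ASSISTANT:",
    "mistral": "[INST] {prompt} [/INST]",
    # Assumption: no trailing newline after "### Response:".
    "alpaca": "### Instruction:\n{prompt}\n### Response:",
}


def build_prompt(style: str, prompt: str) -> str:
    """Wrap a user prompt in one of the supported prompt formats."""
    try:
        template = PROMPT_TEMPLATES[style]
    except KeyError:
        raise ValueError(f"unknown prompt style: {style!r}") from None
    return template.format(prompt=prompt)


if __name__ == "__main__":
    # Vicuna is the preferred format for this model.
    print(build_prompt("vicuna", "Write a dark fantasy opening scene."))
```

Keeping the templates in one table makes it easy to compare the output of the same request across all three formats.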
+# Licence and usage restrictions
+[miqu-1-70b-sf](https://huggingface.co/152334H/miqu-1-70b-sf) is a dequantized version of the [miqu-1-70b](https://huggingface.co/miqudev/miqu-1-70b) model leaked from MistralAI. All miqu-derived models, including this merge, are suitable for non-commercial, personal use only.
+# Mergekit configuration
+The following YAML configuration was used to produce this model:
+```yaml
+const_tag: &MODEL jukofyork/dark-miqu-70b
+const_tag: &QK_ATTENUATION_FACTOR 0.8408964153  # sqrt(sqrt(1/2))
+const_tag: &MLP_DOWN_SCALE_FACTOR 0.7071067812  # sqrt(1/2)
+scale-filter-env: &scale_filter_env
+  parameters:
+    scale:
+      - filter: q_proj
+        value: *QK_ATTENUATION_FACTOR
+      - filter: k_proj
+        value: *QK_ATTENUATION_FACTOR
+      - filter: down_proj
+        value: *MLP_DOWN_SCALE_FACTOR
+      - value: 1.0
+slices:
+  - sources:
+    - model: *MODEL
+      layer_range: [0, 20]
+  - sources:
+    - model: *MODEL
+      layer_range: [20, 40]
+      <<: *scale_filter_env
+  - sources:
+    - model: *MODEL
+      layer_range: [20, 40]
+      <<: *scale_filter_env
+  - sources:
+    - model: *MODEL
+      layer_range: [40, 60]
+      <<: *scale_filter_env
+  - sources:
+    - model: *MODEL
+      layer_range: [40, 60]
+      <<: *scale_filter_env
+  - sources:
+    - model: *MODEL
+      layer_range: [60, 80]
+merge_method: passthrough
+dtype: float16
+```
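To see what the `slices` list produces, the sketch below expands it into the order of source layers in the merged model. It assumes each `layer_range` is half-open (`[start, end)`), which matches the final `[60, 80]` range of an 80-layer source model; `SLICES` and `stacked_layers` are hypothetical names used only for illustration.

```python
# Layer ranges copied from the `slices` list in the configuration above.
# Assumption: each range is half-open, i.e. [start, end).
SLICES = [(0, 20), (20, 40), (20, 40), (40, 60), (40, 60), (60, 80)]


def stacked_layers(slices):
    """Return the source-layer index used at each position of the merged model."""
    return [layer for start, end in slices for layer in range(start, end)]


layers = stacked_layers(SLICES)
repeated = sorted({i for i in layers if layers.count(i) > 1})

print(len(layers))                # total number of layers in the merged stack
print(repeated[0], repeated[-1])  # first and last source layer that is repeated
```

Under that assumption, the merged stack has 120 layers: source layers 20 to 59 each appear twice, as whole repeated blocks rather than interleaved, and those repeated blocks are the only ones that receive the scaling parameters.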
+## Key configuration details:
+- '`merge_method: passthrough`' passes input tensors through without merging them; the only change made to them here is the '`scale`' described below.
+- '`filter`' selects the required tensor(s) based on their name(s).
+- '`scale`' scales the weights in the selected tensors.
+See the [Mergekit documentation](https://github.com/arcee-ai/mergekit) for more on these settings.
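The two scale constants in the configuration are related: scaling both `q_proj` and `k_proj` by `sqrt(sqrt(1/2))` multiplies every query-key dot product by `sqrt(1/2)`, the same factor applied to `down_proj`. The sketch below only checks that arithmetic on toy vectors, assuming the projections are plain linear maps. The reasoning behind choosing these values is in the linked Mergekit issue, not here.

```python
import math

# Constants as given in the YAML comments above.
MLP_DOWN_SCALE_FACTOR = math.sqrt(1 / 2)
QK_ATTENUATION_FACTOR = math.sqrt(math.sqrt(1 / 2))


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Toy query/key vectors (illustrative values only, not taken from the model).
q = [0.5, -1.0, 2.0]
k = [1.5, 0.25, -0.75]

# Scaling q and k by the same factor s scales their dot product by s**2.
scaled = dot(
    [QK_ATTENUATION_FACTOR * x for x in q],
    [QK_ATTENUATION_FACTOR * x for x in k],
)
ratio = scaled / dot(q, k)

print(round(QK_ATTENUATION_FACTOR, 10))  # matches the value in the config
print(round(MLP_DOWN_SCALE_FACTOR, 10))  # matches the value in the config
print(math.isclose(ratio, MLP_DOWN_SCALE_FACTOR))
```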
+# Example stories
+The following mix of "dark" stories was generated using the Vicuna prompt format, with no system message and temperature=0:
+## Dark fantasy stories
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+## Dark sci-fi stories
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+## Miscellaneous stories
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+<details> <summary>Click to see spoiler</summary>
+</details>
+Big thanks to @chargoddard for creating [Mergekit](https://github.com/arcee-ai/mergekit)!