|
--- |
|
base_model: |
|
- 152334H/miqu-1-70b-sf |
|
- sophosympatheia/Midnight-Rose-70B-v2.0.3 |
|
- Sao10K/Euryale-1.3-L2-70B |
|
- Sao10K/WinterGoddess-1.4x-70B-L2 |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
license: other |
|
--- |
|
|
|
![Dusk-Miqu.png](Dusk-Miqu.png) |
|
|
|
A "dark" creative writing model with 32k context. Based off [miqu-1-70b](https://huggingface.co/miqudev/miqu-1-70b) but with greatly reduced "positivity" and "-isms". If you want happy endings, look elsewhere! |
|
|
|
This model **excels** at writing Dark/Grimdark fantasy (see examples below). |
|
|
|
# Model background |
|
|
|
This model is almost the same as [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B), but with @sophosympatheia's SLERP merge pattern: |
|
|
|
```yaml
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
```
|
|
|
which creates this truncated triangular distribution: |
|
|
|
![Dark-Miqu-Distribution.png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/guNF-5tEcKxeGVxyJcfCO.png) |
|
|
|
which I altered to use this truncated triangular distribution instead:
|
|
|
![Dark-Miqu-Distribution-2.png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/62P3bxJkTgw_-6gqpqG2g.png) |
|
|
|
This keeps the first 16 and last 16 layers unaltered (which ties in with what people have found works best for frankenmerge interleave patterns), and may fix the ["poor grammar"](https://huggingface.co/jukofyork/Dark-Miqu-70B/discussions/2) issue some people have reported with [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B) (sadly, I can't replicate the issue myself...).
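
To make the layer weighting concrete, here is a minimal sketch (not mergekit's actual code) of how a gradient list like the one above is typically expanded into a per-layer SLERP fraction by linear interpolation across the 80 layers of a miqu-based model; the anchor values and layer count come from the configs on this page, everything else is illustrative:

```python
import numpy as np

# Anchor points from the SLERP config above; a gradient list is spaced
# evenly across the layer stack and linearly interpolated between points.
anchors = [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
num_layers = 80  # miqu-1-70b has 80 transformer layers

anchor_pos = np.linspace(0.0, 1.0, len(anchors))  # relative anchor positions
layer_pos = np.linspace(0.0, 1.0, num_layers)     # relative depth of each layer
t_per_layer = np.interp(layer_pos, anchor_pos, anchors)

# With two zero anchors at each end, roughly the first and last 8 layers
# get t = 0 (i.e. pure miqu-1-70b); widening the flat ends of the
# distribution, as in the second plot, extends that to 16 layers per end.
for i, t in enumerate(t_per_layer):
    print(f"layer {i:2d}: t = {t:.3f}")
```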
|
|
|
Luckily, this change doesn't require recreating the whole merge from scratch; we can simply apply this on top of the existing model:
|
|
|
```yaml
merge_method: linear
parameters:
  weight: 1.0
slices:
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [0, 16]
      - model: jukofyork/dark-miqu-70b
        layer_range: [0, 16]
        parameters:
          weight: 0
  - sources:
      - model: jukofyork/dark-miqu-70b
        layer_range: [16, 64]
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [64, 80]
      - model: jukofyork/dark-miqu-70b
        layer_range: [64, 80]
        parameters:
          weight: 0
dtype: float16
tokenizer_source: model:miqu-1-70b-sf
```
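
In effect, this linear merge is just a layer-wise copy: wherever one source has weight 1.0 and the other weight 0, the weight-1.0 source wins outright. A minimal sketch of the resulting layer map (illustrative only):

```python
def source_for_layer(layer: int) -> str:
    """Which model's weights end up in the final merge at a given layer."""
    if layer < 16 or layer >= 64:
        # In these slices miqu-1-70b-sf has weight 1.0 and
        # dark-miqu-70b has weight 0, so miqu is copied verbatim.
        return "152334H/miqu-1-70b-sf"
    # The middle slice has dark-miqu-70b as its only source.
    return "jukofyork/dark-miqu-70b"

assert source_for_layer(0) == "152334H/miqu-1-70b-sf"
assert source_for_layer(40) == "jukofyork/dark-miqu-70b"
assert source_for_layer(79) == "152334H/miqu-1-70b-sf"
```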
|
|
|
# Prompting format |
|
|
|
The Vicuna format is preferred:
|
|
|
``` |
|
USER: {prompt} ASSISTANT: |
|
``` |
|
|
|
The Mistral and Alpaca formats are also supported:
|
|
|
``` |
|
[INST] {prompt} [/INST] |
|
``` |
|
|
|
``` |
|
### Instruction: |
|
{prompt} |
|
|
|
### Response: |
|
``` |
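
For example, here is a minimal `transformers` generation sketch using the Vicuna format. The repo id `jukofyork/Dusk-Miqu-70B` is a placeholder assumption; substitute the actual repo id or your local path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jukofyork/Dusk-Miqu-70B"  # placeholder: use your local path or repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Vicuna format: no system message, "USER:" / "ASSISTANT:" turn markers.
prompt = "USER: Write the opening paragraph of a grimdark fantasy story. ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```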
|
|
|
# Licence and usage restrictions |
|
|
|
[miqu-1-70b-sf](https://huggingface.co/152334H/miqu-1-70b-sf) is a dequantized version of the [miqu-1-70b](https://huggingface.co/miqudev/miqu-1-70b) model leaked from MistralAI. All miqu-derived models, including this merge, are suitable for non-commercial, personal use only. |
|
|
|
# Mergekit configuration |
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml
name: midnight-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: sophosympatheia/Midnight-Rose-70B-v2.0.3
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: euryale-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: Sao10K/Euryale-1.3-L2-70B
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: winter-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: Sao10K/WinterGoddess-1.4x-70B-L2
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: dark-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: midnight-miqu-70b
  - model: euryale-miqu-70b
  - model: winter-miqu-70b
base_model: 152334H/miqu-1-70b-sf
merge_method: model_stock
dtype: float16
```
|
|
|
## Key configuration details
|
|
|
- `merge_method: slerp` merges models using spherical linear interpolation (see the sketch after this list).
- `parameters: t` controls the interpolation fraction between the two models at each layer.
- `embed_slerp: true` applies SLERP to the embedding layers as well.
- `merge_method: model_stock` uses the [Model Stock](https://arxiv.org/abs/2403.19522) method.
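
For reference, here is a minimal sketch of SLERP itself (illustrative only, not mergekit's implementation): each pair of weight tensors is interpolated along the arc between them rather than along the straight line:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))  # angle between tensors
    if omega < eps:  # (nearly) parallel: fall back to linear interpolation
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
```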
|
|
|
See the [Mergekit documentation](https://github.com/arcee-ai/mergekit) for more on these settings. |
|
|
|
**NOTE**: Run this file with `mergekit-mega` rather than `mergekit-yaml`, as it contains 4 separate YAML documents.
|
|
|
# Example stories |
|
|
|
The following mix of "dark" stories was generated using the Vicuna prompt format, with no system message and temperature=0: