---
base_model:
- 152334H/miqu-1-70b-sf
- sophosympatheia/Midnight-Rose-70B-v2.0.3
- Sao10K/Euryale-1.3-L2-70B
- Sao10K/WinterGoddess-1.4x-70B-L2
library_name: transformers
tags:
- mergekit
- merge
license: other
---
![Dusk-Miqu.png](Dusk-Miqu.png)
A "dark" creative writing model with 32k context. Based off [miqu-1-70b](https://huggingface.co/miqudev/miqu-1-70b) but with greatly reduced "positivity" and "-isms". If you want happy endings, look elsewhere!
This model **excels** at writing Dark/Grimdark fantasy (see examples below).
# Model background
This model is almost the same as [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B), but with @sophosympatheia's SLERP merge pattern:
```yaml
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
```
which creates this truncated triangular distribution:
![Dark-Miqu-Distribution.png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/guNF-5tEcKxeGVxyJcfCO.png)
altered here to use this truncated triangular distribution instead:
![Dark-Miqu-Distribution-2.png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/62P3bxJkTgw_-6gqpqG2g.png)
This keeps the first 16 and last 16 layers unaltered (which ties in with what people have found for the frankenmerge interleave patterns), and potentially fixes the ["poor grammar"](https://huggingface.co/jukofyork/Dark-Miqu-70B/discussions/2) problem some people are having with [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B) (sadly I can't replicate this though...).
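For intuition only, here is a rough sketch (not mergekit code) of how the 11 anchor values might map onto the 80 layers of a model this size, assuming simple linear interpolation of the anchors across the layer index. The "new" curve is just an approximation of the altered distribution: the same ramp compressed into layers 16-63, with everything else zeroed.

```python
import numpy as np

# Rough sketch only: map the 11 anchor values onto 80 layers by linear
# interpolation (mergekit's exact gradient handling may differ slightly).
anchors = np.array([0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0])
n_layers = 80

# Original Dark-Miqu pattern: the ramp spans (almost) the whole layer stack.
t_old = np.interp(np.linspace(0, 1, n_layers), np.linspace(0, 1, len(anchors)), anchors)

# Approximation of the altered Dusk-Miqu pattern: the same ramp squeezed into
# layers 16-63, leaving the first and last 16 layers as pure miqu-1-70b-sf.
t_new = np.zeros(n_layers)
t_new[16:64] = np.interp(np.linspace(0, 1, 48), np.linspace(0, 1, len(anchors)), anchors)

print("first/last blended layer (old):", np.flatnonzero(t_old)[[0, -1]])
print("first/last blended layer (new):", np.flatnonzero(t_new)[[0, -1]])
```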
Luckily this change also doesn't require recreating the whole merge from scratch; we can simply use the following:
```yaml
merge_method: linear
parameters:
  weight: 1.0
slices:
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [0, 16]
      - model: jukofyork/dark-miqu-70b
        layer_range: [0, 16]
        parameters:
          weight: 0
  - sources:
      - model: jukofyork/dark-miqu-70b
        layer_range: [16, 64]
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [64, 80]
      - model: jukofyork/dark-miqu-70b
        layer_range: [64, 80]
        parameters:
          weight: 0
dtype: float16
tokenizer_source: model:miqu-1-70b-sf
```
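The `weight: 0` entries mean the first and third slices reduce to plain copies of `152334H/miqu-1-70b-sf`, while the middle slice reuses layers 16-64 straight from the existing `jukofyork/dark-miqu-70b`. A toy example (stand-in arrays, not real weights) of why a weight-0 term simply drops out of a linear merge:

```python
import numpy as np

# Stand-ins for one layer's weights from each source model.
miqu_layer = np.random.randn(8, 8)
dark_layer = np.random.randn(8, 8)

# A linear merge with weights [1.0, 0.0] reproduces the first tensor exactly,
# so the outer slices end up as unchanged miqu-1-70b-sf layers.
merged = 1.0 * miqu_layer + 0.0 * dark_layer
assert np.array_equal(merged, miqu_layer)
```

Since this is a single YAML document, the standard `mergekit-yaml` entry point should be enough to run it (unlike the multi-document configuration further down, which needs `mergekit-mega`).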
# Prompting format
Vicuna format is preferred:
```
USER: {prompt} ASSISTANT:
```
Mistral and Alpaca formats are also supported:
```
[INST] {prompt} [/INST]
```
```
### Instruction:
{prompt}

### Response:
```
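For reference, here is a minimal (untested) transformers snippet using the Vicuna format above; the model ID, dtype handling and sampling settings are placeholder choices, not recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jukofyork/Dusk-Miqu-70B"  # this repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Vicuna format, no system message.
prompt = "USER: Write the opening paragraph of a grimdark fantasy story. ASSISTANT:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```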
# Licence and usage restrictions
[miqu-1-70b-sf](https://huggingface.co/152334H/miqu-1-70b-sf) is a dequantized version of the [miqu-1-70b](https://huggingface.co/miqudev/miqu-1-70b) model leaked from MistralAI. All miqu-derived models, including this merge, are suitable for non-commercial, personal use only.
# Mergekit configuration
The following YAML configuration was used to produce this model:
```yaml
name: midnight-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: sophosympatheia/Midnight-Rose-70B-v2.0.3
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: euryale-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: Sao10K/Euryale-1.3-L2-70B
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: winter-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: Sao10K/WinterGoddess-1.4x-70B-L2
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: dark-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: midnight-miqu-70b
  - model: euryale-miqu-70b
  - model: winter-miqu-70b
base_model: 152334H/miqu-1-70b-sf
merge_method: model_stock
dtype: float16
```
## Key configuration details:
- `merge_method: slerp` uses spherical linear interpolation to merge the models (see the sketch below).
- `parameters: t` controls the per-layer interpolation ratio between the two models.
- `embed_slerp: true` applies SLERP to the embedding layers as well.
- `merge_method: model_stock` uses the [Model Stock](https://arxiv.org/abs/2403.19522) method.
See the [Mergekit documentation](https://github.com/arcee-ai/mergekit) for more on these settings.
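For intuition, a bare-bones SLERP between two weight tensors might look like the sketch below. This is illustrative only and not mergekit's actual implementation (which also handles per-layer gradients and degenerate cases); the function name and tolerance are arbitrary.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (illustrative sketch)."""
    a, b = v0.flatten().float(), v1.flatten().float()
    a_n = a / (a.norm() + eps)
    b_n = b / (b.norm() + eps)
    # Angle between the two (flattened) weight vectors.
    theta = torch.arccos(torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0))
    if theta.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return ((1 - t) * v0 + t * v1).to(v0.dtype)
    s0 = torch.sin((1 - t) * theta) / torch.sin(theta)
    s1 = torch.sin(t * theta) / torch.sin(theta)
    return (s0 * a + s1 * b).reshape(v0.shape).to(v0.dtype)
```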
**NOTE**: Because this single file contains four YAML documents, it must be run with `mergekit-mega` rather than the standard `mergekit-yaml` command.
# Example stories
The following "dark" stories were generated using the Vicuna prompt format with no system message and temperature=0: