# Dusk-Miqu-70B
---
base_model:
  - 152334H/miqu-1-70b-sf
  - sophosympatheia/Midnight-Rose-70B-v2.0.3
  - Sao10K/Euryale-1.3-L2-70B
  - Sao10K/WinterGoddess-1.4x-70B-L2
library_name: transformers
tags:
  - mergekit
  - merge
license: other
---

![Dusk-Miqu.png](Dusk-Miqu.png)

A "dark" creative-writing model with 32k context. Based on miqu-1-70b, but with greatly reduced "positivity" and "-isms". If you want happy endings, look elsewhere!

This model excels at writing Dark/Grimdark fantasy (see examples below).

## Model background

This model is almost identical to Dark-Miqu-70B, but @sophosympatheia's SLERP merge pattern:

```yaml
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
```

which creates this truncated triangular distribution:

![Dark-Miqu-Distribution.png](Dark-Miqu-Distribution.png)

has been altered to use this truncated triangular distribution instead:

![Dark-Miqu-Distribution-2.png](Dark-Miqu-Distribution-2.png)
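To see what such a gradient `t` list does in practice, here is a minimal sketch of expanding it into one interpolation weight per layer via piecewise-linear interpolation (an approximation of how mergekit spreads a gradient list across the layer stack; the function name and the assumption of 80 layers are illustrative):

```python
def interpolate_t(anchors, num_layers):
    """Expand a short gradient list into one t value per layer by
    piecewise-linear interpolation between the anchor values."""
    ts = []
    for i in range(num_layers):
        # Map the layer index onto the anchor list's coordinate space.
        pos = i / (num_layers - 1) * (len(anchors) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(anchors) - 1)
        frac = pos - lo
        ts.append(anchors[lo] * (1 - frac) + anchors[hi] * frac)
    return ts

# The gradient used above: zero at both ends, peaking at 0.5 mid-stack.
t_per_layer = interpolate_t([0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0], 80)
```

With the leading and trailing `0, 0` anchors, roughly the first and last tenth of the layers stay at the base model (`t = 0`), and the donor model's influence ramps up toward the middle.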

This keeps the first 16 and last 16 layers unaltered (which ties in with what people have found works for frankenmerge interleave patterns), and potentially fixes the "poor grammar" problem some people have reported with Dark-Miqu-70B (sadly, I can't replicate it myself...).

Luckily this change also doesn't necessitate recreating the whole merge from scratch, and we can just use this:

```yaml
merge_method: linear
parameters:
  weight: 1.0
slices:
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [0, 16]
      - model: jukofyork/dark-miqu-70b
        layer_range: [0, 16]
        parameters:
          weight: 0
  - sources:
      - model: jukofyork/dark-miqu-70b
        layer_range: [16, 64]
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [64, 80]
      - model: jukofyork/dark-miqu-70b
        layer_range: [64, 80]
        parameters:
          weight: 0
dtype: float16
tokenizer_source: model:miqu-1-70b-sf
```
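Note the `weight: 0` entries: a `linear` merge is just an elementwise weighted sum of corresponding tensors, so those slices reduce to verbatim copies of the model carrying `weight: 1.0`. A toy sketch, with flat lists standing in for tensors (names are illustrative, not mergekit's actual API):

```python
def linear_merge(weights, *tensors):
    """Elementwise weighted sum of corresponding tensors, one per model
    (a toy stand-in for mergekit's `linear` merge method)."""
    return [sum(w * x for w, x in zip(weights, xs)) for xs in zip(*tensors)]

miqu_layer = [1.0, 2.0, 3.0]  # stand-in tensor from miqu-1-70b-sf
dark_layer = [9.0, 9.0, 9.0]  # stand-in tensor from dark-miqu-70b

# Weight 1.0 on miqu and 0 on dark-miqu -> an exact copy of the miqu tensor.
merged = linear_merge([1.0, 0.0], miqu_layer, dark_layer)
```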

## Prompting format

Vicuna format is preferred:

```
USER: {prompt} ASSISTANT:
```

Mistral and Alpaca formats are also supported:

```
[INST] {prompt} [/INST]
```

```
### Instruction:
{prompt}

### Response:
```
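As a sketch, a prompt in the preferred Vicuna format can be assembled like this (the helper and its turn structure are illustrative, not part of the model's tooling):

```python
def vicuna_prompt(turns):
    """Build a Vicuna-format prompt from (user, assistant) pairs;
    pass None as the final assistant reply to leave it open for generation."""
    parts = []
    for user, assistant in turns:
        turn = f"USER: {user} ASSISTANT:"
        if assistant is not None:
            turn += f" {assistant}"
        parts.append(turn)
    return " ".join(parts)

prompt = vicuna_prompt([("Write me a grimdark opening chapter.", None)])
```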

## Licence and usage restrictions

miqu-1-70b-sf is a dequantized version of the miqu-1-70b model leaked from MistralAI. All miqu-derived models, including this merge, are suitable for non-commercial, personal use only.

## Mergekit configuration

The following YAML configuration was used to produce this model:

```yaml
name: midnight-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: sophosympatheia/Midnight-Rose-70B-v2.0.3
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: euryale-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: Sao10K/Euryale-1.3-L2-70B
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: winter-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: Sao10K/WinterGoddess-1.4x-70B-L2
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: dark-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: midnight-miqu-70b
  - model: euryale-miqu-70b
  - model: winter-miqu-70b
base_model: 152334H/miqu-1-70b-sf
merge_method: model_stock
dtype: float16
```

Key configuration details:

- `merge_method: slerp` uses spherical linear interpolation to merge models.
- `t` controls the per-layer interpolation ratio between the two models.
- `embed_slerp: true` applies SLERP to the embedding layer as well.
- `merge_method: model_stock` uses the "Model Stock" method for the final combination step.
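For reference, SLERP between two flat weight tensors can be sketched as follows (an illustrative toy, not mergekit's exact implementation):

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    cos_theta = max(-1.0, min(1.0, dot / (n0 * n1 + eps)))
    theta = math.acos(cos_theta)
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

Unlike a plain weighted average, SLERP interpolates along the arc between the two weight vectors, preserving magnitude better when they point in different directions.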

See the Mergekit documentation for more on these settings.

**NOTE**: Run with `mergekit-mega` rather than `mergekit` as there are 4 documents in this one file.
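For example (the file and output paths are hypothetical; check `mergekit-mega --help` for the flags available in your mergekit version):

```shell
# Processes every YAML document in the file in order, materialising each
# named intermediate merge before the final model_stock step.
mergekit-mega merge-config.yml ./output-model-dir
```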

## Example stories

The following mix of "dark" stories was generated using the Vicuna prompt format with no system message and `temperature=0`: