---
base_model:
- 152334H/miqu-1-70b-sf
- sophosympatheia/Midnight-Rose-70B-v2.0.3
- Sao10K/Euryale-1.3-L2-70B
- Sao10K/WinterGoddess-1.4x-70B-L2
library_name: transformers
tags:
- mergekit
- merge
license: other
---
![Dusk-Miqu.png](Dusk-Miqu.png)
A "dark" creative writing model with 32k context. Based off [miqu-1-70b](https://huggingface.co/miqudev/miqu-1-70b) but with greatly reduced "positivity" and "-isms". If you want happy endings, look elsewhere!
This model **excels** at writing Dark/Grimdark fantasy (see examples below).
# Model background
This model is almost the same as [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B), but with @sophosympatheia's SLERP merge pattern:
```yaml
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
```
which creates this truncated triangular distribution:
![Dark-Miqu-Distribution.png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/guNF-5tEcKxeGVxyJcfCO.png)
altered here to use this truncated triangular distribution instead:
![Dark-Miqu-Distribution-2.png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/62P3bxJkTgw_-6gqpqG2g.png)
This keeps the first 16 and last 16 layers unaltered (which ties in with what people have found for the frankenmerge interleave patterns), and potentially fixes the ["poor grammar"](https://huggingface.co/jukofyork/Dark-Miqu-70B/discussions/2) problem some people are having with [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B) (sadly I can't replicate this though...).
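For intuition only, here is a rough sketch (not mergekit code) of how the 11 anchor values might map onto the 80 layers of a model this size, assuming simple linear interpolation of the anchors across the layer index. The "new" curve is just an approximation of the altered distribution: the same ramp compressed into layers 16-63, with everything else zeroed.

```python
import numpy as np

# Rough sketch only: map the 11 anchor values onto 80 layers by linear
# interpolation (mergekit's exact gradient handling may differ slightly).
anchors = np.array([0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0])
n_layers = 80

# Original Dark-Miqu pattern: the ramp spans (almost) the whole layer stack.
t_old = np.interp(np.linspace(0, 1, n_layers), np.linspace(0, 1, len(anchors)), anchors)

# Approximation of the altered Dusk-Miqu pattern: the same ramp squeezed into
# layers 16-63, leaving the first and last 16 layers as pure miqu-1-70b-sf.
t_new = np.zeros(n_layers)
t_new[16:64] = np.interp(np.linspace(0, 1, 48), np.linspace(0, 1, len(anchors)), anchors)

print("first/last blended layer (old):", np.flatnonzero(t_old)[[0, -1]])
print("first/last blended layer (new):", np.flatnonzero(t_new)[[0, -1]])
```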
Luckily this change also doesn't require recreating the whole merge from scratch; we can simply use the following:
```yaml
merge_method: linear
parameters:
  weight: 1.0
slices:
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [0, 16]
      - model: jukofyork/dark-miqu-70b
        layer_range: [0, 16]
        parameters:
          weight: 0
  - sources:
      - model: jukofyork/dark-miqu-70b
        layer_range: [16, 64]
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [64, 80]
      - model: jukofyork/dark-miqu-70b
        layer_range: [64, 80]
        parameters:
          weight: 0
dtype: float16
tokenizer_source: model:miqu-1-70b-sf
```
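The `weight: 0` entries mean the first and third slices reduce to plain copies of `152334H/miqu-1-70b-sf`, while the middle slice reuses layers 16-64 straight from the existing `jukofyork/dark-miqu-70b`. A toy example (stand-in arrays, not real weights) of why a weight-0 term simply drops out of a linear merge:

```python
import numpy as np

# Stand-ins for one layer's weights from each source model.
miqu_layer = np.random.randn(8, 8)
dark_layer = np.random.randn(8, 8)

# A linear merge with weights [1.0, 0.0] reproduces the first tensor exactly,
# so the outer slices end up as unchanged miqu-1-70b-sf layers.
merged = 1.0 * miqu_layer + 0.0 * dark_layer
assert np.array_equal(merged, miqu_layer)
```

Since this is a single YAML document, the standard `mergekit-yaml` entry point should be enough to run it (unlike the multi-document configuration further down, which needs `mergekit-mega`).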
# Prompting format
Vicuna format is preferred:
```
USER: {prompt} ASSISTANT:
```
Mistral and Alpaca formats are also supported:
```
[INST] {prompt} [/INST]
```
```
### Instruction:
{prompt}

### Response:
```
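For reference, here is a minimal (untested) transformers snippet using the Vicuna format above; the model ID, dtype handling and sampling settings are placeholder choices, not recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jukofyork/Dusk-Miqu-70B"  # this repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Vicuna format, no system message.
prompt = "USER: Write the opening paragraph of a grimdark fantasy story. ASSISTANT:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```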
# Licence and usage restrictions
[miqu-1-70b-sf](https://huggingface.co/152334H/miqu-1-70b-sf) is a dequantized version of the [miqu-1-70b](https://huggingface.co/miqudev/miqu-1-70b) model leaked from MistralAI. All miqu-derived models, including this merge, are suitable for non-commercial, personal use only.
# Mergekit configuration
The following YAML configuration was used to produce this model:
```yaml
name: midnight-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: sophosympatheia/Midnight-Rose-70B-v2.0.3
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: euryale-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: Sao10K/Euryale-1.3-L2-70B
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: winter-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: Sao10K/WinterGoddess-1.4x-70B-L2
base_model: 152334H/miqu-1-70b-sf
merge_method: slerp
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0]
  embed_slerp: true
tokenizer_source: model:miqu-1-70b-sf
dtype: float16
---
name: dark-miqu-70b
models:
  - model: 152334H/miqu-1-70b-sf
  - model: midnight-miqu-70b
  - model: euryale-miqu-70b
  - model: winter-miqu-70b
base_model: 152334H/miqu-1-70b-sf
merge_method: model_stock
dtype: float16
```
## Key configuration details:
- `merge_method: slerp` uses spherical linear interpolation to merge the models (see the sketch below).
- `parameters: t` controls the per-layer interpolation ratio between the two models.
- `embed_slerp: true` applies SLERP to the embedding layers as well.
- `merge_method: model_stock` uses the [Model Stock](https://arxiv.org/abs/2403.19522) method.
See the [Mergekit documentation](https://github.com/arcee-ai/mergekit) for more on these settings.
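For intuition, a bare-bones SLERP between two weight tensors might look like the sketch below. This is illustrative only and not mergekit's actual implementation (which also handles per-layer gradients and degenerate cases); the function name and tolerance are arbitrary.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (illustrative sketch)."""
    a, b = v0.flatten().float(), v1.flatten().float()
    a_n = a / (a.norm() + eps)
    b_n = b / (b.norm() + eps)
    # Angle between the two (flattened) weight vectors.
    theta = torch.arccos(torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0))
    if theta.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return ((1 - t) * v0 + t * v1).to(v0.dtype)
    s0 = torch.sin((1 - t) * theta) / torch.sin(theta)
    s1 = torch.sin(t * theta) / torch.sin(theta)
    return (s0 * a + s1 * b).reshape(v0.shape).to(v0.dtype)
```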
**NOTE**: Because this single file contains four YAML documents, it must be run with `mergekit-mega` rather than the standard `mergekit-yaml` command.
# Example stories
The following "dark" stories were generated using the Vicuna prompt format with no system message and temperature=0: