---
base_model:
- FallenMerick/Chunky-Lemon-Cookie-11B
- Sao10K/Fimbulvetr-11B-v2.1-16K
- senseable/WestLake-7B-v2
base_model_relation: merge
library_name: transformers
tags:
- mergekit
- merge
- roleplay
- text-generation-inference
license: cc-by-4.0
---
![cute](https://huggingface.co/matchaaaaa/Honey-Yuzu-13B/resolve/main/honey-yuzu-cute.png)
**Thank you [@Brooketh](https://huggingface.co/brooketh) for the [GGUFs](https://huggingface.co/backyardai/Honey-Yuzu-13B-GGUF)!!**
# Honey-Yuzu-13B
Meet Honey-Yuzu, a sweet lemony tea brewed by yours truly! A bit of [Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B) here for its great flavor, with a dash of [WestLake-7B-v2](https://huggingface.co/senseable/WestLake-7B-v2) there to add some depth. I'm really proud of how it turned out, and I hope you like it too!
It's not as verbose as Chaifighter, but it still writes very well. It boasts fantastic coherence and character understanding (in my opinion) for a 13B, and it's been my daily driver for a little bit. It's a solid RP model that should generally play nice with just about anything.
**Native Context Length: 8K/8192** *(can be extended using RoPE, possibly past 16K)*
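If you're running it through transformers, here's a minimal loading sketch (the repo ID comes from this page; everything else is illustrative, and RoPE context extension is usually configured in your backend, like the RoPE frequency/scale settings in llama.cpp-style runners, rather than in code like this):

```python
# Minimal loading sketch (illustrative; assumes transformers + accelerate).
# Extending context past 8K with RoPE is usually a backend setting (e.g.
# RoPE frequency base / linear scaling in llama.cpp-style runners).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "matchaaaaa/Honey-Yuzu-13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # needs the accelerate package
    torch_dtype="auto",  # load in the checkpoint's native dtype
)
```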
## Prompt Template: Alpaca
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
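A tiny helper for building this template (a sketch; the function name is mine, and it assumes a simple single-turn prompt):

```python
def alpaca_prompt(instruction: str) -> str:
    """Build a single-turn Alpaca prompt matching the template above."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
    )
```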
## Recommended Settings: Universal-Light
Here are some setting ranges that tend to work for me. They aren't strict values, and there's some leeway in them. Feel free to experiment!
* Temperature: **1.0** to **1.25**
* Min-P: **0.05** to **0.1**
* Repetition Penalty: **1.05** to **1.1** (high values aren't needed and usually degrade output)
* Rep. Penalty Range: **256** or **512**
* *(all other samplers disabled)*
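Translated into transformers' `generate()`, the mid-range values look roughly like this (a sketch continuing from the loading example above; `min_p` needs a fairly recent transformers release, and repetition-penalty *range* is a frontend feature, e.g. in KoboldCpp or SillyTavern, that plain transformers doesn't expose):

```python
# Mid-range "Universal-Light" values (continuing from the loading sketch).
prompt = alpaca_prompt("Introduce yourself in a short scene.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.1,          # 1.0 to 1.25
    min_p=0.075,              # 0.05 to 0.1
    repetition_penalty=1.05,  # 1.05 to 1.1; the range knob isn't exposed here
)
print(tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```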
## The Deets
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
### Merge Method
This model was merged using the passthrough merge method.
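In other words, passthrough doesn't average any weights; it just stacks the requested layer slices from each source in order. A toy sketch of the idea (not mergekit's actual internals):

```python
# Toy illustration of passthrough stacking (not mergekit internals):
# each source contributes a slice of its layers, and the slices are
# concatenated into one taller stack.
def passthrough_stack(sources):
    """sources: list of (layers, start, end); returns the stacked layers."""
    stacked = []
    for layers, start, end in sources:
        stacked.extend(layers[start:end])
    return stacked

# e.g. the Big-Lemon-Cookie restack below: 24 + 16 + 8 = 48 layers
# passthrough_stack([(kunoichi, 0, 24), (silicon_maid, 8, 24), (lemonade, 24, 32)])
```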
### Models Merged
The following models were included in the merge:
* [FallenMerick/Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B)
* [SanjiWatsuki/Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B)
* [SanjiWatsuki/Silicon-Maid-7B](https://huggingface.co/SanjiWatsuki/Silicon-Maid-7B)
* [KatyTheCutie/LemonadeRP-4.5.3](https://huggingface.co/KatyTheCutie/LemonadeRP-4.5.3)
* [Sao10K/Fimbulvetr-11B-v2.1-16K](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2.1-16K)
* [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
* [senseable/WestLake-7B-v2](https://huggingface.co/senseable/WestLake-7B-v2)
### The Special Sauce
The following YAML configuration was used to produce this model:
```yaml
slices: # this is a quick float32 restack of BLC using the OG recipe
- sources:
  - model: SanjiWatsuki/Kunoichi-7B
    layer_range: [0, 24]
- sources:
  - model: SanjiWatsuki/Silicon-Maid-7B
    layer_range: [8, 24]
- sources:
  - model: KatyTheCutie/LemonadeRP-4.5.3
    layer_range: [24, 32]
merge_method: passthrough
dtype: float32
name: Big-Lemon-Cookie-11B
---
models: # this is a remake of CLC with the newer Fimbul v2.1 version
- model: Big-Lemon-Cookie-11B
  parameters:
    weight: 0.85
- model: Sao10K/Fimbulvetr-11B-v2.1-16K
  parameters:
    weight: 0.15
merge_method: linear
dtype: float32
name: Chunky-Lemon-Cookie-11B
---
slices: # 8 layers of WL for the splice
- sources:
  - model: senseable/WestLake-7B-v2
    layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: WL-splice
---
slices: # 8 layers of CLC for the splice
- sources:
  - model: Chunky-Lemon-Cookie-11B
    layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: CLC-splice
---
models: # this is the splice, a gradient merge meant to gradually and smoothly interpolate between stacks of different models
- model: WL-splice
  parameters:
    weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
- model: CLC-splice
  parameters:
    weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: WL-splice
dtype: float32
name: splice
---
slices: # putting it all together
- sources:
  - model: senseable/WestLake-7B-v2
    layer_range: [0, 16]
- sources:
  - model: splice
    layer_range: [0, 8]
- sources:
  - model: Chunky-Lemon-Cookie-11B
    layer_range: [16, 48]
merge_method: passthrough
dtype: float32
name: Honey-Yuzu-13B
```
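The multi-document layout above (stages separated by `---`, each with a `name:`) is the style consumed by mergekit's `mergekit-mega` script. A single stage can also be run through the Python API, roughly like this (a sketch based on mergekit's README; the file name is a placeholder, and option names may vary between versions):

```python
# Hedged sketch: running one stage of the config via mergekit's Python API.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# "blc-restack.yaml" is a placeholder holding just the first stage above
with open("blc-restack.yaml", encoding="utf-8") as f:
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    config,
    out_path="./Big-Lemon-Cookie-11B",
    options=MergeOptions(cuda=False, copy_tokenizer=True),
)
```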
### The Thought Process
This was meant to be a simple RP-focused merge. I chose two well-performing RP models - [Chunky-Lemon-Cookie-11B](https://huggingface.co/FallenMerick/Chunky-Lemon-Cookie-11B) by [FallenMerick](https://huggingface.co/FallenMerick) and [WestLake-7B-v2](https://huggingface.co/senseable/WestLake-7B-v2) by [senseable](https://huggingface.co/senseable) - and merged them using a more conventional configuration (okay, okay, a 56-layer, 12.5B-parameter Mistral isn't *that* conventional, but still) rather than trying something wild and pushing the limits. I was very pleased with the results, but I wanted to see what would happen if I remade CLC with [Fimbulvetr-11B-v2.1-16K](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2.1-16K) by [Sao10K](https://huggingface.co/Sao10K). This resulted in equally nice (if not slightly better) outputs and a greatly improved native context length.
Have feedback? Comments? Questions? Don't hesitate to let me know! As always, have a wonderful day, and please be nice to yourself! :)