base_model:
- FallenMerick/Chunky-Lemon-Cookie-11B
- Sao10K/Fimbulvetr-11B-v2.1-16K
- senseable/WestLake-7B-v2
base_model_relation: merge
library_name: transformers
tags:
- mergekit
- merge
- roleplay
- text-generation-inference
license: cc-by-4.0
Thank you @Brooketh for the GGUFs!!
Honey-Yuzu-13B
Meet Honey-Yuzu, a sweet lemony tea brewed by yours truly! A bit of Chunky-Lemon-Cookie-11B here for its great flavor, with a dash of WestLake-7B-v2 there to add some depth. I'm really proud of how it turned out, and I hope you like it too!
It's not as verbose as Chaifighter, but it still writes very well. It boasts fantastic coherence and character understanding (in my opinion) for a 13B, and it's been my daily driver for a little bit. It's a solid RP model that should generally play nice with just about anything.
Native Context Length: 8K/8192 (can be extended using RoPE scaling, possibly past 16K)
Prompt Template: Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
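If you're building prompts by hand, here's a minimal Python sketch of the Alpaca template above. The blank lines between sections are an assumption on my part (the usual Alpaca layout), and `build_prompt` is just a hypothetical helper name, not part of any library.

```python
# Alpaca-style template. The exact blank-line placement is assumed,
# following the common Alpaca layout.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)


def build_prompt(prompt: str) -> str:
    """Fill the user's instruction into the Alpaca template."""
    return ALPACA_TEMPLATE.format(prompt=prompt)


if __name__ == "__main__":
    print(build_prompt("Describe the taste of yuzu tea in two sentences."))
```

Most frontends ship an Alpaca preset already, so this is mainly useful for scripts that call the model directly.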
Recommended Settings: Universal-Light
Here are some settings ranges that tend to work for me. They aren't strict values, and there's a bit of leeway in them. Feel free to experiment a bit!
- Temperature: 1.0 to 1.25
- Min-P: 0.05 to 0.1
- Repetition Penalty: 1.05 to 1.1 (high values aren't needed and usually degrade output)
- Rep. Penalty Range: 256 or 512
- (all other samplers disabled)
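For anyone scripting their setup, the ranges above could be written down roughly like this. It's only a sketch: the key names (`temperature`, `min_p`, `repetition_penalty`, `repetition_penalty_range`) are hypothetical, so map them to whatever your backend actually calls them.

```python
# Recommended ranges from the list above, as (low, high) pairs.
# Key names are illustrative; backends name these settings differently.
RECOMMENDED_RANGES = {
    "temperature": (1.0, 1.25),
    "min_p": (0.05, 0.1),
    "repetition_penalty": (1.05, 1.1),
}
RECOMMENDED_PENALTY_RANGES = (256, 512)


def out_of_range(settings: dict) -> list:
    """Return the names of settings that fall outside the suggested ranges."""
    flagged = []
    for name, (low, high) in RECOMMENDED_RANGES.items():
        value = settings.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    penalty_range = settings.get("repetition_penalty_range")
    if penalty_range is not None and penalty_range not in RECOMMENDED_PENALTY_RANGES:
        flagged.append("repetition_penalty_range")
    return flagged


if __name__ == "__main__":
    mine = {"temperature": 1.1, "min_p": 0.05,
            "repetition_penalty": 1.05, "repetition_penalty_range": 512}
    print(out_of_range(mine))
```

Since the ranges aren't strict, treat anything this flags as a nudge rather than an error.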
The Deets
This is a merge of pre-trained language models created using mergekit.
Merge Method
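The final model was assembled using the passthrough merge method, with linear and dare_linear merges used for the intermediate steps shown in the configuration below.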
Models Merged
The following models were included in the merge:
- FallenMerick/Chunky-Lemon-Cookie-11B
- Sao10K/Fimbulvetr-11B-v2.1-16K
- senseable/WestLake-7B-v2
The Special Sauce
The following YAML configuration was used to produce this model:
slices: # this is a quick float32 restack of BLC using the OG recipe
  - sources:
      - model: SanjiWatsuki/Kunoichi-7B
        layer_range: [0, 24]
  - sources:
      - model: SanjiWatsuki/Silicon-Maid-7B
        layer_range: [8, 24]
  - sources:
      - model: KatyTheCutie/LemonadeRP-4.5.3
        layer_range: [24, 32]
merge_method: passthrough
dtype: float32
name: Big-Lemon-Cookie-11B
---
models: # this is a remake of CLC with the newer Fimbul v2.1 version
  - model: Big-Lemon-Cookie-11B
    parameters:
      weight: 0.85
  - model: Sao10K/Fimbulvetr-11B-v2.1-16K
    parameters:
      weight: 0.15
merge_method: linear
dtype: float32
name: Chunky-Lemon-Cookie-11B
---
slices: # 8 layers of WL for the splice
  - sources:
      - model: senseable/WestLake-7B-v2
        layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: WL-splice
---
slices: # 8 layers of CLC for the splice
  - sources:
      - model: Chunky-Lemon-Cookie-11B
        layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: CLC-splice
---
models: # this is the splice, a gradient merge meant to gradually and smoothly interpolate between stacks of different models
  - model: WL-splice
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: CLC-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: WL-splice
dtype: float32
name: splice
---
slices: # putting it all together
  - sources:
      - model: senseable/WestLake-7B-v2
        layer_range: [0, 16]
  - sources:
      - model: splice
        layer_range: [0, 8]
  - sources:
      - model: Chunky-Lemon-Cookie-11B
        layer_range: [16, 48]
merge_method: passthrough
dtype: float32
name: Honey-Yuzu-13B
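To get a feel for what the splice gradient does, here's a rough Python sketch that expands the two weight lists into per-layer blend weights. It assumes a gradient list is linearly interpolated over each layer's relative position in the 8-layer splice; that's my understanding of how mergekit treats list-valued parameters, but take it as an assumption rather than a description of its internals. The function names are made up for illustration.

```python
# Gradient anchors copied from the splice config above.
WL_WEIGHTS = [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0]
CLC_WEIGHTS = [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1]
SPLICE_LAYERS = 8


def gradient_value(anchors, t):
    """Linearly interpolate evenly spaced anchors at relative position t in [0, 1]."""
    scaled = t * (len(anchors) - 1)
    lo = int(scaled)
    hi = min(lo + 1, len(anchors) - 1)
    frac = scaled - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


def splice_weights(num_layers=SPLICE_LAYERS):
    """Return (WL weight, CLC weight) for each layer of the splice."""
    rows = []
    for layer in range(num_layers):
        t = layer / (num_layers - 1)  # assumed: relative depth within the splice
        rows.append((gradient_value(WL_WEIGHTS, t), gradient_value(CLC_WEIGHTS, t)))
    return rows


if __name__ == "__main__":
    for layer, (wl, clc) in enumerate(splice_weights()):
        print(f"layer {layer}: WL={wl:.3f}  CLC={clc:.3f}")
```

Under that assumption the splice starts as pure WestLake, ends as pure CLC, and the two weights add up to 1 at every layer in between.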
The Thought Process
This was meant to be a simple RP-focused merge. I chose two well-performing RP models - Chunky-Lemon-Cookie-11B by FallenMerick and WestLake-7B-v2 by senseable - and merged them using a more conventional configuration (okay, okay, a 56-layer 12.5B Mistral isn't that conventional but still) rather than trying something wild or crazy and pushing the limits. I was very pleased with the results, but I wanted to see what would happen if I remade CLC with Fimbulvetr-11B-v2.1-16K by Sao10K. This resulted in equally nice (if not slightly better) outputs and a greatly improved native context length.
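If you're curious where the 56 layers and roughly 12.5B parameters come from, here's a back-of-the-envelope Python sketch. The layer ranges are taken from the final passthrough config; the per-layer sizes assume the standard Mistral-7B shape (hidden size 4096, MLP size 14336, 8 KV heads of dimension 128, 32000-token vocab with untied embeddings), which is an assumption about the underlying models and not something stated in the config.

```python
# Final stack from the passthrough config: (source, start_layer, end_layer).
FINAL_SLICES = [
    ("senseable/WestLake-7B-v2", 0, 16),
    ("splice", 0, 8),
    ("Chunky-Lemon-Cookie-11B", 16, 48),
]

# Assumed Mistral-7B dimensions (not taken from the merge config).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads, head dimension 128
VOCAB = 32000


def total_layers(slices):
    """Count the layers contributed by every slice."""
    return sum(end - start for _, start, end in slices)


def params_per_layer():
    """Parameter count of one decoder layer under the assumed dimensions."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                        # gate, up, down projections
    norms = 2 * HIDDEN                                     # two RMSNorm weight vectors
    return attention + mlp + norms


def total_params(slices):
    """Estimate total parameters: layers + input/output embeddings + final norm."""
    embeddings = 2 * VOCAB * HIDDEN
    return total_layers(slices) * params_per_layer() + embeddings + HIDDEN


if __name__ == "__main__":
    print(total_layers(FINAL_SLICES), "layers")
    print(f"{total_params(FINAL_SLICES) / 1e9:.2f}B parameters (estimate)")
```

The same slice arithmetic applied to the first config block (24 + 16 + 8) gives the 48 layers that the `[16, 48]` range of Chunky-Lemon-Cookie-11B relies on.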
Have feedback? Comments? Questions? Don't hesitate to let me know! As always, have a wonderful day, and please be nice to yourself! :)