metadata
base_model:
  - FallenMerick/Chunky-Lemon-Cookie-11B
  - Sao10K/Fimbulvetr-11B-v2.1-16K
  - senseable/WestLake-7B-v2
base_model_relation: merge
library_name: transformers
tags:
  - mergekit
  - merge
  - roleplay
  - text-generation-inference
license: cc-by-4.0


Thank you @Brooketh for the GGUFs!!

Honey-Yuzu-13B

Meet Honey-Yuzu, a sweet lemony tea brewed by yours truly! A bit of Chunky-Lemon-Cookie-11B here for its great flavor, with a dash of WestLake-7B-v2 there to add some depth. I'm really proud of how it turned out, and I hope you like it too!

It's not as verbose as Chaifighter, but it still writes very well. It boasts fantastic coherence and character understanding (in my opinion) for a 13B, and it's been my daily driver for a little bit. It's a solid RP model that should generally play nice with just about anything.

Native Context Length: 8K/8192 (can be extended using RoPE scaling, possibly past 16K)
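As a rough illustration of the context note above: plain linear RoPE scaling stretches the native window by a constant factor. This is only a sketch that assumes linear scaling; the helper name `linear_rope_factor` is hypothetical, and other schemes or your backend's defaults may behave differently.

```python
NATIVE_CTX = 8192  # native context length stated above


def linear_rope_factor(target_ctx: int, native_ctx: int = NATIVE_CTX) -> float:
    """Scale factor for linear RoPE scaling; 1.0 means no scaling is needed."""
    if target_ctx <= 0 or native_ctx <= 0:
        raise ValueError("context lengths must be positive")
    # Never scale below 1.0: shorter targets already fit in the native window.
    return max(1.0, target_ctx / native_ctx)


print(linear_rope_factor(16384))  # 2.0
```

Some backends expect the inverse value (a frequency scale of 1/factor), so check how your loader defines the setting before plugging the number in.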

Prompt Template: Alpaca

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
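If you're building prompts by hand, the template above could be filled in with something like this. It's a minimal sketch: `build_prompt` is a made-up helper name, and the single newline after `### Response:` is an assumption, so adjust it to match your frontend.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n"
    "### Response:\n"
)


def build_prompt(prompt: str) -> str:
    """Insert the user's instruction into the Alpaca template."""
    # Strip stray whitespace so the section headers stay on their own lines.
    return ALPACA_TEMPLATE.format(prompt=prompt.strip())


print(build_prompt("Describe the tea shop as the rain starts."))
```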

Recommended Settings: Universal-Light

Here are some settings ranges that tend to work for me. They aren't strict values, and there's a bit of leeway in them. Feel free to experiment a bit!

  • Temperature: 1.0 to 1.25
  • Min-P: 0.05 to 0.1
  • Repetition Penalty: 1.05 to 1.1 (high values aren't needed and usually degrade output)
  • Rep. Penalty Range: 256 or 512
  • (all other samplers disabled)
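For anyone curious what Temperature and Min-P actually do to the token distribution, here's a small pure-Python sketch. It assumes temperature is applied before Min-P (the order varies between backends), and `min_p_filter` is an illustrative name, not any library's API. Repetition penalty is left out to keep it short.

```python
import math


def min_p_filter(logits, temperature=1.0, min_p=0.05):
    """Apply temperature, then drop tokens whose probability is below
    min_p times the top token's probability, and renormalize."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


# Toy logits for four tokens; the two unlikely ones get filtered out.
print(min_p_filter([5.0, 4.0, 0.0, -5.0], temperature=1.0, min_p=0.1))
```

Higher temperature flattens the distribution, so more tokens clear the Min-P threshold; that's why the two settings are tuned together.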

The Deets

This is a merge of pre-trained language models created using mergekit.

Merge Method

This model was merged using the passthrough merge method.

Models Merged

The following models were included in the merge:

  • FallenMerick/Chunky-Lemon-Cookie-11B
  • Sao10K/Fimbulvetr-11B-v2.1-16K
  • senseable/WestLake-7B-v2

The Special Sauce

The following YAML configuration was used to produce this model:

slices: # this is a quick float32 restack of BLC using the OG recipe
  - sources:
    - model: SanjiWatsuki/Kunoichi-7B
      layer_range: [0, 24]
  - sources:
    - model: SanjiWatsuki/Silicon-Maid-7B
      layer_range: [8, 24]
  - sources:
    - model: KatyTheCutie/LemonadeRP-4.5.3
      layer_range: [24, 32]
merge_method: passthrough
dtype: float32
name: Big-Lemon-Cookie-11B
---
models: # this is a remake of CLC with the newer Fimbul v2.1 version
  - model: Big-Lemon-Cookie-11B
    parameters:
      weight: 0.85
  - model: Sao10K/Fimbulvetr-11B-v2.1-16K
    parameters:
      weight: 0.15
merge_method: linear
dtype: float32
name: Chunky-Lemon-Cookie-11B
---
slices: # 8 layers of WL for the splice
  - sources:
    - model: senseable/WestLake-7B-v2
      layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: WL-splice
---
slices: # 8 layers of CLC for the splice
  - sources:
    - model: Chunky-Lemon-Cookie-11B
      layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: CLC-splice
---
models: # this is the splice, a gradient merge meant to gradually and smoothly interpolate between stacks of different models
  - model: WL-splice
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy" 
  - model: CLC-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy" 
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: WL-splice
dtype: float32
name: splice
---
slices: # putting it all together
  - sources:
    - model: senseable/WestLake-7B-v2
      layer_range: [0, 16]
  - sources: 
    - model: splice
      layer_range: [0, 8]
  - sources:
    - model: Chunky-Lemon-Cookie-11B
      layer_range: [16, 48]
merge_method: passthrough
dtype: float32
name: Honey-Yuzu-13B
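A quick sanity check on the recipe above: the sketch below tallies the layers in each passthrough stack and confirms that the two splice gradients mirror each other. The numbers are copied from the config, the helper names are made up for illustration, and it assumes `layer_range` is half-open (so `[0, 24]` means 24 layers).

```python
# (model, start, end) tuples copied from the passthrough stacks above.
BIG_LEMON_COOKIE = [
    ("Kunoichi-7B", 0, 24),
    ("Silicon-Maid-7B", 8, 24),
    ("LemonadeRP-4.5.3", 24, 32),
]
HONEY_YUZU = [
    ("WestLake-7B-v2", 0, 16),
    ("splice", 0, 8),
    ("Chunky-Lemon-Cookie-11B", 16, 48),
]

# Gradient weights from the splice step.
WL_WEIGHTS = [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0]
CLC_WEIGHTS = [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1]


def count_layers(slices):
    """Total layers in a stack, treating each range as [start, end)."""
    return sum(end - start for _, start, end in slices)


def weights_are_complementary(a, b):
    """True if the two gradients sum to 1 at every position."""
    return len(a) == len(b) and all(abs(x + y - 1.0) < 1e-9 for x, y in zip(a, b))


print(count_layers(BIG_LEMON_COOKIE))  # 48
print(count_layers(HONEY_YUZU))        # 56
print(weights_are_complementary(WL_WEIGHTS, CLC_WEIGHTS))  # True
```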

The Thought Process

This was meant to be a simple RP-focused merge. I chose 2 well-performing RP models - Chunky-Lemon-Cookie-11B by FallenMerick and WestLake-7B-v2 by senseable - and merged them using a more conventional configuration (okay, okay, a 56-layer 12.5B Mistral isn't that conventional but still) rather than trying something wild or crazy and pushing the limits. I was very pleased with the results, but I wanted to see what would happen if I remade CLC with Fimbulvetr-11B-v2.1-16K by Sao10K. This resulted in equally nice (if not slightly better) outputs, with a greatly improved native context length.
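Where does "12.5B" come from? Here's a back-of-the-envelope count. It assumes every layer uses the stock Mistral-7B dimensions (hidden size 4096, MLP size 14336, 8 KV heads of 128 dims, 32000-token vocab, untied embeddings); the merged models may differ slightly, for example in vocab size, so treat it as an estimate.

```python
# Assumed stock Mistral-7B dimensions (not read from the merged checkpoints).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 KV heads x 128 head dim (grouped-query attention)
VOCAB = 32000


def mistral_params(num_layers: int) -> int:
    """Approximate parameter count for a Mistral-style stack of num_layers."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate, up, down
    norms = 2 * HIDDEN                                # two RMSNorms per layer
    per_layer = attn + mlp + norms
    embeddings = 2 * VOCAB * HIDDEN                   # input embedding + output head
    final_norm = HIDDEN
    return num_layers * per_layer + embeddings + final_norm


for layers in (32, 48, 56):
    print(layers, round(mistral_params(layers) / 1e9, 2))
```

Under those assumptions, 32 layers lands at about 7.24B, the 48-layer CLC stack at about 10.73B, and the 56-layer final stack at about 12.48B, which rounds to the 12.5B mentioned above.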

Have feedback? Comments? Questions? Don't hesitate to let me know! As always, have a wonderful day, and please be nice to yourself! :)