

V2 IS OUT!!!

GGUFs available here! Thanks to @brooketh for providing the quantized models!

Chaifighter 20B

Meet Chaifighter 20B. This is my shot at making Fimbulvetr 11B v2 a bit more creative and verbose while retaining its incredible coherence and intelligence. It also shows that SOLAR-based models and Mistral-based models can be merged, as SOLAR 10.7B was based on a Mistral 7B frankenmerge and finetuned a bit.

I also wanted to provide an alternative to Psyonic Cetacean 20B, which is a fantastic model that you should check out if you haven't already! The issue with that model is that it's based on Llama 2, which is outdated now. The older architecture lacked many performance enhancements that were introduced by the Mistral architecture, and on my 16 GB RTX 4060 Ti, those performance enhancements were the difference between decently speedy and intolerably sluggish. I wanted to cater to those who can run more than a 13B but not a 34B, so this is a good middle ground.

Chaifighter 20B is geared towards long-form roleplay chats rather than short-form IRC/Discord RP chats. It loves verbosity and detail, and its quality will depend on how much "ammunition" you can give it. While it sorta-kinda can do short-form with some swiping, it isn't really ideal. But for those essay-writing powerhouses that love typing up a storm in the character card, this one's for you.

Chaifighter 20B natively supports a context window of only 4096 tokens. I tried RoPE scaling, but the model did not handle it well in the limited testing I did. Your mileage may vary, and if anyone manages to get it working at a higher context, I'd love to hear about it!

Stay tuned for V2! Feedback is welcomed and appreciated!!

Recommended Parameters:

  • Temperature: 1.0 - 1.25
  • Min-P: 0.1
  • Repetition Penalty: 1.05-1.1
  • All other samplers disabled

Or, alternatively, use Universal Light in SillyTavern!
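For anyone curious what the Temperature and Min-P settings above actually do to the token distribution, here is a minimal sketch in plain Python. The function name `min_p_filter` is hypothetical, and applying temperature before the Min-P cutoff is an assumption: backends differ in sampler order, so treat this as an illustration rather than what any particular frontend runs. Repetition penalty is left out to keep it short.

```python
import math

def min_p_filter(logits, min_p=0.1, temperature=1.0):
    """Return renormalized probabilities after temperature scaling and a Min-P cutoff."""
    # Temperature: divide logits before the softmax (higher = flatter distribution).
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Min-P: drop every token whose probability is below min_p * (top probability).
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving tokens sum to 1 again.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

# Toy example with three candidate tokens; the last one is very unlikely.
filtered = min_p_filter([2.0, 1.0, -3.0], min_p=0.1, temperature=1.0)
```

With Min-P at 0.1, a token survives only if it is at least a tenth as likely as the top candidate, which is why the rest of the samplers can stay disabled.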

Prompt Template: Alpaca

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
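
If you are building prompts by hand instead of through a frontend, the template above can be filled in with a small helper like the following sketch. The name `build_alpaca_prompt` is hypothetical, and the exact trailing newline after `### Response:` is an assumption; adjust it to whatever your backend expects.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "\n"
    "### Instruction:\n"
    "{prompt}\n"
    "\n"
    "### Response:\n"
)

def build_alpaca_prompt(prompt: str) -> str:
    """Insert the user's instruction into the Alpaca template shown above."""
    return ALPACA_TEMPLATE.format(prompt=prompt.strip())

example = build_alpaca_prompt("Continue the roleplay as the innkeeper.")
```

The model's reply is then generated as a continuation of the text after `### Response:`.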

Mergekit

Chaifighter 20B is a frankenmerge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the passthrough merge method.

Models Merged

The following models were included in the merge:

  • Sao10K/Fimbulvetr-11B-v2
  • SanjiWatsuki/Kunoichi-7B
  • Undi95/Toppy-M-7B
  • Gryphe/MythoMist-7b

The Sauceeeeee

slices:
  - sources:
    - model: Sao10K/Fimbulvetr-11B-v2
      layer_range: [0, 40] # all but last 8 layers 
  - sources:
    - model: SanjiWatsuki/Kunoichi-7B
      layer_range: [0, 24] # all but last 8 layers
  - sources:
    - model: Undi95/Toppy-M-7B
      layer_range: [16, 24] # 16 layers of Toppy and MythoMist split and interleaved to (in theory) boost the model's coherence
  - sources:
    - model: Gryphe/MythoMist-7b
      layer_range: [16, 24]
  - sources:
    - model: Undi95/Toppy-M-7B
      layer_range: [25, 32]
  - sources:
    - model: Gryphe/MythoMist-7b
      layer_range: [25, 32]
merge_method: passthrough
dtype: bfloat16

Yeah, it's mad sussy. I know what I did, but I'm not sorry.

Other stuff

Okay! Fine! It's not really a 20B, it's a 21B, but I did everything planning for a 20B before deciding to add 4 more layers to the model to make it more stable. It made a big difference.
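
If you want to sanity-check the size yourself, the rough arithmetic can be sketched as below. Two assumptions are baked in and not stated anywhere on this card: that mergekit treats `layer_range` as half-open (so `[0, 40]` contributes 40 layers), and that every layer has the shape from the public Mistral 7B config (hidden size 4096, MLP size 14336, 8 KV heads of dimension 128, vocabulary of 32000).

```python
# Slices copied from the merge config: (model, start, end).
SLICES = [
    ("Sao10K/Fimbulvetr-11B-v2", 0, 40),
    ("SanjiWatsuki/Kunoichi-7B", 0, 24),
    ("Undi95/Toppy-M-7B", 16, 24),
    ("Gryphe/MythoMist-7b", 16, 24),
    ("Undi95/Toppy-M-7B", 25, 32),
    ("Gryphe/MythoMist-7b", 25, 32),
]

# Assumption: layer_range is half-open, so each slice adds (end - start) layers.
total_layers = sum(end - start for _, start, end in SLICES)

# Assumption: Mistral 7B layer dimensions for every slice.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads, head dimension 128
VOCAB = 32000

attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q and o, then k and v
mlp = 3 * HIDDEN * INTERMEDIATE                        # gate, up, and down projections
norms = 2 * HIDDEN                                     # two RMSNorm weights per layer
per_layer = attention + mlp + norms

# Input embeddings, output head, and the final norm.
embeddings = 2 * VOCAB * HIDDEN + HIDDEN
total_params = total_layers * per_layer + embeddings

print(total_layers)                    # 94
print(f"{total_params / 1e9:.1f}B")    # 20.8B
```

Under those assumptions the stack comes out to 94 layers and roughly 20.8B parameters, which matches the model size reported for the uploaded weights.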

Yapping time. As far as the name is concerned, I'm going for a tea/coffee/hot drink motif for my models, and one of the names I was debating using for this model was Chai-Latte. As I worked on this merge, I got the idea of naming it "Chaifighter" as a play on "Psyfighter2", one of the models making up Psyonic Cetacean, and on "Tiefighter", the model from which Psyfighter2 was derived. Both are fantastic models, especially given their age. They're both worth checking out too if you haven't done so. "Chai" itself is a play on a certain AI chatting website (CAI) that got me into this lovely mess in the first place. So I guess it's fitting to name the first model of the series after it.

And lastly, of course, thank you for checking out my model! Have a great day and please take care of yourself, alright? :)

Model size: 20.8B params (Safetensors)
Tensor type: BF16