---
library_name: transformers
tags:
- mergekit
- merge
license: llama2
---
# What is this
My experiment. Continuation of the Benchmaxxxer series (meme models), but a bit more serious. Scores high on my benchmark and on the Hugging Face leaderboard, moderately high in practice. Worth trying? Yeah. It is on the gooder side.
# Observations
- GPT-slop: medium-low, but avoid it at all costs in your prompts, or it won't stop generating it.
- Writing style: difficult to describe, not the usual stuff. A bit of an autopilot-like thing: if you write your usual lazy "ahh ahh mistress", it can give you a whole page of good text in return. High.
- Censorship: if you can handle Xwin, you can handle this model. Medium-high?
- Optimism: medium-low.
- Violence: medium-low.
- Intelligence: medium.
- Creativity: medium-high.
- Doesn't like high temperature. Keep it below 1.5 (see the generation sketch below).
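If you want a quick starting point, here is a minimal generation sketch with `transformers`. The repo id is the one from the benchmark tables below; the sampling settings are just an example that respects the temperature advice above, not a recommended preset.

```python
# Minimal generation sketch. Assumes enough VRAM for a 70B model in fp16
# and that accelerate is installed for device_map="auto"; adjust to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ChuckMcSneed/ArcaneEntanglement-model64-70b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Vicuna-style prompt (see "Prompt format" below).
prompt = "USER: Write a short scene set in a rainy harbor town. ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.2,  # keep below 1.5, as noted above
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```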
# Prompt format
Vicuna or Alpaca.
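For reference, these are the usual templates for both formats; the system prompts are placeholders, adjust them to taste:

```
Vicuna:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {your prompt} ASSISTANT:

Alpaca:
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{your prompt}

### Response:
```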
# Merge Details
This is a merge of pre-trained language models created using mergekit.
This model was merged using the linear merge method.
## Models Merged
The models included in the merge are listed in the configuration below.
## Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: spicyboros
    parameters:
      weight: [0.093732305,0.403220342,0.055438423,0.043830778,0.054189303,0.081136828]
  - model: xwin
    parameters:
      weight: [0.398943486,0.042069007,0.161586088,0.470977297,0.389315704,0.416739102]
  - model: euryale
    parameters:
      weight: [0.061483013,0.079698633,0.043067724,0.00202751,0.132183868,0.36578003]
  - model: dolphin
    parameters:
      weight: [0.427942847,0.391488452,0.442164138,0,0,0.002174793]
  - model: wizard
    parameters:
      weight: [0.017898349,0.083523566,0.297743627,0.175345857,0.071770095,0.134169247]
  - model: WinterGoddess
    parameters:
      weight: [0,0,0,0.30781856,0.352541031,0]
merge_method: linear
dtype: float16
tokenizer_source: base
```
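For intuition: the linear method is just a weighted average of corresponding tensors across the input models; mergekit treats each six-element weight list as a gradient interpolated across the layer stack and, by default, normalizes the weights. Below is a rough sketch of the core idea only, not mergekit's actual code; it uses a single scalar weight per model for brevity, and `linear_merge` is a hypothetical helper name.

```python
# Conceptual sketch of a linear merge: a normalized, per-parameter weighted average.
from typing import Dict, List
import torch

def linear_merge(
    state_dicts: List[Dict[str, torch.Tensor]],
    weights: List[float],
) -> Dict[str, torch.Tensor]:
    """Return a state dict whose tensors are the weighted average of the inputs."""
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        acc = torch.zeros_like(state_dicts[0][name], dtype=torch.float32)
        for sd, w in zip(state_dicts, weights):
            acc += sd[name].to(torch.float32) * w
        merged[name] = (acc / total).to(torch.float16)  # dtype: float16, as in the config
    return merged
```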
# Benchmarks
## NeoEvalPlusN_benchmark
| Name | B | C | D | S | P | total | BCD | SP |
|---|---|---|---|---|---|---|---|---|
| ChuckMcSneed/PMaxxxer-v1-70b | 3 | 1 | 1 | 6.75 | 4.75 | 16.5 | 5 | 11.5 |
| ChuckMcSneed/SMaxxxer-v1-70b | 2 | 1 | 0 | 7.25 | 4.25 | 14.5 | 3 | 11.5 |
| ChuckMcSneed/ArcaneEntanglement-model64-70b | 3 | 2 | 1 | 7.25 | 6 | 19.25 | 6 | 13.25 |
Absurdly high. That's what happens when you optimize the merges for a benchmark.
## Open LLM Leaderboard Evaluation Results
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| ChuckMcSneed/ArcaneEntanglement-model64-70b | 72.79 | 71.42 | 87.96 | 70.83 | 60.53 | 83.03 | 63 |
| ChuckMcSneed/PMaxxxer-v1-70b | 72.41 | 71.08 | 87.88 | 70.39 | 59.77 | 82.64 | 62.7 |
| ChuckMcSneed/SMaxxxer-v1-70b | 72.23 | 70.65 | 88.02 | 70.55 | 60.7 | 82.87 | 60.58 |
This model is simply superior to my other meme models here.