---
library_name: transformers
tags:
  - mergekit
  - merge
license: llama2
---

![logo](logo.png)

# What is this

My experiment. A continuation of the Benchmaxxxer series (meme models), but a bit more serious. It scores high on my benchmark and on the Hugging Face leaderboard, and moderately high in practice. Worth trying? Yeah, it's on the good side.

# Observations

- GPTslop: medium-low. Cut it the moment it appears, though, or the model won't stop generating it.
- Writing style: hard to describe, and not the usual stuff. Somewhat autopilot-like: even if you write your usual lazy "ahh ahh mistress", it can give you back a whole page of good text. High.
- Censorship: if you can handle Xwin, you can handle this model. Medium-high?
- Optimism: medium-low.
- Violence: medium-low.
- Intelligence: medium.
- Creativity: medium-high.
- Temperature: doesn't like high values; keep it below 1.5.

# Prompt format

Vicuna or Alpaca.
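For reference, these two formats conventionally look like the templates below; the system lines shown are the common defaults, so adjust them to taste.

Vicuna:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: {prompt}
ASSISTANT:
```

Alpaca:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```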

Merge Details

This is a merge of pre-trained language models created using mergekit.

This model was merged using the linear merge method.
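Conceptually, a linear merge is a normalized weighted average of each tensor across the input models. Here is a minimal sketch of that idea; note that mergekit's real implementation also interpolates the six-element weight lists in the configuration below across the layer stack, which this sketch omits.

```python
# Minimal sketch of a linear merge: every output tensor is the
# normalized weighted average of the corresponding tensors from
# each input model. For illustration only; mergekit additionally
# applies per-layer weight gradients.
import torch

def linear_merge(state_dicts, weights):
    """state_dicts: list of {name: tensor}; weights: one float per model."""
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(
            (w / total) * sd[name].float()
            for sd, w in zip(state_dicts, weights)
        ).to(torch.float16)  # dtype: float16, matching the config below
    return merged
```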

## Models Merged

The following models were included in the merge (referred to by the short names used in the configuration below):

- spicyboros
- xwin
- euryale
- dolphin
- wizard
- WinterGoddess

## Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: spicyboros
    parameters:
      weight: [0.093732305, 0.403220342, 0.055438423, 0.043830778, 0.054189303, 0.081136828]
  - model: xwin
    parameters:
      weight: [0.398943486, 0.042069007, 0.161586088, 0.470977297, 0.389315704, 0.416739102]
  - model: euryale
    parameters:
      weight: [0.061483013, 0.079698633, 0.043067724, 0.00202751, 0.132183868, 0.36578003]
  - model: dolphin
    parameters:
      weight: [0.427942847, 0.391488452, 0.442164138, 0, 0, 0.002174793]
  - model: wizard
    parameters:
      weight: [0.017898349, 0.083523566, 0.297743627, 0.175345857, 0.071770095, 0.134169247]
  - model: WinterGoddess
    parameters:
      weight: [0, 0, 0, 0.30781856, 0.352541031, 0]
merge_method: linear
dtype: float16
tokenizer_source: base
```
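If you want to try the model, a standard transformers loading snippet should work. This is an illustrative sketch; the generation settings are examples (keeping temperature below 1.5, as noted above).

```python
# Illustrative usage sketch with transformers; adjust device_map,
# quantization, and sampling settings to your hardware and taste.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ChuckMcSneed/ArcaneEntanglement-model64-70b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Vicuna-style prompt, as recommended above.
prompt = "USER: Write a short scene set in a candlelit library. ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=1.0)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```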

# Benchmarks

## NeoEvalPlusN_benchmark

My meme benchmark.

| Name | B | C | D | S | P | total | BCD | SP |
|------|---|---|---|------|------|-------|-----|------|
| ChuckMcSneed/PMaxxxer-v1-70b | 3 | 1 | 1 | 6.75 | 4.75 | 16.5 | 5 | 11.5 |
| ChuckMcSneed/SMaxxxer-v1-70b | 2 | 1 | 0 | 7.25 | 4.25 | 14.5 | 3 | 11.5 |
| ChuckMcSneed/ArcaneEntanglement-model64-70b | 3 | 2 | 1 | 7.25 | 6 | 19.25 | 6 | 13.25 |

Absurdly high. That's what happens when you optimize the merges for a benchmark.

## Open LLM Leaderboard Evaluation Results

[Leaderboard on Huggingface](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|-------|---------|-----|-----------|------|------------|------------|-------|
| ChuckMcSneed/ArcaneEntanglement-model64-70b | 72.79 | 71.42 | 87.96 | 70.83 | 60.53 | 83.03 | 63 |
| ChuckMcSneed/PMaxxxer-v1-70b | 72.41 | 71.08 | 87.88 | 70.39 | 59.77 | 82.64 | 62.7 |
| ChuckMcSneed/SMaxxxer-v1-70b | 72.23 | 70.65 | 88.02 | 70.55 | 60.7 | 82.87 | 60.58 |

This model is simply superior to my other meme models here.