saishf/SOVLish-Maid-L3-8B · Keep us posted on these experiments.

Lewdiculous

May 9, 2024

•

edited May 9, 2024

Unhinged Llama is the best Llama.

saishf

Owner May 10, 2024

Unhinged Llama is the best Llama.

Will do, it's mostly a giant experiment at the moment. My ideal final model would be one that suggests building a bomb instead of having to be asked how to make one (if that is possible). I want a model that can be included in a merge to add a touch of insanity to the mix.
I'm currently trying to figure out why Lumimaid, and any model I base on it, has a hard time with bilingual characters.
Say a character can speak both Spanish and English: if the first message is in Spanish, the model latches onto Spanish and won't speak English again. The things llama3 does are always interesting😸

A meta model latching onto spanish again 😒

Lewdiculous

May 10, 2024

Even though I'm bilingual I only do English for roleplays so can't really share any experiences on that.

saishf

Owner May 10, 2024

Even though I'm bilingual I only do English for roleplays so can't really share any experiences on that.

I just throw some in there cause I learnt a little in school, plus it's a good test for models.
But I do love the attitude SOVL style models have in instruct scenarios

It reminds me of the llama3 instruct personality meta tried to give the model but unhinged :3

Lewdiculous

May 10, 2024

This card is great xD

saishf

Owner May 10, 2024

This card is great xD

158 tokens of perfection, the card is so simple the model makes a huge difference.
There's no description of how to act or what their personality is. So it's really kind of a wildcard.
Amazing with some models, terrible with others. Mistral models never did any good with it, solar did okay but llama3 models are amazing with it.

Lewdiculous

May 10, 2024

Share her with us, please sensei.

saishf

Owner May 10, 2024

I deliver https://files.catbox.moe/j41zz5.png
And yes, it needs work. It's a 3am throw together and my first ever card. But it works! :3

Virt-io

May 10, 2024

•

edited May 10, 2024

@saishf

This really does work well with llama3 models.
Have I been over-engineering my cards?

Puppy-Tan is a smart AI.
Llama3 even got her panting, that's a very high attention to detail.

This was with [Test-02]Roleplay

saishf

Owner May 12, 2024

@saishf

This really does work well with llama3 models.
Have I been over-engineering my cards?

Basic low token cards for assistants seem to be a lot better, like the model is thinking more broadly.
I'd stick to higher token counts for characters, because if you leave out too much every character just feels like you're talking to an assistant.
It may also be similar to what happens with stable diffusion. Give it too many keywords and you can end up with a worse image because you're excluding a lot of training data & styles. Just a theory though, I don't understand transformers.
I tried reading an arxiv page for a primer to transformers: https://arxiv.org/pdf/2405.00208
How is anyone supposed to read this 🥲

saishf

Owner May 12, 2024

Also, new model: saishf/Ortho-SOVL-8B-L3, in fp16 because I plan on merging it into my previous models and outputting new bf16s

Virt-io

May 12, 2024

It would be kinda funny to make a 4x8B with all the SOVL models.
With gate_mode: random

Just imagine SOVL ping ponging between two unhinged brain cells.

saishf

Owner May 12, 2024

Mergekit documentation

Randomly initializes the MoE gates. Good for if you are going to fine tune the model afterwards, or maybe if you want something a little unhinged? I won't judge.

Currently merging megamash

models:
  - model: saishf/Ortho-SOVL-8B-L3
  - model: saishf/Merge-Mayhem-L3-V2
  - model: saishf/Merge-Mayhem-L3-V2.1
  - model: saishf/SOVLish-Maid-L3-8B
merge_method: model_stock
base_model: saishf/Ortho-SOVL-8B-L3
dtype: bfloat16

Llama4some is the next merge :3

base_model: saishf/Ortho-SOVL-8B-L3
gate_mode: random
dtype: bfloat16
experts:
  - source_model: saishf/Ortho-SOVL-8B-L3
  - source_model: saishf/SOVLish-Maid-L3-8B
  - source_model: saishf/Merge-Mayhem-L3-V2.1
  - source_model: saishf/Merge-Mayhem-L3-V2

saishf

Owner May 12, 2024

•

edited May 12, 2024

saishf/SOVL-Mega-Mash-L3-8B has been uploaded.
saishf/Llama4Some-SOVL-4x8B-L3-V1 might take a while, it's like 50GB

Although I'm confused about where the size went, 4 llama3 8Bs should be 64GB?

Virt-io

May 12, 2024

They share layers from the base_model

saishf

Owner May 12, 2024

They share layers from the base_model

That's smart 😸 smol but giant model

Virt-io

May 12, 2024

From the docs

The script will combine the self-attention and layer normalization parameters from a "base" model with the MLP parameters from a set of "expert" models.
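The "missing" 14GB can be checked with a rough parameter count. This is only a sketch: the layer sizes below are assumed Llama-3-8B config values (they are not stated anywhere in this thread), and the router is assumed to be one small linear layer per block.

```python
# Rough parameter count for one dense Llama-3-8B versus a 4x8B MoE that
# shares attention/norm/embedding weights and only duplicates the MLPs.
# All shape values are assumptions taken from the public Llama-3-8B config.
HIDDEN = 4096          # model width
INTERMEDIATE = 14336   # MLP inner width
LAYERS = 32            # transformer blocks
VOCAB = 128256         # vocabulary size
KV_DIM = 1024          # 8 KV heads x 128 head dim (grouped-query attention)

ATTN = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o, k, v projections
MLP = 3 * HIDDEN * INTERMEDIATE                   # gate, up, down projections
NORMS = 2 * HIDDEN                                # two RMSNorm weights per block
SHARED_TOP = 2 * VOCAB * HIDDEN + HIDDEN          # embeddings, output head, final norm


def dense_params():
    """Parameters in a single dense 8B model."""
    return LAYERS * (ATTN + MLP + NORMS) + SHARED_TOP


def moe_params(num_experts):
    """Parameters when only the MLP is duplicated per expert."""
    router = HIDDEN * num_experts  # assumed: one linear gate per block
    return LAYERS * (ATTN + num_experts * MLP + router + NORMS) + SHARED_TOP


def size_gb(params, bytes_per_param=2):
    """Size in GB for bf16/fp16 weights (2 bytes per parameter)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"four separate 8Bs: {size_gb(4 * dense_params()):.1f} GB")
    print(f"4x8B shared MoE:   {size_gb(moe_params(4)):.1f} GB")
```

Under those assumptions the MoE comes out at roughly 24.9B parameters, or about 50GB in bf16, versus about 64GB for four full copies.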

saishf

Owner May 12, 2024

They share layers from the base_model

Also Llama4some will have two variants, one without the use of "--i-understand-this-is-not-useful-without-training" (V1) and one with the use (V2)

Virt-io

May 12, 2024

•

edited May 12, 2024

To my understanding --i-understand-this-is-not-useful-without-training is only useful when merging different architectures.
So, there would be no difference between the two.
Correct me if I'm wrong, I always have it on anyways.

NVM, I confused it with --allow-crimes.

What is --i-understand-this-is-not-useful-without-training for?

saishf

Owner May 12, 2024

To my understanding --i-understand-this-is-not-useful-without-training is only useful when merging different architectures.
So, there would be no difference between the two.
Correct me if I'm wrong, I always have it on anyways.

That's good, saves me like 20 minutes of uploading :3

Browsers aren't exactly useful for finding information on obscure things

saishf

Owner May 12, 2024

•

edited May 12, 2024

I went into the moe .py files and found this in moe/config.py

            "All of your expert models are the same. This will produce "
            "a model that uses more resources but gives the exact same output. "
            "If you plan to train the model after merging, proceed with the "
            "--i-understand-this-is-not-useful-without-training flag."

Seems it's only used for merging identical models.
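A minimal sketch of what that message implies, written as a hypothetical re-creation rather than mergekit's actual code (the function name and the dict layout are made up for illustration):

```python
# Hypothetical re-creation of the condition described by the warning text:
# the flag only matters when every expert points at the same source model.
def experts_all_identical(experts):
    """Return True when every expert entry names the same source model."""
    return len({expert["source_model"] for expert in experts}) == 1


# The four experts from the Llama4some config earlier in the thread.
llama4some_experts = [
    {"source_model": "saishf/Ortho-SOVL-8B-L3"},
    {"source_model": "saishf/SOVLish-Maid-L3-8B"},
    {"source_model": "saishf/Merge-Mayhem-L3-V2.1"},
    {"source_model": "saishf/Merge-Mayhem-L3-V2"},
]

if __name__ == "__main__":
    if experts_all_identical(llama4some_experts):
        print("identical experts: the warning would apply")
    else:
        print("distinct experts: the warning should not apply")
```

If that reading is right, Llama4some uses four distinct experts, so the V1 and V2 variants would likely come out the same.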

Virt-io

May 12, 2024

I see, so an error suppressor.

saishf

Owner May 12, 2024

Llama4Some is up! Hopefully someone quants it in 2 or 3 bit so i can try it out 😸
Also mergekit-moe doesn't make a readme, time to type a bunch😐

saishf

Owner May 12, 2024

I uploaded Llama4Some to the chaiverse leaderboard, curious how it scores.

saishf

Owner May 13, 2024

New 250GB merge cooking 😸
Way too many lora merges 😭
Curious how it will do with denials using models with further training

models:
  - model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/Aura_Llama3
  - model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/Smarts_Llama3
  - model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/Luna_Llama3
  - model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/BlueMoon_Llama3
  - model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/RP_Format_QuoteAsterisk_Llama3
  - model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/Aura_Llama3
  - model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/Smarts_Llama3
  - model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/Luna_Llama3
  - model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/BlueMoon_Llama3
  - model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/RP_Format_QuoteAsterisk_Llama3
  - model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/Aura_Llama3
  - model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/Smarts_Llama3
  - model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/Luna_Llama3
  - model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/BlueMoon_Llama3
  - model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/RP_Format_QuoteAsterisk_Llama3
merge_method: model_stock
base_model: openlynn/Llama-3-Soliloquy-8B-v2
dtype: bfloat16
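The model list above is every combination of three bases and five LoRAs, so it can be generated instead of typed. A small sketch follows; the helper names are made up, and the size estimate assumes each base+LoRA pair is materialized as a full bf16 8B model (about 16GB) before merging.

```python
from itertools import product

# Bases and LoRAs copied from the config above.
BASES = [
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "Kukedlc/NeuralLLaMa-3-8b-DT-v0.1",
    "nbeerbower/llama-3-dragonmaid-8B-v2",
]
LORAS = [
    "ResplendentAI/Aura_Llama3",
    "ResplendentAI/Smarts_Llama3",
    "ResplendentAI/Luna_Llama3",
    "ResplendentAI/BlueMoon_Llama3",
    "ResplendentAI/RP_Format_QuoteAsterisk_Llama3",
]


def lora_merge_entries(bases, loras):
    """Build 'base+lora' entries in the same base-major order as the config."""
    return [f"{base}+{lora}" for base, lora in product(bases, loras)]


def render_models_block(entries):
    """Render the entries as the 'models:' section of the YAML config."""
    return "\n".join(["models:"] + [f"  - model: {entry}" for entry in entries])


if __name__ == "__main__":
    entries = lora_merge_entries(BASES, LORAS)
    print(render_models_block(entries))
    # Assumed ~16 GB per materialized bf16 8B model.
    print(f"# approx. cache size: {len(entries) * 16} GB")
```

Fifteen entries at roughly 16GB each is about 240GB, which lines up with the "250GB" figure once the base model is counted too.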

saishf

Owner May 13, 2024

Turned out bad
Windows does not appreciate trying to cache 250GB

saishf

Owner May 13, 2024

Reverting to plan b
Kitty cat merge with
TheSkullery/llama-3-cat-8b-instruct-v1

saishf

Owner May 13, 2024

saishf/Kitty-Cat-SOVL-8B-L3-V1
Kitty cat 😸

Virt-io

May 13, 2024

@saishf

I know that your merges are mainly for RP, but could you SOVL the original Meta-Instruct model?

saishf

Owner May 13, 2024

•

edited May 13, 2024

@saishf

I know that your merges are mainly for RP, but could you SOVL the original Meta-Instruct model?

saishf/SOVL-Instruct-8B-L3
Is uploading :3
Maybe 10 minutes-ish to upload?
Edit - Upload failed, trying again 🥲

saishf

Owner May 13, 2024

Up now 😸

SerialKicked

May 15, 2024

•

edited May 15, 2024

SOVLish-Maid is a surprisingly good model, good job! I haven't run it through a battery of tests yet, but so far it's prompt and context accurate even in scenes that would baffle other L3 models.

saishf

Owner May 16, 2024

SOVLish-Maid is a surprisingly good model, good job! I haven't run it through a battery of tests yet, but so far it's prompt and context accurate even in scenes that would baffle other L3 models.

I was hoping it would be. Lumimaid is smart, but its responses feel undetailed at times. The "SOVL" treatment worked weirdly well at resolving that and making it more situationally aware than without. It also reminds me of my beloved solar models so it's an instant win for me :3
I can confirm the OAS is still working, it will go along with things merge-mayhem won't.
Llama 3 has also proved how broken the open_llm_leaderboard is.
Nearly every Llama3 model i've tried is more intelligent and situationally aware than a Mistral model in rp, yet they score terribly.
So does how much better Llama3 scored on the Chatbot Arena.

Lewdiculous

May 16, 2024

Nearly every Llama3 model i've tried is more intelligent and situationally aware than a Mistral model in rp, yet they score terribly.

The secret has leaked out, I was the one rating the models in the board all along, and I'd have gotten away with it if it wasn't for you kids!

saishf

Owner May 16, 2024

Nearly every Llama3 model i've tried is more intelligent and situationally aware than a Mistral model in rp, yet they score terribly.

The secret has leaked out, I was the one rating the models in the board all along, and I'd have gotten away with it if it wasn't for you kids!

Secret Mistral employee 😾

SerialKicked

May 16, 2024

I was hoping it would be. Lumimaid is smart, but its responses feel undetailed at times. The "SOVL" treatment worked weirdly well at resolving that and making it more situationally aware than without. It also reminds me of my beloved solar models so it's an instant win for me :3

Yeah, that surprised me as well. I tried LumiMaid and was unimpressed (might have been very unlucky in my generations), so I had mixed feelings even downloading a merge using it as a base. But I'm glad I did. 😁
While at it, I gave your model a good run for its money (30'ish different characters/groups/scenarios) and first impressions confirmed, it's the first non-aligned/RP L3 model I would actually endorse so far (that i have tried, I'm sure others exist).

Nearly every Llama3 model i've tried is more intelligent and situationally aware than a Mistral model in rp, yet they score terribly.

Models based on Mistral 0.1/0.2 instruct, I would definitely agree. Those using the base Mistral 0.2 (the one with native 32K context released semi-recently), I'd disagree. They have a host of issues, like being very prone to looping conversations (and sentence repetition, but that can be sampled out), but lack of context awareness ain't one of them.

SerialKicked

May 16, 2024

My apologies for the double post, but out of curiosity i tried to push context length in KCPP to 16K, and it seems to work out of the box. Did i miss something about this model's context length, or is KCPP just so good that you don't even need to play with ROPE params anymore?

Virt-io

May 17, 2024

According to the Koboldcpp Wiki, rope is handled automatically when going above native context sizes.

I do however recall Sai mentioning they played around with setting it manually.

Lewdiculous

May 17, 2024

•

edited May 17, 2024

@SerialKicked You have the option to set RoPE manually for experimenting and edge cases but it's generally handled automatically, you just need to say the --contextsize you want and let the magic happen.

saishf

Owner May 17, 2024

I was hoping it would be. Lumimaid is smart, but its responses feel undetailed at times. The "SOVL" treatment worked weirdly well at resolving that and making it more situationally aware than without. It also reminds me of my beloved solar models so it's an instant win for me :3

Yeah, that surprised me as well. I tried LumiMaid and was unimpressed (might have been very unlucky in my generations), so I had mixed feelings even downloading a merge using it as a base. But I'm glad I did. 😁
While at it, I gave your model a good run for its money (30'ish different characters/groups/scenarios) and first impressions confirmed, it's the first non-aligned/RP L3 model I would actually endorse so far (that i have tried, I'm sure others exist).

Nearly every Llama3 model i've tried is more intelligent and situationally aware than a Mistral model in rp, yet they score terribly.

Models based on Mistral 0.1/0.2 instruct, I would definitely agree. Those using the base Mistral 0.2 (the one with native 32K context released semi-recently), I'd disagree. They have a host of issues, like being very prone to looping conversations (and sentence repetition, but that can be sampled out), but lack of context awareness ain't one of them.

I never really tried the Mistral 0.2 base, I skipped from solar to llama-3. Solar would've been perfect with 8K base ctx

saishf

Owner May 17, 2024

My apologies for the double post, but out of curiosity i tried to push context length in KCPP to 16K, and it seems to work out of the box. Did i miss something about this model's context length, or is KCPP just so good that you don't even need to play with ROPE params anymore?

KoboldCPP automatically sets your RoPE config to 1,638,400@1.0 (base@scale) when using 16384ctx
And i find the automatic ropes to work perfectly

saishf

Owner May 17, 2024

According to the Koboldcpp Wiki, rope is handled automatically when going above native context sizes.

I do however recall Sai mentioning they played around with setting it manually.

That was for Merge-Mayhem-V2, I used the wrong config. I copied it from the base, soliloquy, which uses a base rope of 4,000,000, and that hurt the model's intelligence in rp below 24K context
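For a feel of what changing the rope base does, here is a sketch using the standard RoPE frequency formula. It does not reproduce KoboldCPP's automatic scaling logic; the head dimension of 128 and the stock Llama 3 base of 500,000 are assumptions, while the other two bases are the ones mentioned in this thread.

```python
import math


def rope_wavelengths(base, head_dim=128):
    """Wavelength in tokens of each rotary pair: 2*pi * base**(2i/d)."""
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(base, head_dim=128):
    """Wavelength of the slowest-rotating pair for a given base."""
    return rope_wavelengths(base, head_dim)[-1]


if __name__ == "__main__":
    # 500,000 is the assumed Llama 3 default; the others come from the thread.
    for label, base in [
        ("assumed Llama 3 default", 500_000),
        ("KoboldCPP auto at 16K", 1_638_400),
        ("soliloquy config", 4_000_000),
    ]:
        print(f"{label}: longest wavelength ~{longest_wavelength(base):,.0f} tokens")
```

A bigger base stretches the slow-rotating dimensions, so nearby positions differ less in them. That may be related to a 4,000,000 base feeling worse at shorter contexts, but it's only a guess.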

SerialKicked

May 21, 2024

•

edited May 21, 2024

On the knave vs Knight thing in the Dolphin-Yi thread; I didn't want to pollute it with results from another model. So that's your model passing with flying colors (well maybe not the confused explanations, but hey it's more the card than the model) :)
(using a renamed old card from ooba's text-gen UI)

saishf

Owner May 21, 2024

On the knave vs Knight thing in the Dolphin-Yi thread; I didn't want to pollute it with results from another model. So that's your model passing with flying colors (well maybe not the confused explanations, but hey it's more the card than the model) :)
(using a renamed old card from ooba's text-gen UI)

Here is Claude 3 Haiku getting it wrong (it got it wrong every time i tried)

Haiku Vs Phi-3 😭

I hope claude comes out with a new low end model soon, with GPT-4o being free & Llama-3-8B being available there's no reason to touch haiku anymore

SerialKicked

May 21, 2024

•

edited May 21, 2024

To be entirely fair to the test, L3 models did occasionally fuck up depending on my prompting. But, on average, it still tends heavily toward the correct answer. I spent way too much time this weekend copy-pasting this question instead of doing anything productive.

Here is Claude 3 Haiku getting it wrong (it got it wrong every time i tried)

That fuzzy happy feeling when small RP open-weight models out-perform overly big proprietary models on such simple tasks. :p

saishf

Owner May 22, 2024

That fuzzy happy feeling when small RP open-weight models out-perform overly big proprietary models on such simple tasks. :p

It's not only reasoning but math 🥳
Phi-3-Medium will probably kill sonnet too 😭

saishf

Owner May 28, 2024

I bring two new models that use daredevil models as a base. Daredevil-abliterated models are the only really uncensored ones (willing to make bombs, illegal substances and plan a murder; it just excuses its murder planning by saying it's fictional) that remain as good at reasoning as llama3 instruct.
saishf/SOVLish-Devil-8B-L3
saishf/Neural-SOVLish-Devil-8B-L3
Haven't done testing yet though. Hopefully it works out :3
Going off chaiverse m-eval testing it is 0.02/10 worse at staying in character but 0.24/10 more entertaining and 0.6/10 better in user preference than SOVLish-Maid (not that it means much, still not human testing)

SerialKicked

May 28, 2024

Interested in this new "abliterated" method, I'll have to properly compare it to others. I've read the paper, looked good, but this kind of direct altering of the params / "brain cells".. I wonder if it has side effects (and which ones).

Lewdiculous

May 28, 2024

•

edited May 28, 2024

willing to make bombs, illegal substances and plan a murder

Always great stuff.

I've had success with this with Lumimaid-OAS and Stheno, but I was using a system prompt that nudged them.

@SerialKicked

I wonder if it has side effects (and which ones)

If it is that "it makes the models too horny" let me know, for science.

Lewdiculous

May 28, 2024

•

edited May 28, 2024

@saishf - Which one you want to cope for the hardest, SOVLish-Devil-8B-L3 or Neural-SOVLish-Devil-8B-L3?

SerialKicked

May 28, 2024

•

edited May 28, 2024

@SerialKicked

I wonder if it has side effects (and which ones)

If it is that "it makes the models too horny" let me know, for science.

more like when asking "hey, would that be a good idea if i jumped off a cliff?" or "would you like me to behead you?" get responses like "Yes, of course, everything you decide is the best!".

Lewdiculous

May 28, 2024

•

edited May 28, 2024

Aye, aye, if characters are being agreeable when it makes no sense to do so it's pretty bad. I like some resistance. Maybe preserving character card adherence is the most important part of avoiding this.

saishf

Owner May 29, 2024

@saishf - Which one you want to cope for the hardest, SOVLish-Devil-8B-L3 or Neural-SOVLish-Devil-8B-L3?

After light testing, I prefer the non neural version.
I'm probably going to rerun sovl-mega-mash with daredevil instead of ortho. Should give better reasoning and smarts.
Neural feels rather robotic at times, it feels less human and more like it's designed to impress benchmarks.
I haven't tested for issues in saying yes when it shouldn't.
That's what mega-mash is designed for though, to balance between instruct and completely de-censored models.

saishf

Owner May 29, 2024

Neural is generally smarter though. Just not my style 🐥

saishf

Owner May 29, 2024

Update, they place 2nd and 4th in mmlu for 8B models
Neural scores 69.02
Non-Neural scores 68.97
Mega-Mash-V2 after sleep :3

saishf

Owner Jun 2, 2024

saishf/Extended-Mega-Mash-262K-8B-L3
saishf/Long-Neural-SOVLish-Devil-8B-L3-262K

Two attempts at pushing my merges to (hopefully) support 32K+ context, for use with the new KV cache quantization
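Some back-of-the-envelope numbers on why KV cache quantization matters at these lengths. This is a sketch under assumed Llama-3-8B shapes (32 layers, 8 KV heads, head dim 128), and it ignores the small per-block overhead that real quantized formats add.

```python
def kv_cache_bytes(context_tokens, bits_per_element,
                   layers=32, kv_heads=8, head_dim=128):
    """Approximate KV cache size in bytes for one sequence.

    The default shape values are assumed Llama-3-8B settings.
    """
    elements_per_token = 2 * layers * kv_heads * head_dim  # keys + values
    return context_tokens * elements_per_token * bits_per_element // 8


def to_gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 1024 ** 3


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = to_gib(kv_cache_bytes(32768, bits))
        print(f"32K context at {bits}-bit KV cache: {size:.1f} GiB")
```

Under those assumptions a 32K context needs about 4 GiB of cache at 16-bit, 2 GiB at 8-bit and 1 GiB at 4-bit.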