[Part 2] General discussion.

#2
by Lewdiculous - opened

[Part 2]

@Nitral-AI @ABX-AI

Continue here, the other discussion got too long to scroll comfortably.

[image: 3D-style Sumika render]

You're getting closer. Tho I think LocalBasedMan really prefers regular anime Sumika haha.

Hahah, yeah, this one was just for you, as thanks for the new thread :3
I would never force 3D on 2D anime enjoyers!! I'm super happy just using the style for my merges.
https://huggingface.co/ABX-AI/Laymonade-7B - I think this one came out alright! Now on to quants, then 9B ^^

BTW @Lewdiculous, I did do a lot of 2D anime at the start of my AI imagery journey, but I quickly got out of it, as it was already too good to even build on.

https://www.deviantart.com/abroxis/gallery/87178213/crimson-petals-of-the-azalea-dynasty
https://www.deviantart.com/abroxis/gallery/87165704/historical-figures-of-the-future
https://www.deviantart.com/abroxis/gallery/87176370/the-celestial-sisterhood-mothers-of-all-life

It's just automatically perfect, pretty much. MJ's niji algorithm is insane. I went into 3D because I felt it had far more room for stylizing and actually coming up with new styles, so to speak.

Wait, shouldn't the config be like this? @ABX-AI

{
  "_name_or_path": "example/example",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 32000
}

I'm seeing this in Laymonade:

{
  "_name_or_path": "KatyTheCutie/LemonadeRP-4.5.3",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.39.1",
  "use_cache": true,
  "vocab_size": 32000
}

Is that for Mistral-v0.2 Layla?

I gave up on trying to use it as base.

Just copy the config from mistralai/Mistral-7B-Instruct-v0.2 or alpindale/Mistral-7B-v0.2-hf
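
If you want to pull it programmatically rather than copy-pasting from the repo page, a quick sketch with huggingface_hub (same repo/file names as above) would be:

# Download the reference config.json from the Hub into the local cache
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    filename="config.json",
)
print(config_path)  # cached local path; copy this file over the merged model's config.json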

@Virt-io no, that's the official config from Instruct 0.2, but it should still work on Layla 0.2, I would assume.

This is the config they provide for Layla 0.2, which has the ChatML tokens added to the vocab.

{
  "_name_or_path": "/home/layla/src/text-generation-webui/models/mistral-7b-v0.2",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": false,
  "vocab_size": 32002
}

This is also part of the problem:

[screenshot]

Is it the sliding window? I have no idea why. Is it a good idea to use a different config? Also, Lemonade is the base in Laymonade; it was giving me errors the other way around, and I could not fix them in any of the ways you suggested.

"sliding_window": null

This is for Mistral-v0.2

Also - "rope_theta": 1000000.0

Does that mean this model is considered 4K context?
So the fix is to delete the added_tokens from the Mistral-v0.2 Layla?

Guess I need to do a merge with Layla 0.2 as the base for educational purposes after this model decides to upload. FFS, why does Colab hate me today?

Should probably take a look at Locutusque/Hercules-4.0-Mistral-v0.2-7B; it also uses ChatML, but has no added tokens.

I don't know why people are configuring it with a sliding-window attention (SWA) of 32K when it doesn't have one at all...

That is because alpindale/Mistral-7B-v0.2-hf was configured incorrectly and people trained on it without checking the configs. It was recently fixed though.

Oh, I'm using the official config for 0.2 Instruct:

{
  "_name_or_path": "example/example",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 32000
}

So the fix is to paste this config into the Layla config file and remove the added-tokens JSON? Mergekit loves to add deleted files back, which is super annoying... I made it blank and read-only, just to test.
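
In case it helps anyone else, this is roughly what I mean by blank and read-only (just a sketch; the path is wherever the Layla model sits locally):

import os
import stat
from pathlib import Path

# Local copy of the Layla model (example path)
added = Path("./mistral-7b-v0.2-layla-v4/added_tokens.json")

# "Blank" it: an empty JSON object instead of the ChatML token entries
added.write_text("{}")

# Mark it read-only so mergekit can't rewrite or re-download it
os.chmod(added, stat.S_IREAD)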

@ABX-AI
I would just use a different model as the base; there is this one: Undi95/LewdMistral-7B-0.2.

The reason is that I changed a bunch of files and I would still get tensor mismatches. l3utterfly/mistral-7b-v0.2-layla-v4 is good as long as it is not the base.

That's the way I went about it with this model: https://huggingface.co/ABX-AI/Laymonade-7B - since @Nitral-AI says he can use it as a base, I tried again, but it keeps failing for me even after messing around with the config and removing the added tokens in the Layla model.

But then what is the correct way to fix the config that is apparently wrong in Laymonade?

Wait, I think I got it to work as the base by forcing the added-tokens file to be empty (so mergekit won't keep re-downloading it) and pasting the Mistral 0.2 config into Layla.

Oh, just update the config in Laymonade like I suggested and requant.

You should just be able to replace the entire config with what's below without issue.

{
  "_name_or_path": "ABX-AI/Laymonade-7B",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 32000
}
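
Or, if you'd rather patch the existing file in place than paste over it, something along these lines does the same thing (the path is just a guess at where your merge output lives):

import json

path = "./Laymonade-7B/config.json"  # local path to the merged model's config (assumption)

with open(path) as f:
    cfg = json.load(f)

# Mistral-v0.2-style settings: long rope_theta, no sliding-window attention, stock 32000 vocab
cfg["rope_theta"] = 1000000.0
cfg["sliding_window"] = None
cfg["max_position_embeddings"] = 32768
cfg["vocab_size"] = 32000

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)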

@Nitral-AI I think it works; I had followed your steps wrong. Making added_tokens a blank, read-only file worked, and the Laymonade config now looks like this:

{
  "_name_or_path": "l3utterfly/mistral-7b-v0.2-layla-v4",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.39.1",
  "use_cache": true,
  "vocab_size": 32000
}

Making a Q4_K_M to test.
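
For anyone following along, the quant step is roughly the usual llama.cpp two-step; script and binary names have shifted between llama.cpp versions, so treat this as a sketch:

import subprocess

model_dir = "./Laymonade-7B"                 # merged HF model (example path)
f16_gguf = "Laymonade-7B-f16.gguf"
q4_gguf = "Laymonade-7B-Q4_K_M.gguf"

# HF weights -> f16 GGUF (run from a llama.cpp checkout)
subprocess.run(
    ["python", "convert-hf-to-gguf.py", model_dir, "--outfile", f16_gguf, "--outtype", "f16"],
    check=True,
)

# f16 GGUF -> Q4_K_M (the binary is called llama-quantize in newer builds)
subprocess.run(["./quantize", f16_gguf, q4_gguf, "Q4_K_M"], check=True)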

@Nitral-AI @ABX-AI

Found this video: https://www.youtube.com/watch?v=cvOpX75Kz4M

Very helpful in understanding what the merge methods do.

On a side note: this exists as well.
normalize = true

Are you saying it should be added to the config of the base, or maybe of the merged output? I haven't noticed it in the previous configs mentioned.
BTW, thanks a megaton for the help! However, the quant came out with no GPU layers possible :s I had a similar issue with another merge I did, where GPU offload was not available.
Any clues on how to fix that one? ^^

The reason Layla doesn't work is that it adds tokens in other configs, not just added_tokens.json.

Indeed, the two start/end tokens are noted in the tokenizer_config. But no offloading is possible; is it related to that?

Wait, you can't offload the quant with the updated config?

Yes, if I recall correctly they are also added in tokenizer.model.

Wait, you can't offload the quant with the updated config?

I think he is talking about using Layla as the base in a merge.

Wait, you can't offload the quant with the updated config?

Yeah, and this happened to me with one of the first merges I did, so I assumed it was a bad model combination. Otherwise, the merge was possible and the quant came out, but this problem still makes it unusable as a GGUF.

Huh, I'm using that exact config for 4.0, and now 4.20, without issue.

Do you get any errors/warnings during the merge?

Wait, you can't offload the quant with the updated config?

UGH. What the hell. I reloaded it in Kobold and now it shows layers!
Previously, this bug happened in LLM and it was not offering any layers.

OK GUYS, layla as base works :D

Vindication. I was trying to get mine uploaded, but Colab hates me today.

This is how I fixed using Noromaid 0.4 as the base forever ago, when I first started.

Awesome, thanks a lot @Nitral-AI, now I'll look into making it infinitely 9B.
BTW, on the second run it came out with 200 layers suggested in Kobold. I notice some models offer 200, while others offer a specific default value like 8, 16, or whatever.
Any idea how one could control this default value? When it has a specific default, it is often pretty close to what I would choose to leave headroom, so maybe it is auto-detected in some way by Kobold, but I have no clue tbh. Maybe something I can add or change in the configs or while making the GGUF?
(Re-uploading the new Laymonade variant so it has the new base.)

BTW, on the second run it came out with 200 layers suggested in Kobold. I notice some models offer 200, while others offer a specific default value like 8, 16, or whatever.
Any idea how one could control this default value? When it has a specific default, it is often pretty close to what I would choose to leave headroom, so maybe it is auto-detected in some way by Kobold, but I have no clue tbh. Maybe something I can add or change in the configs or while making the GGUF?

Will investigate when I have a moment here.

No worries at all, @Nitral-AI. I'm curious whether it makes sense to use the new Laymonade + InfinityRP for the 9B, or whether it would be better to do a passthrough with Laymonade and Layla as you suggested, but I may try both and test.

Hear me out: slerp Layla + InfinityRP into a 7B. Then DARE-TIES that (Layla/InfinityRP) with Laymonade into another 7B. Finally, passthrough (Layla/InfinityRP/Laymonade) + Layla to 9B. @ABX-AI

Hear me out: slerp Layla + InfinityRP into a 7B. Then DARE-TIES that (Layla/InfinityRP) with Laymonade into another 7B. Finally, passthrough (Layla/InfinityRP/Laymonade) + Layla to 9B. @ABX-AI

I see plan, I follow plan
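
For step one I'm assuming a slerp config along these lines, fed to the mergekit CLI (the InfinityRP repo name, t value, and output path are just my placeholders, not the final recipe):

import subprocess
from pathlib import Path

# Step 1 sketch: slerp Layla with InfinityRP into a 7B
slerp_config = """\
slices:
  - sources:
      - model: l3utterfly/mistral-7b-v0.2-layla-v4
        layer_range: [0, 32]
      - model: Endevor/InfinityRP-v1-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: l3utterfly/mistral-7b-v0.2-layla-v4
parameters:
  t:
    - value: 0.5
dtype: bfloat16
"""

Path("step1-slerp.yml").write_text(slerp_config)
subprocess.run(["mergekit-yaml", "step1-slerp.yml", "./Layla-InfinityRP-7B"], check=True)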

@Nitral-AI Can you provide a good dare_ties example (YAML file)? I think I have only been doing plain ties so far?

models:
  - model: ./extra_hdd2/Mixtral-8x7B-Instruct-v0.1
    parameters:
      density: 0.5
      weight: 1.0
  - model: ./extra_hdd/Mixtral-8x7B-v0.1-LimaRP-ZLoss
    parameters:
      density: 0.5
      weight: 0.5
merge_method: dare_ties
base_model: ./extra_hdd/Mixtral-8x7B-v0.1
parameters:
  #normalize: false
  #int8_mask: true
dtype: bfloat16

I suppose I'll follow this one (using the appropriate models, of course), but I'm not sure about the weights. Also, you said normalize: true?

@ABX-AI This should be along the lines of what you need.

merge_method: dare_ties
base_model: example/model_1
parameters:
  normalize: true
models:
  - model: example/model_2
    parameters:
      weight: 1
  - model: example/model_1
    parameters:
      weight: 1
dtype: float16

@Nitral-AI it worked, I followed the steps, but it seems this model is sometimes prone to repeating something endlessly; it's not too often, but often enough to notice. Any ideas on how that can be fixed? Overall, the output is much worse than expected, so either I messed it up or I need to try another config.

Go back to the "DARE-TIES (Layla/InfinityRP) + Laymonade into another 7B" step and use that result as the passthrough model; it could be too much Layla.

slices:
  - sources:
      - model: ./dare-ties-output   # the (Layla/InfinityRP) + Laymonade 7B from the previous step
        layer_range: [0, 20]
  - sources:
      - model: ./dare-ties-output
        layer_range: [12, 32]
merge_method: passthrough
dtype: float16

Sadly, I think the problem comes from using Layla as the base, because the new Laymonade I made has the same issue with repetition (though far less commonly). I'm going to try reversing the bases again to see if it will be any better. Removing the start/end special tokens in the GGUF may ultimately be a problem? However, Laymonade is giving me some insane stories, so I'm going to try to make it work as best I can and get to a decent 9B.

New strategy, with much better results:

slices:
  - sources:
      - model: KatyTheCutie/LemonadeRP-4.5.3
        layer_range: [0, 32]
      - model: Nitral-AI/Infinitely-Laydiculous-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: Nitral-AI/Infinitely-Laydiculous-7B
parameters:
  t:
    - filter: self_attn
      value: [0.7, 0.3, 0.6, 0.2, 0.5]
    - filter: mlp
      value: [0.3, 0.7, 0.4, 0.8, 0.5]
    - value: 0.5
dtype: bfloat16

And then

slices:
  - sources:
      - model: Nitral-AI/Infinitely-Laydiculous-7B
        layer_range: [0, 20]
  - sources:
      - model: ./MODELS/Infinite-Laymons-7B
        layer_range: [12, 32]
merge_method: passthrough
dtype: float16

I think I sort of got what I wanted, with more diverse and interesting responses and no refusals. I am likely messing up the original Layla somehow, so I used Infinitely-Laydiculous-7B, since you did the job properly there, @Nitral-AI ^^
Some of the merges I uploaded tend to give refusals far more often, so I think I'll purge them at some point when I have more time to test properly and see what I should keep. Kuno-Lemons also refuses quite a bit for some reason.

I feel that; I had like 65 models up at one point and deleted far more than I have left posted. Glad it worked out somewhat in the end, at least.

@Nitral-AI @Lewdiculous
[screenshot]
Told you it's the best :D With the merges I am doing, I'm really trying to get to this model plus a way more brutal, fetishistic style 😇 (well, and better character adherence, lower repetition, and less asking "are you ready").

200 battles is not enough to tell, champ. ;) Appreciate the model exposure, but I already put it up 3 times and watched it get slapped.

Noooo, I spoke too soon. Well, if it helps, I don't think this RP board translates to which model is practically the best. All benchmarks inevitably turn into a game of chasing the bench's numbers. I do tests with my own cards, which I'm familiar with, and that tells me far more: how dumb the model is, whether it follows well, whether it repeats itself too much, etc. Especially because a good RP model starts shining when you hold a longer conversation, which basically means you need a human to eval it, imho.

In my tests, this model outmatches the KukulStanta-7B one.

Do you know how Chai does their Elo?

[screenshot]

From what I gathered, the "double thumbs up" (normal) is the human rating? Then you have the same for the reward model, so if you sort by just double thumbs up, you will see a different list.
There is also this "by repo" double likes, which I don't understand, but a couple of my merges scored well there (is it counting the actual likes on HF for the whole repository?):

[screenshot]

Then, this is just the "double thumbs up" ratio:

[screenshot]

Overall, for role-play specifically, I think synthetic scoring is not accurate; the reward model doesn't know what good kink is, unless it's Claude 3 or something like that... which it isn't ^^

Oh, btw, it slipped back into first. I was going to work on merges today, but now I may just play games instead.

Overall, for role-play specifically, I think synthetic scoring is not accurate; the reward model doesn't know what good kink is, unless it's Claude 3 or something like that... which it isn't ^^

Which is why I appreciate that the battles here are determined by the actual end users (Elo) and not by the reward model.

Ooh, OK, so the Elo is mostly user-based? That's pretty good then; I thought it weighted the bot battles a lot more. See how well spread out the battles are? I'm curious how that can be achieved; those users must be wanking off furiously 24/7 hahaah

I actually hadn't looked into how massive that platform is... yeah, it makes sense now ^^ They could easily do this with users all the way

@ABX-AI

those users must be wanking off furiously 24/7 hahaah

KEKW

I'll say the Chai users are special

Do you know how Chai does their Elo?

They mentioned a few days ago, in the announcement for the new Elo system, that there will be a write-up for it coming soon.
So I guess we're stuck waiting for now.

I was contacted by Chai-Will and did ask about it; the answer was that the Elo is a human score. I also gave feedback that it would be nice to have an info panel with more explanations of what each ranking column does, and I understood the feedback will be taken into consideration, so that was definitely great to hear :)
