General discussion and feedback thread / future of Eris.

#1
by Nitral-AI - opened

If the response is strong enough, there may be a longer-context version in store for the future. (But I need feedback first.)

We'll see about that. At least PrimeV3 was well received, so that counts. I'm just a bit worried about the sacrifices to achieve longer context handling.

Yeah, it would be a hassle to work that in at this point without sacrificing something.

  1. I have formed the opinion that a really good model, even if limited to --contextsize 8192, is at least on the same level as a merely 'decent' model at 10-16K context, or sometimes even better, when considering the general chatting experience.

  2. Some models might seemingly handle larger context, but then something has to give, and I think it will be a balancing act of losing as little quality as possible. Looking at this form of Eris now, she's a smart girl with multimodal understanding, so I worry about what would have to suffer.

  3. Let me think out loud here a little: suppose you have a perfectly great model that handles 8K context, and you aim for just a slightly larger context, like 10K (a 25% increase). Would that still require merging in so much of the incoming longtext model, or could you possibly get away with only a relatively small portion, say 30%?

Just thinking out loud, but I imagine you have an idea from all your experiments so far.

Lewdiculous changed discussion title from Depending. to General discussion and feedback thread / future of Eris.
Lewdiculous pinned discussion

So from my understanding, it seems like getting the weights filtered into every layer is the most important part. So technically I could do a 50/50 merge that puts long context into every layer, but it would need to be back-merged again with the main model to not lose out super hard on its individual traits; however, that should also fall within the range of 25% YaRN and 75% base.
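The two-step merge described above can be sanity-checked with some back-of-the-envelope arithmetic: a 50/50 linear blend with the long-context (YaRN) model, then a 50/50 back-merge with the base, leaves the base contributing 75% of the final weights. This is only an illustrative sketch of the ratios, not a real merge recipe; the model names are placeholders.

```python
# Effective weight fractions from the two-step merge described above:
# step 1: linear 50/50 merge of the YaRN long-context model with the base,
# step 2: back-merge the result 50/50 with the base again.

def linear_merge(weights_a, weights_b, t):
    """Blend two contribution maps: t of A plus (1 - t) of B."""
    keys = set(weights_a) | set(weights_b)
    return {k: t * weights_a.get(k, 0.0) + (1 - t) * weights_b.get(k, 0.0)
            for k in keys}

base = {"base": 1.0}   # placeholder for the main model
yarn = {"yarn": 1.0}   # placeholder for the long-context model

step1 = linear_merge(yarn, base, 0.5)   # 50% yarn / 50% base
final = linear_merge(step1, base, 0.5)  # back-merged with base

print(final["yarn"], final["base"])  # 0.25 0.75
```

So the "25% YaRN / 75% base" figure falls straight out of doing the 50/50 merge twice.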

That 25/75 ratio sounds about right. Interesting for later then. I still need to take time to properly evaluate the models against each other, been busy lately and I have more hobbies than I can afford with everything else happening, haha.

I will surely finish this 50 hour long JRPG soon, surely...

This model (using the Q8) really starts strong for me, but by the time I get to around 60 turns the prose gets really purple (lots of talk about bonds, waxing poetic about everything), and ellipses start to pollute responses heavily, starting around 6K tokens. By turn 100, even editing out the ellipses everywhere, the messages start looking like "A close bond... far from home... where Herp... and Derp... together... begin to...". It drives me nuts, because it starts out soooo good and stays really solid those first 4K tokens or so. I tried the Q6 and got much the same thing.

I usually swap the model to Kunoichi 7b DPO v2 to straighten it out, but it's strange. Never had that happen before. Anyone else see this or know how to suppress it a bit?

Using the Universal Light preset (kbcpp + Silly).

Haven't seen these problems, but I usually don't go 60+ messages deep; however, I've been running all these models at either 8K or 16K without issue.

@Nitral-AI I can also confirm this...

A close bond... far from home... where Herp... and Derp... together... begin to...

After 100 turns, at 8192 context size it was unusable. It is very guilty of that and I felt some of that bondification going on.

It DOES absolutely start very strong, but it decays quite fast.

@TravelingMan - Can you test the previous version, https://huggingface.co/Lewdiculous/Eris_PrimeV3-Vision-7B-GGUF-IQ-Imatrix, and see how that works for you?

I'll have to test it deep into conversation to see if I can replicate the behavior.

Moved 0.75 into longtext-test and just spun up an alternative 0.75. Downloading now to quant and compare for long-turn conversation. This will also be experimental in nature, as I suspect a potentially large hit to overall intelligence.

All good. Experimenting is the heart of this. Just do what you enjoy.

I will say models should be tested continuously at least 60 turns in to be sure. I think most people will go at least 80 turns of responses with around 150 tokens into a roleplay session.

I think you are being a tad unrealistic with your wants, then, for an 8192-token-context model. 150 tokens at 80 messages is 12,000 tokens.

This isn't including example messages, the character description, or any of the user inputs.
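The arithmetic behind that point is simple enough to sketch; the turn count and tokens-per-turn figures are just the illustrative numbers from the discussion, not measurements.

```python
# Rough context-budget arithmetic: do 80 responses at ~150 tokens each
# even fit in an 8192-token window, before any prompt overhead?
turns = 80             # responses in a typical roleplay session
tokens_per_turn = 150  # average response length (illustrative)
context_size = 8192    # model's context window

response_tokens = turns * tokens_per_turn
print(response_tokens)                  # 12000
print(response_tokens > context_size)   # True
```

So the responses alone already overflow the 8K window by roughly 50%, before counting the system prompt, character card, example messages, or user inputs.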

Went through and tested 3 variants of this (including the unmodified base), 120 messages into conversation with a context value of 16384 and had none of the issues outlined above.

Nitral-AI changed discussion status to closed

Well. Realistically just enough to fill up the context and then some is enough.

Just saying that the issue with V3.05 only manifests after full context and a bit. Now, you could argue that this is expected at some point, and of course after context loss there might be a slow shift in model outputs, but I have been able to hold chats of over 200 responses with the others, so that's why I mention it. It's not usually that bad that early on.

Furthermore, Example Messages are still in context, as they are Always Included; as such, if the model adheres to the Example Messages section, it shouldn't be drastically swinging into the observed pattern.

The issue is that it slowly pivots into that unwanted pattern of responses, with many ellipses, over the course of the roleplay.

I hope that makes sense.

Example messages are injected periodically into the context during use, which takes up additional tokens... They are not bound to the token usage of the output messages...

Nitral-AI changed discussion status to open

Example messages are injected periodically into the context during use

While that is the default behavior, also as per the Character Card V2 Spec...

I have them Always included in context in a fixed position, you can change that in the User Settings tab. They are there from response #0 to response #999.

Also thank you for informing me how people perceive usable context, very enlightening to witness.

So that still doesn't change the fact that they are using additional context, which was my entire point. I don't know why we are getting into semantics about injection methodology.

I am fully aware of how the context is used, I also did spend time inspecting the Prompt Itemizer for that reason, and the context sent for inference, for the purposes of also ensuring benefits from KCPP's Context Shifting.

Generally speaking, I will take 1.8K-2K tokens from the 8K budget for System Prompts, User Persona, and Character card, including Example Messages.
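That fixed overhead can be turned into a quick estimate of how much chat history actually fits before Context Shifting starts evicting it. The numbers are the ones quoted above (the 2K overhead is the upper estimate), and the per-message figure is illustrative.

```python
# What a ~2K fixed prompt leaves of an 8K window for chat history.
context_size = 8192
fixed_overhead = 2000  # system prompt + persona + character card + example messages
avg_message = 150      # rough tokens per message (illustrative)

chat_budget = context_size - fixed_overhead
print(chat_budget)                 # 6192
print(chat_budget // avg_message)  # 41
```

So only around 40 messages of history are ever in context at once; everything beyond that is running on shifted-out context, which is why behavior at turn 60+ matters.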

will take 1.8K-2K tokens from the 8K budget

The comment was related to the fact that I generally haven't seen models break that badly when old context is being shifted out this early. 250 messages in? I'll take it, but not sub-100. Even if both cases are already heavily Shifting, with the old context no longer present, I am fine with that, as there are other guidelines for message formatting adherence that other models respect, even with only the context that is left after the system prompt and characters have to fit.

Nitral-AI changed discussion status to closed
Nitral-AI changed discussion status to open

Well alternative models are up. I'm not going to drive myself insane pleasing everyone's individual use cases.
(both are probably worse)

What I think happens is that since the later parts of the context have more relevance than the section with Example Messages, the responses slowly pivoting into that format cause it to 'self-feed' more of the unwanted style, exacerbating it over time as more and more of the chat history fills with it.

The issue is why it is pivoting into that style in the first place. If it remained consistent across most messages, it wouldn't really be an issue.

I don't mean to just speak badly against V3.05, just sharing what I observed, it's all partially subjective of course, but I mean well.

https://huggingface.co/Nitral-AI/Eris_PrimeV3.075-Vision-7B
https://huggingface.co/Nitral-AI/Eris_PrimeV3.075-Vision-7B-Longtext-test
Two different attempts at extending context in different ways. (These are the alternatives I was referring to.)

Well alternative models are up. I'm not going to drive myself insane pleasing everyone's individual use cases.

Hey, hey now, come here, give me a hug mate. This isn't an attack on your efforts, I don't want you to feel like that, alright? Nobody is going to demand you to please them, never take that.

Do what you like for yourself first and foremost, it's okay. While I personally like to also look at other people's experiences, that's just my approach as I believe in getting a result that pleases as many people as possible, without losing track of your own path of course.

Sounds fair?

It's just frustrating that I spent an entire day working on an issue I haven't been able to replicate to any degree. Since I can't replicate the issue, I've spent even more hours today testing the attempted fixes with no way to validate results.

Sorry for causing distress. I understand your frustration, before doing anything like that give it a while to cool down, I know it might be tempting to just get into debug mode and get hyper-focused on something, I do that way too much with my own affairs, but let it cool for a bit, let's hear more from @TravelingMan for instance and let a few rounds of testing go by with additional changes, card formats, system prompts... We'll get there slowly, and you don't need to work yourself out over it, aight? :')

Fair enough, hopefully one of the fixes does it. Running Nitral-AI/Eris_PrimeV3.075-Vision-7B-q5_k_m-gguf now for personal use/testing.

Shall be tested once I can get past my own addictions. Bloody hells, Dawntrail has a release date I need to speedrun my entire life now.
