General discussion and feedback thread / future of Eris.

#1
by Nitral-AI - opened

If the response is strong enough, there may be a longer-context version in store for the future. (But I need feedback first.)

We'll see about that. At least PrimeV3 was well received, so that counts. I'm just a bit worried about the sacrifices to achieve longer context handling.

Yeah, it would be a hassle to work that in at this point without sacrificing something.

  1. I have formed the opinion that a really good model, even if limited to --contextsize 8192, is at least on the same level as a merely 'decent' model at 10-16K context, or sometimes even better, when considering the general chatting experience.

  2. Some models might seemingly handle larger context, but then something has to give, and I think it will be a balancing act of losing as little quality as possible. Looking at this form of Eris now, she's a smart girl with multimodal understanding, so I worry about what would have to suffer.

  3. Let me think out loud here a little: suppose you have a perfectly great model that handles 8K context, and you aim for just a slightly larger context, like 10K (a 25% increase). Would that still require merging in so much of the incoming longtext model, or could you possibly get away with only a relatively small portion, say 30%?

Just thinking out loud, but I imagine you have an idea from all your experiments so far.

Lewdiculous changed discussion title from Depending. to General discussion and feedback thread / future of Eris.
Lewdiculous pinned discussion

So from my understanding, it seems like getting the weights filtered into every layer is the most important part. So technically I could do a 50/50 merge that puts long context into every layer, but it would need to be back-merged again with the main model to not lose out super hard on its individual traits; however, that should also fall within the range of 25% YaRN and 75% base.
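The two-step merge described above can be sanity-checked with some back-of-the-envelope arithmetic: a 50/50 linear blend with the long-context (YaRN) model, then a 50/50 back-merge with the base, leaves the base contributing 75% of the final weights. This is only an illustrative sketch of the ratios, not a real merge recipe; the model names are placeholders.

```python
# Effective weight fractions from the two-step merge described above:
# step 1: linear 50/50 merge of the YaRN long-context model with the base,
# step 2: back-merge the result 50/50 with the base again.

def linear_merge(weights_a, weights_b, t):
    """Blend two contribution maps: t of A plus (1 - t) of B."""
    keys = set(weights_a) | set(weights_b)
    return {k: t * weights_a.get(k, 0.0) + (1 - t) * weights_b.get(k, 0.0)
            for k in keys}

base = {"base": 1.0}   # placeholder for the main model
yarn = {"yarn": 1.0}   # placeholder for the long-context model

step1 = linear_merge(yarn, base, 0.5)   # 50% yarn / 50% base
final = linear_merge(step1, base, 0.5)  # back-merged with base

print(final["yarn"], final["base"])  # 0.25 0.75
```

So the "25% YaRN / 75% base" figure falls straight out of doing the 50/50 merge twice.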

That 25/75 ratio sounds about right. Interesting for later then. I still need to take time to properly evaluate the models against each other, been busy lately and I have more hobbies than I can afford with everything else happening, haha.

I will surely finish this 50 hour long JRPG soon, surely...

This model (using the Q8) really starts strong for me, but by the time I get to around 60 turns the prose gets really purple (lots of talk about bonds, waxing poetic about everything), and ellipses start to pollute responses heavily, starting around 6K tokens. By turn 100, even editing out the ellipses everywhere, the messages start looking like "A close bond... far from home... where Herp... and Derp... together... begin to...". It drives me nuts, because it starts out soooo good and stays really solid those first 4K tokens or so. I tried the Q6 and got much the same thing.

I usually swap the model to Kunoichi 7b DPO v2 to straighten it out, but it's strange. Never had that happen before. Anyone else see this or know how to suppress it a bit?

Using the Universal Light preset (kbcpp + Silly).

Haven't seen these problems, but I usually don't go 60+ messages deep; however, I've been running all these models at either 8K or 16K without issue.

@Nitral-AI I can also confirm this...

A close bond... far from home... where Herp... and Derp... together... begin to...

After 100 turns, at 8192 context size it was unusable. It is very guilty of that and I felt some of that bondification going on.

It DOES absolutely start very strong, but it decays quite fast.

@TravelingMan - Can you test the previous version, https://huggingface.co/Lewdiculous/Eris_PrimeV3-Vision-7B-GGUF-IQ-Imatrix, and see how that works for you?

I'll have to test it deep into conversation to see if I can replicate the behavior.

Moved 0.75 into longtext-test and just spun up an alternative 0.75. Downloading now to quant and compare for long-turn conversation. This will also be experimental in nature, as I suspect a potentially large hit to overall intelligence.

All good. Experimenting is the heart of this. Just do what you enjoy.

I will say models should be tested continuously at least 60 turns in to be sure. I think most people will go at least 80 turns of responses with around 150 tokens into a roleplay session.

I think you are being a tad unrealistic with your wants, then, for an 8192-token-context model. 150 tokens at 80 messages is 12,000 tokens.

This isn't including example messages, the character description, or any of the user inputs.
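The arithmetic behind that point is simple enough to sketch; the turn count and tokens-per-turn figures are just the illustrative numbers from the discussion, not measurements.

```python
# Rough context-budget arithmetic: do 80 responses at ~150 tokens each
# even fit in an 8192-token window, before any prompt overhead?
turns = 80             # responses in a typical roleplay session
tokens_per_turn = 150  # average response length (illustrative)
context_size = 8192    # model's context window

response_tokens = turns * tokens_per_turn
print(response_tokens)                  # 12000
print(response_tokens > context_size)   # True
```

So the responses alone already overflow the 8K window by roughly 50%, before counting the system prompt, character card, example messages, or user inputs.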

Went through and tested 3 variants of this (including the unmodified base), 120 messages into conversation with a context value of 16384 and had none of the issues outlined above.

Nitral-AI changed discussion status to closed

Well. Realistically just enough to fill up the context and then some is enough.

Just saying that the issue with V3.05 only manifests after full context and a bit. Now, you could argue that this is expected at some point, and of course after context loss there might be a slow shift in model outputs, but I have been able to hold chats of over 200 responses with the others, so that's why I mention it. It's not usually that bad that early on.

Furthermore, Example Messages are still in context, as they are Always Included; as such, if the model adheres to the Example Messages section, it shouldn't be drastically swinging into the observed pattern.

The issue is that it slowly pivots into that unwanted pattern of responses, with many ellipses, over the course of the roleplay.

I hope that makes sense.

Example messages are injected periodically into the context during use, which takes up additional tokens... They are not bound to the token usage of the output messages...

Nitral-AI changed discussion status to open

Example messages are injected periodically into the context during use

While that is the default behavior, also as per the Character Card V2 Spec...

I have them Always included in context in a fixed position, you can change that in the User Settings tab. They are there from response #0 to response #999.

Also thank you for informing me how people perceive usable context, very enlightening to witness.

So that still doesn't change the fact that they are using additional context, which was my entire point. I don't know why we are getting into semantics about injection methodology.

I am fully aware of how the context is used, I also did spend time inspecting the Prompt Itemizer for that reason, and the context sent for inference, for the purposes of also ensuring benefits from KCPP's Context Shifting.

Generally speaking, I will take 1.8K-2K tokens from the 8K budget for System Prompts, User Persona, and Character card, including Example Messages.
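That fixed overhead can be turned into a quick estimate of how much chat history actually fits before Context Shifting starts evicting it. The numbers are the ones quoted above (the 2K overhead is the upper estimate), and the per-message figure is illustrative.

```python
# What a ~2K fixed prompt leaves of an 8K window for chat history.
context_size = 8192
fixed_overhead = 2000  # system prompt + persona + character card + example messages
avg_message = 150      # rough tokens per message (illustrative)

chat_budget = context_size - fixed_overhead
print(chat_budget)                 # 6192
print(chat_budget // avg_message)  # 41
```

So only around 40 messages of history are ever in context at once; everything beyond that is running on shifted-out context, which is why behavior at turn 60+ matters.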

will take 1.8K-2K tokens from the 8K budget

The comment was related to the fact that I generally haven't seen models break that badly when old context is being shifted out this early. 250 messages in? I'll take it, but not sub-100. Even if both cases are already heavily Shifting, with the old context no longer present, I am fine with that, as there are other guidelines for message formatting adherence that other models respect, even with only the context that is left after the system prompt and characters have to fit.

Nitral-AI changed discussion status to closed
Nitral-AI changed discussion status to open

Well alternative models are up. I'm not going to drive myself insane pleasing everyone's individual use cases.
(both are probably worse)

What I think happens is that since the later parts of the context have more relevance than the section with Example Messages, the responses slowly pivoting into that format cause it to 'self-feed' more of the unwanted style, exacerbating it over time as more and more of the chat history fills with it.

The issue is why it is pivoting into that style in the first place. If it remained consistent across most messages, it wouldn't really be an issue.

I don't mean to just speak badly against V3.05, just sharing what I observed, it's all partially subjective of course, but I mean well.

https://huggingface.co/Nitral-AI/Eris_PrimeV3.075-Vision-7B
https://huggingface.co/Nitral-AI/Eris_PrimeV3.075-Vision-7B-Longtext-test
Two different attempts at extending context in different ways. (These are the alternatives I was referring to.)

Well alternative models are up. I'm not going to drive myself insane pleasing everyone's individual use cases.

Hey, hey now, come here, give me a hug mate. This isn't an attack on your efforts, I don't want you to feel like that, alright? Nobody is going to demand you to please them, never take that.

Do what you like for yourself first and foremost, it's okay. While I personally like to also look at other people's experiences, that's just my approach as I believe in getting a result that pleases as many people as possible, without losing track of your own path of course.

Sounds fair?

It's just frustrating that I spent an entire day working on an issue I haven't been able to replicate to any degree. Since I can't replicate the issue, I've spent even more hours today testing the attempted fixes with no way to validate results.

Sorry for causing distress. I understand your frustration, before doing anything like that give it a while to cool down, I know it might be tempting to just get into debug mode and get hyper-focused on something, I do that way too much with my own affairs, but let it cool for a bit, let's hear more from @TravelingMan for instance and let a few rounds of testing go by with additional changes, card formats, system prompts... We'll get there slowly, and you don't need to work yourself out over it, aight? :')

Fair enough, hopefully one of the fixes does it. Running Nitral-AI/Eris_PrimeV3.075-Vision-7B-q5_k_m-gguf now for personal use/testing.

Shall be tested once I can get past my own addictions. Bloody hells, Dawntrail has a release date I need to speedrun my entire life now.
