Repetitive Text

#8 opened by mlenno1

After a few paragraphs, the text gets very repetitive when asked to write a story. I'm using LM Studio with the ChatML preset. Any tips to prevent this?

Cognitive Computations org

I notice the same. I think it's a flaw in the model.

Glad to know it's not just me. Was going mad trying everything and couldn't fix it. πŸ˜…

Cognitive Computations org

Yes, that's the reason I went back to Yi-based models after less than half an hour with Mixtral.
I just couldn't get it to stop repeating the same thing, even after increasing the repetition penalty.

Is it only Dolphin, or other finetunes too? I had the same problem, but thought that dolphin-mixtral might be undertrained on multiturn dialogues. So far dolphin-mistral is much better at keeping track of the dialogue for me.

Cognitive Computations org

Unclear yet; one or the other, or a bit of both. I've decided to let the dust settle for another month or so before testing any Mixtral models again; the models and all the tooling are just too raw atm.
But I suspect it will always feel like a group of very knowledgeable but small models rather than one large model, and interesting cognitive signs tend to show up at around the 30B mark for monoliths, so most likely I won't be holding my breath for this specific type of MoE architecture, despite all the hype.
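For context on why it can feel like a committee of small models: in Mixtral-style MoE layers, each token is routed to only 2 of the 8 experts, so no single forward pass uses the full parameter count. A rough PyTorch sketch of the top-2 routing, with toy dimensions I made up purely for illustration:

```python
# Rough sketch of Mixtral-style top-2 expert routing for a batch of tokens.
# Dimensions and the tiny expert MLPs are illustrative, not the real config.
import torch
import torch.nn.functional as F

hidden = 32            # toy hidden size (Mixtral's is far larger)
n_experts, top_k = 8, 2

gate = torch.nn.Linear(hidden, n_experts, bias=False)
experts = torch.nn.ModuleList([
    torch.nn.Sequential(
        torch.nn.Linear(hidden, 64), torch.nn.SiLU(), torch.nn.Linear(64, hidden)
    )
    for _ in range(n_experts)
])

x = torch.randn(4, hidden)                 # 4 tokens
logits = gate(x)                           # [4, 8] router scores
weights, idx = logits.topk(top_k, dim=-1)  # pick 2 experts per token
weights = F.softmax(weights, dim=-1)       # renormalise over the chosen 2

out = torch.zeros_like(x)
for t in range(x.size(0)):                 # each token sees only 2 of 8 experts
    for w, e in zip(weights[t], idx[t]):
        out[t] += w * experts[int(e)](x[t])
```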

I found in my few hours of testing that the model repeats itself a lot and gets stuck on silly things like a single bracket. When giving an example API key in code, it won't say "{INSERT_API_KEY_HERE}"; it will try to create one, a never-ending one 🤣. I also found that it apologises all the time, especially when I just dump my error codes on it with no context 🤷‍♂️. That's my 10p anyway; I don't know enough about this stuff to even start contemplating why it does that, and more than likely it's just me.

I've been looking into this issue, but I've had some trouble really pinning it down. Every model I've tried on this MoE architecture tends to loop, finetunes far more so than the original, and the commenter above is correct that repetition penalty tends to make it worse. I suspect it can be addressed, at least partially, with sampler settings. Try raising the temperature substantially, somewhere between 1 and 1.5, and switch to a dynamic sampler. Min P around 0.1 gets pretty consistently good results for me, and Mirostat is also very good. (Not quite as good as Min P in terms of stability and comprehension, but Min P isn't available everywhere while Mirostat usually is.) Other sampler settings I've tried tend to be pretty meh. There's a sketch of both configurations below.

Please let me know if these settings help with the issue! As I said, I've been looking into it, so more data would help.
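If you want to try those numbers outside LM Studio, here's a minimal sketch using llama-cpp-python (assuming a recent build with min_p support; the model filename and prompt are placeholders, and the values are just the starting points suggested above, not tuned settings):

```python
# Minimal sketch of the suggested sampler settings via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="dolphin-mixtral-8x7b.Q4_K_M.gguf")  # placeholder filename

# Option 1: high temperature + Min P, repetition penalty left neutral.
out = llm(
    "Write a short story about a lighthouse keeper.",
    max_tokens=512,
    temperature=1.2,     # somewhere in the 1.0-1.5 range suggested above
    min_p=0.1,           # dynamic cutoff relative to the top token's probability
    top_p=1.0,           # disable top-p so Min P does the filtering
    top_k=0,             # disable top-k for the same reason
    repeat_penalty=1.0,  # raising this reportedly makes the looping worse
)

# Option 2: Mirostat v2 instead of Min P, for backends that lack min_p.
out = llm(
    "Write a short story about a lighthouse keeper.",
    max_tokens=512,
    mirostat_mode=2,     # Mirostat v2
    mirostat_tau=5.0,    # target entropy (llama.cpp default)
    mirostat_eta=0.1,    # learning rate (llama.cpp default)
)

print(out["choices"][0]["text"])
```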

Cognitive Computations org

I would love to get to the bottom of this.

It happens with the base Mixtral model as well. I've seen people reporting issues with the K quants, but I tested all q4 and q5 versions of both the base Mixtral and this finetune and experienced similar results. Short answers to direct questions work really well, but asking it to write a story or simulate dialogue almost always results in repetition and loss of coherence.

> I found in my few hours of testing that the model repeats itself a lot and gets stuck on silly things like a single bracket…

Do you have an example to reproduce this? I'm not having this issue.

Cognitive Computations org

> Short answers to direct questions work really well, but asking it to write a story or simulate dialogue almost always results in repetition and loss of coherence.

Exactly true. Asking it to generate longer stories gives terrible results even at 16-bit, so quantization isn't the problem.

I wonder if Classifier Free Guidance would help.
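If anyone wants to experiment with that: recent transformers versions expose CFG through generate()'s guidance_scale and negative_prompt_ids arguments. A rough sketch of the call, with an example repo id and a hypothetical negative prompt; I haven't verified this actually fixes the looping:

```python
# Sketch: Classifier-Free Guidance via transformers' generate().
# guidance_scale > 1 contrasts the prompt-conditioned logits with an
# unconditional (or negative-prompt) pass. Repo id is just an example;
# loading a Mixtral 8x7b this way needs substantial VRAM and accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.5-mixtral-8x7b"  # example repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok("Write a short story about a lighthouse keeper.", return_tensors="pt").to(model.device)
neg = tok("Repetitive, looping text.", return_tensors="pt").to(model.device)  # hypothetical negative prompt

out = model.generate(
    **inputs,
    max_new_tokens=256,
    guidance_scale=1.5,                 # CFG strength; 1.0 disables it
    negative_prompt_ids=neg.input_ids,  # what the guidance steers away from
)
print(tok.decode(out[0], skip_special_tokens=True))
```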

Cognitive Computations org

Looks like the issue is, at least partially, with the Mixtral MoE architecture.
