Repetitive Text

#8 opened by mlenno1

After a few paragraphs, the text gets very repetitive when asked to write a story. I'm using LM Studio with the ChatML preset. Any tips to prevent this?

Cognitive Computations org

I notice the same. I think it's a flaw in the model.

Glad to know it's not just me. Was going mad trying everything and couldn't fix it. πŸ˜…

Cognitive Computations org

Yes, that's the reason I went back to Yi-based models after less than half an hour with Mixtral.
I just couldn't get it to stop repeating the same thing, even after increasing the repetition penalty.

Is it only Dolphin, or other finetunes too? I had the same problem, but thought that dolphin-mixtral might be undertrained on multiturn dialogues. So far dolphin-mistral is much better at keeping track of the dialogue for me.

Cognitive Computations org

Unclear yet; one or the other, or a bit of both. I've decided to let the dust settle for another month or so before testing any Mixtral models again; the models and all the tooling are just too raw atm.
But I suspect it will always feel like a group of very knowledgeable but small models rather than one large model, and interesting cognitive signs tend to show up at around the 30B mark for monoliths, so most likely I won't be holding my breath for this specific type of MoE architecture, despite all the hype.
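For context on why it can feel like a committee of small models: in Mixtral-style MoE layers, each token is routed to only 2 of the 8 experts, so no single forward pass uses the full parameter count. A rough PyTorch sketch of the top-2 routing, with toy dimensions I made up purely for illustration:

```python
# Rough sketch of Mixtral-style top-2 expert routing for a batch of tokens.
# Dimensions and the tiny expert MLPs are illustrative, not the real config.
import torch
import torch.nn.functional as F

hidden = 32            # toy hidden size (Mixtral's is far larger)
n_experts, top_k = 8, 2

gate = torch.nn.Linear(hidden, n_experts, bias=False)
experts = torch.nn.ModuleList([
    torch.nn.Sequential(
        torch.nn.Linear(hidden, 64), torch.nn.SiLU(), torch.nn.Linear(64, hidden)
    )
    for _ in range(n_experts)
])

x = torch.randn(4, hidden)                 # 4 tokens
logits = gate(x)                           # [4, 8] router scores
weights, idx = logits.topk(top_k, dim=-1)  # pick 2 experts per token
weights = F.softmax(weights, dim=-1)       # renormalise over the chosen 2

out = torch.zeros_like(x)
for t in range(x.size(0)):                 # each token sees only 2 of 8 experts
    for w, e in zip(weights[t], idx[t]):
        out[t] += w * experts[int(e)](x[t])
```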

I found in my few hours of testing that the model repeats itself a lot and gets stuck on silly things like a single bracket. When giving an example API key in code, it won't say "{INSERT_API_KEY_HERE}"; it will try to create one, a never-ending one 🤣. I also found that it apologises all the time, especially when I just dump my error codes on it with no context 🤷‍♂️. That's my 10p anyway; I don't know enough about this stuff to even start contemplating why it does that, and more than likely it's just me.

I've been looking into this issue, but I've had some trouble really pinning it down. Every model I've tried on this MoE architecture tends to loop, finetunes far more so than the original, and the commenter above is correct that repetition penalty tends to make it worse. I suspect it can be addressed, at least partially, with sampler settings. Try raising the temperature substantially, somewhere between 1 and 1.5, and switch to a dynamic sampler. Min P around 0.1 gets pretty consistently good results for me, and Mirostat is also very good. (Not quite as good as Min P in terms of stability and comprehension, but Min P isn't available everywhere while Mirostat usually is.) Other sampler settings I've tried tend to be pretty meh. There's a sketch of both configurations below.

Please let me know if these settings help with the issue! As I said, I've been looking into it, so more data would help.
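If you want to try those numbers outside LM Studio, here's a minimal sketch using llama-cpp-python (assuming a recent build with min_p support; the model filename and prompt are placeholders, and the values are just the starting points suggested above, not tuned settings):

```python
# Minimal sketch of the suggested sampler settings via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="dolphin-mixtral-8x7b.Q4_K_M.gguf")  # placeholder filename

# Option 1: high temperature + Min P, repetition penalty left neutral.
out = llm(
    "Write a short story about a lighthouse keeper.",
    max_tokens=512,
    temperature=1.2,     # somewhere in the 1.0-1.5 range suggested above
    min_p=0.1,           # dynamic cutoff relative to the top token's probability
    top_p=1.0,           # disable top-p so Min P does the filtering
    top_k=0,             # disable top-k for the same reason
    repeat_penalty=1.0,  # raising this reportedly makes the looping worse
)

# Option 2: Mirostat v2 instead of Min P, for backends that lack min_p.
out = llm(
    "Write a short story about a lighthouse keeper.",
    max_tokens=512,
    mirostat_mode=2,     # Mirostat v2
    mirostat_tau=5.0,    # target entropy (llama.cpp default)
    mirostat_eta=0.1,    # learning rate (llama.cpp default)
)

print(out["choices"][0]["text"])
```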

Cognitive Computations org

I would love to get to the bottom of this.

It happens with the base Mixtral model as well. I've seen people reporting issues with the K quants, but I tested all q4 and q5 versions of both the base Mixtral and this finetune and experienced similar results. Short answers to direct questions work really well, but asking it to write a story or simulate dialogue almost always results in repetition and loss of coherence.

> I found in my few hours of testing that the model repeats itself a lot and gets stuck on silly things like a single bracket…

Do you have an example to reproduce this? I'm not having this issue.

Cognitive Computations org

> Short answers to direct questions work really well, but asking it to write a story or simulate dialogue almost always results in repetition and loss of coherence.

Exactly true. Asking it to generate longer stories gives terrible results even at 16-bit, so quantization isn't the problem.

I wonder if Classifier Free Guidance would help.
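If anyone wants to experiment with that: recent transformers versions expose CFG through generate()'s guidance_scale and negative_prompt_ids arguments. A rough sketch of the call, with an example repo id and a hypothetical negative prompt; I haven't verified this actually fixes the looping:

```python
# Sketch: Classifier-Free Guidance via transformers' generate().
# guidance_scale > 1 contrasts the prompt-conditioned logits with an
# unconditional (or negative-prompt) pass. Repo id is just an example;
# loading a Mixtral 8x7b this way needs substantial VRAM and accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.5-mixtral-8x7b"  # example repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok("Write a short story about a lighthouse keeper.", return_tensors="pt").to(model.device)
neg = tok("Repetitive, looping text.", return_tensors="pt").to(model.device)  # hypothetical negative prompt

out = model.generate(
    **inputs,
    max_new_tokens=256,
    guidance_scale=1.5,                 # CFG strength; 1.0 disables it
    negative_prompt_ids=neg.input_ids,  # what the guidance steers away from
)
print(tok.decode(out[0], skip_special_tokens=True))
```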

Cognitive Computations org

Looks like the issue is, at least partially, with the Mixtral MoE architecture.
