Getting run-on sentences at 6k context
The model starts out fine, but as the context builds up it begins to run on. It started once I approached about 6k tokens of context. This seems to be a pretty common problem, but I have no idea how to fix it. I'm using the 4-bit-per-weight exl2 version of this model. The output looks like this:
Amidst the cacophony of voices echoing against concrete structures encircling your impromptu domicile, you hear whispers intermingled with laughter or groans of frustration derived from myriad trials faced daily by those dwelling alongside society's fringes.
This almost sounds like thesaurus mode, which is usually caused by repetition penalties.
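To illustrate why a repetition penalty can push a model toward obscure synonyms, here is a minimal sketch of the CTRL-style penalty that most samplers use (positive logits divided by the penalty, negative ones multiplied). The toy vocabulary and logit values are made up for illustration: once common words like "house" have appeared in context, they get penalized, while an unpenalized rare synonym like "domicile" picks up the probability mass.

```python
import math

def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Scale down logits of already-generated tokens (CTRL-style penalty).

    Positive logits are divided by the penalty, negative ones multiplied,
    so any token that appeared earlier in the context becomes less likely.
    """
    out = dict(logits)
    for tok in seen_token_ids:
        if tok in out:
            out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def softmax(logits):
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Hypothetical vocabulary: "house" and "home" are common, "domicile" is a rare synonym.
logits = {"house": 4.0, "home": 3.5, "domicile": 1.0}

# Before any penalty, the common words dominate.
before = softmax(logits)

# In a long context, the common words have all been used already,
# so the penalty suppresses them and the rare synonym's share grows.
after = softmax(apply_repetition_penalty(logits, {"house", "home"}, penalty=1.5))
print(before["domicile"], "->", after["domicile"])
```

The longer the context, the more ordinary words have already appeared and been penalized, which would explain why the purple prose only kicks in around 6k tokens.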
That was it! I turned off all repetition penalties, and thesaurus mode went away. Now I know why that happens!
With that problem gone, I can explore the attention adjustments that look so interesting. I really want this model to succeed! I have to ask, though: why did you decide to mix refined Chinese, medical information, and RP into this model? A bit of an odd mix, don't you think?
Turbo only distributes the model. If I remember correctly, the Chinese is there because Chinese is the second most spoken language worldwide, and the medical data is there because the creator is studying medicine. You could ask the creator more about it in the exllama Discord server if you'd like.
Can you explain or link to something explaining the 'thesaurus mode'?