Devolution into semi-nonsensical long words

by jspr - opened

This model is incredible. It's simply the best out there for the task it was designed for. However, I've found that after a while (maybe a few hundred to a few thousand tokens in), it tends to devolve into runs of consecutive long descriptive words with no punctuation. What might this be an artifact of? Turning the repetition penalty down from text-generation-webui's default of 1.15 to 1 seems to help.
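
For context, my rough understanding of why the penalty might matter: most backends apply a CTRL-style repetition penalty along these lines (a minimal sketch, not text-generation-webui's exact code), and it taxes punctuation tokens the same as any other repeated token:

```python
import torch

def apply_repetition_penalty(logits, generated_ids, penalty=1.15):
    """CTRL-style repetition penalty over a 1-D logits vector (sketch)."""
    seen = torch.unique(generated_ids)  # every token id emitted so far
    scores = logits[seen]
    # Positive logits are divided and negative ones multiplied, so each
    # already-used token -- punctuation included -- becomes less likely.
    logits[seen] = torch.where(scores > 0, scores / penalty, scores * penalty)
    return logits
```

If commas and periods inside the penalty window keep getting taxed this way, long unpunctuated runs seem like a plausible outcome, though that's just my guess.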

Interesting. Can you post your generation settings? Which quantization were you testing?

I've definitely tested many generations up to the 32K limit and haven't found that particular issue. Is it possible for you to share an example so I can replicate it?

I often test with the following settings. Admittedly I haven't gone up to a very high repetition penalty (though 1.15 doesn't seem that high). Does what you describe happen with mirostat as well?

Standard sampling:

'temperature': 0.7-0.8
'top_p': 0.6
'min_p': 0
'top_k': 40
'repetition_penalty': 1.12
'presence_penalty': 0
'frequency_penalty': 0
'repetition_penalty_range': 1024
'typical_p': 1
'tfs': 1
'top_a': 0
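
If it helps with reproducing, those keys map one-to-one onto the request body of text-generation-webui's API. A minimal sketch, assuming the OpenAI-compatible endpoint is enabled on its default port and passes these sampler keys through:

```python
import requests

payload = {
    "prompt": "Once upon a time",
    "max_tokens": 512,
    "temperature": 0.8,
    "top_p": 0.6,
    "min_p": 0,
    "top_k": 40,
    "repetition_penalty": 1.12,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "repetition_penalty_range": 1024,
    "typical_p": 1,
    "tfs": 1,
    "top_a": 0,
}
resp = requests.post("http://127.0.0.1:5000/v1/completions", json=payload)
print(resp.json()["choices"][0]["text"])
```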

Mirostat: (Oobabooga settings)

'mirostat_mode': 2 (set to 1 if using GGUF)
'mirostat_tau': 1.5 to 2
'mirostat_eta': 0.1

with the other settings set to defaults:

'temperature': 1
'top_p': 1
'min_p': 0
'top_k': 0
'repetition_penalty': 1
'presence_penalty': 0
'frequency_penalty': 0
'repetition_penalty_range': 1024
'typical_p': 1
'tfs': 1
'top_a': 0
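
In case mirostat is unfamiliar, one sampling step works roughly like this; this is a sketch of the v2 algorithm from the mirostat paper, not Oobabooga's exact code (the function name and defaults are illustrative):

```python
import numpy as np

def mirostat_v2_step(logits, mu, tau=2.0, eta=0.1, rng=np.random.default_rng()):
    """One mirostat v2 step; mu starts at 2 * tau before the first call."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    surprise = -np.log2(probs)            # per-token surprise, in bits
    allowed = surprise < mu               # drop tokens more surprising than mu
    if not allowed.any():                 # always keep at least the top token
        allowed[np.argmax(probs)] = True
    kept = np.where(allowed, probs, 0.0)
    kept /= kept.sum()
    token = rng.choice(len(probs), p=kept)
    mu -= eta * (surprise[token] - tau)   # steer toward the target surprise
    return token, mu
```

Lower tau means a tighter surprise budget per token, which is why 1.5-2 behaves more conservatively than 5.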

Super helpful, thanks! Here are the settings I've been using - they're mostly the text-generation-webui defaults.

temperature 0.7
top_p 0.9
top_k 20
min_p 0
repetition penalty 1.15
frequency and presence penalties 0
repetition penalty range 1024
guidance scale 1
mirostat mode 0
mirostat tau 5
mirostat eta 0.1
typical_p 1
tfs 1
top_a 0

For model loading, I've got compress_pos_emb at 8.
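
(For anyone reading along: my understanding is that compress_pos_emb is linear RoPE position interpolation. A sketch under that assumption, with `rope_angles` as an illustrative name rather than the loader's actual code:)

```python
import torch

def rope_angles(positions, dim, base=10000.0, compress_pos_emb=8.0):
    # Standard rotary-embedding frequencies for a head dimension `dim`.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Linear interpolation: dividing positions by 8 squeezes 32K real
    # positions into the ~4K range the base model saw during training.
    scaled = positions.float() / compress_pos_emb
    return torch.outer(scaled, inv_freq)  # (seq_len, dim // 2) angles
```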

The only major differences are top_p, top_k, mirostat mode, and mirostat tau, I think. I'm not super familiar with mirostat, but maybe 5 is too high? Or maybe mirostat mode == 0 means it's off entirely?

I've also mostly been using the exl2 quants, usually the 4 bpw one, but I doubt the quantization was causing the problem.

Yeah, mirostat mode = 0 means it's off. You could try picking the default mirostat preset in the dropdown and then changing tau to 1.5-2 (5 is meant for smaller/dumber models).

Also, are you using the included Aurelian.yaml for the prompt template (or did you set up the prompt format manually)?

Edit: I just tested some generations ~4K tokens long with the settings you listed and they seemed fine, so those settings can't be too far off.

Yep, I made sure the prompt template matched Aurelian.yaml. Sounds like it's likely a mirostat thing; I'll re-run with mirostat enabled and the config you suggested above. It definitely worked like 95% of the time, so it's not a huge blocker or anything. Thanks @grimulkan!

Great! Do let me know if you can consistently trigger the 5% of cases that don't work, though. It might be something I can fix in a later finetune.

Yep, looks like enabling mirostat mode solves it up through at least 8k tokens. Thanks for the help!

jspr changed discussion status to closed
