Hoping someone can finally help me with this

#8
by ElvisM - opened

Anybody else here having the same problem where the model works well for the first message, and then it instantly starts to give gibberish for no reason in the second message? It looks like such a weird bug with llama.cpp. It happens with all GLM 4.7 models, which is sad because it's such a good LLM. It looks like reloading fixes it, but I'm surprised this model's implementation is still broken.

Owner

Quant? AI app used? Settings?

> Quant? AI app used? Settings?

Hi, I'm using Q4_K_M, Oobabooga's Text Generation Web UI, and some GLM 4.7 settings for SillyTavern that I found somewhere on Google. Repetition penalty is disabled, as recommended.

Owner

Testing was done in LM Studio. If memory serves, you can select the template to use in "TextGen".
This is CRITICAL.

RE: Settings: start with the GLM 4.7 Flash settings/samplers.
SillyTavern samplers (there are tons of these) may be the root issue, especially "DRY" and other advanced ones.
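
One way to rule the samplers out, as a minimal sketch: copy the preset and set every advanced sampler to a neutral value before testing again. The key names and neutral values below are assumptions modeled on common llama.cpp-style sampler parameters; the preset format of your frontend may use different names, and the example temperature is only a placeholder.

```python
# Assumed neutral values for common samplers. Key names are illustrative
# and may differ between frontends.
NEUTRAL_SAMPLERS = {
    "repetition_penalty": 1.0,  # 1.0 means no penalty
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "dry_multiplier": 0.0,      # 0 turns DRY off
    "mirostat_mode": 0,         # 0 turns Mirostat off
}


def neutralize(preset: dict) -> dict:
    """Return a copy of the preset with advanced samplers set to neutral."""
    cleaned = dict(preset)
    for key, neutral in NEUTRAL_SAMPLERS.items():
        if key in cleaned:
            cleaned[key] = neutral
    return cleaned


def changed_keys(before: dict, after: dict) -> list:
    """List the keys whose values differ between the two presets."""
    return sorted(k for k in before if before[k] != after.get(k))


# Hypothetical preset, only to show the call.
preset = {"temperature": 0.8, "dry_multiplier": 0.8, "repetition_penalty": 1.0}
print(changed_keys(preset, neutralize(preset)))
```

If the gibberish disappears with the neutral preset, re-enable the samplers one at a time to find the one responsible.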

> Testing was done in LM Studio. If memory serves, you can select the template to use in "TextGen".
> This is CRITICAL.
>
> RE: Settings: start with the GLM 4.7 Flash settings/samplers.
> SillyTavern samplers (there are tons of these) may be the root issue, especially "DRY" and other advanced ones.

Thanks. I've also tested the model only in Oobabooga (no SillyTavern) and the same problem still occurred. I've stopped using it for now because of this, so I don't know if anything has changed. Oobabooga just uses whatever template already comes bundled with the model. I just find it so weird that the first few messages are so good and then the model breaks, and reloading it seems to fix it.

Owner

Based on your description, this could be a template glitch and/or a caching issue.
Older models are a lot more fault-tolerant of Jinja template issues, whereas newer ones can "drop the ball" if a few tokens are added at the wrong time.
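
A rough way to check for this, assuming you can get the exact prompt text your frontend sends on each turn (for example from a verbose log): compare what the model saw at the end of turn one with the prompt for turn two. If the earlier context is not a clean prefix of the new prompt, the template is rewriting history or adding tokens. The role markers in this sketch are made-up placeholders, not GLM's real special tokens.

```python
def first_divergence(prev_context: str, next_prompt: str) -> int:
    """Return the index of the first differing character,
    or -1 if prev_context is a prefix of next_prompt."""
    for i, (a, b) in enumerate(zip(prev_context, next_prompt)):
        if a != b:
            return i
    # No mismatch in the shared span: a prefix only if prev is not longer.
    return -1 if len(prev_context) <= len(next_prompt) else len(next_prompt)


# Placeholder markers for illustration only.
prev = "<user>Hi<assistant>Hello!"
good = prev + "<user>How are you?<assistant>"
bad = "<user>Hi<assistant>\nHello!<user>How are you?<assistant>"

print(first_divergence(prev, good))  # -1: history is preserved
print(first_divergence(prev, bad))   # 19: an extra newline was inserted
```

Printing the text around the reported index in both strings shows which tokens the template added or dropped.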
