GGUF
Not-For-All-Audiences
nsfw
mergekit
Merge
Inference Endpoints
conversational

Interesting model (feedback).

#1
by AtonMountlook - opened

Hi, just been testing this model (q4_0, 8K context),
found it quite interesting using the ChatML prompt, allowing for good "asssitant" interactions, and even when RP it abides quite well to the character, but then it tends to repeat the same phrases indefinitely, regardless of the temperature or CFG. Besides of the initial response for a 2k character taking about 2 min, the following responses are very acceptable in speed (approx 3 token/s), having offloaded only 10 layers to GPU. So I think it is a very well performant model, considering my setup is quite limited (RX6800, 16 GB VRAM + Ryzen 7600x w/64 GB DDR5@6000MT/s).
Under SillyTavern, "Roleplay" preset seems to cause the model to hallucinate in excess and produce extra long and repetitive responses.
When tested as "assistant", it was notorious that there remains a strong censorship within the model, frequently highlighting ethic implications, power imbalances and so on, even if explicitly commanded not to do so.
Considering it is the quantized version, it seems this technique is quite promising.

Thank you for your work, @Undi95 .

Owner

Hello, thanks for the feedback, I tried this to see what would be the reaction mixing a base who accept so much different prompting system, but yeah, even myself don't really find it better that what exist right now. Still, I decided to let it up if people were curious!

Sign up or log in to comment