Undi95/BagelMix-8x7B-GGUF · Interesting model (feedback).

Hi, just been testing this model (q4_0, 8K context),
found it quite interesting using the ChatML prompt, allowing for good "asssitant" interactions, and even when RP it abides quite well to the character, but then it tends to repeat the same phrases indefinitely, regardless of the temperature or CFG. Besides of the initial response for a 2k character taking about 2 min, the following responses are very acceptable in speed (approx 3 token/s), having offloaded only 10 layers to GPU. So I think it is a very well performant model, considering my setup is quite limited (RX6800, 16 GB VRAM + Ryzen 7600x w/64 GB DDR5@6000MT/s).
Under SillyTavern, "Roleplay" preset seems to cause the model to hallucinate in excess and produce extra long and repetitive responses.
When tested as "assistant", it was notorious that there remains a strong censorship within the model, frequently highlighting ethic implications, power imbalances and so on, even if explicitly commanded not to do so.
Considering it is the quantized version, it seems this technique is quite promising.

Thank you for your work, @Undi95 .