General discussion.
quantization_options = [
"Q4_K_M", "Q4_K_S", "IQ4_NL", "IQ4_XS", "Q5_K_M",
"Q5_K_S", "Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XS", "IQ3_XXS"
]
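For context, these are the quant types typically produced with llama.cpp's quantization tool (`llama-quantize`; older builds call it `quantize`). A minimal sketch of building the command lines for each type, assuming placeholder file names and that the IQ* types get an importance matrix via `--imatrix`:

```python
# Sketch: build llama.cpp quantization commands for each target quant type.
# File names and the imatrix path are placeholders, not from the thread.
quantization_options = [
    "Q4_K_M", "Q4_K_S", "IQ4_NL", "IQ4_XS", "Q5_K_M",
    "Q5_K_S", "Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XS", "IQ3_XXS",
]

def quantize_cmd(model_f16: str, qtype: str, imatrix: str = "imatrix.dat") -> list[str]:
    """Command line for one quant of an F16 GGUF, using an importance matrix."""
    out = model_f16.replace("F16", qtype)
    return ["llama-quantize", "--imatrix", imatrix, model_f16, out, qtype]

cmds = [quantize_cmd("model-F16.gguf", q) for q in quantization_options]
```

Each entry in `cmds` can then be passed to `subprocess.run` (one invocation per quant type).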
Truly all refusals have been removed from this model....
Based model. I was gonna recommend you try this one. Given you encountered some refusals in the previous Eris, there is a new merge of these using DPO, ChaoticNeutrals/Eris_Floramix_DPO_7B, that might be worth trying; I'll do quants later. I'm also testing whether different importance matrix data (adding more RP and NSFW RP data to the imatrix.txt, with the usual roleplay formatting) could help with message styling consistency.
https://huggingface.co/Lewdiculous/Layris_9B-GGUF-IQ-Imatrix
You could try this one too; it's a mix of Eris and Layla. The idea when I requested it was to combine the high performance of Eris with the un-alignment/fewer refusals of Layla.
It is slightly bigger, but you can get it to use roughly the same VRAM as a 7B Q5_K_M by running the 9B at Q4_K_S.
A slightly bigger model might also be slightly "smarter". Not guaranteed, but it is technically plausible.
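The back-of-the-envelope reasoning here can be checked with approximate bits-per-weight figures for llama.cpp quants (Q4_K_S is roughly 4.6 bpw, Q5_K_M roughly 5.7 bpw; exact values vary by model, and real VRAM use also includes the KV cache and overhead):

```python
# Rough comparison of weight sizes: 9B at Q4_K_S vs 7B at Q5_K_M.
# The bpw values are approximate llama.cpp figures, not exact for any model;
# actual VRAM also depends on context length (KV cache) and runtime overhead.
def weights_gb(params_billions: float, bpw: float) -> float:
    """Approximate size in GB of the quantized weights alone."""
    return params_billions * 1e9 * bpw / 8 / 1e9

gb_9b_q4ks = weights_gb(9.0, 4.58)  # ~5.2 GB
gb_7b_q5km = weights_gb(7.0, 5.69)  # ~5.0 GB
```

So the two end up within a few hundred MB of each other, which is why the 9B Q4_K_S can fit where a 7B Q5_K_M does.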
I'm working with 6 GB of VRAM, so a 7B Q4_K_M with high context is more or less the limit for CUDA-only :(
@Morktastic Totally understandable. I also prefer higher context, especially since I use Context Shifting in KoboldCpp to speed things up a lot; I'd rather keep more memory of the conversation than rely on Lorebooks, etc.