General discussion.
quantization_options = [
"Q4_K_M", "Q4_K_S", "IQ4_NL", "IQ4_XS", "Q5_K_M",
"Q5_K_S", "Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XS", "IQ3_XXS"
]
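For context, these are the quant types typically produced with llama.cpp's quantization tool (`llama-quantize`; older builds call it `quantize`). A minimal sketch of building the command lines for each type, assuming placeholder file names and that the IQ* types get an importance matrix via `--imatrix`:

```python
# Sketch: build llama.cpp quantization commands for each target quant type.
# File names and the imatrix path are placeholders, not from the thread.
quantization_options = [
    "Q4_K_M", "Q4_K_S", "IQ4_NL", "IQ4_XS", "Q5_K_M",
    "Q5_K_S", "Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XS", "IQ3_XXS",
]

def quantize_cmd(model_f16: str, qtype: str, imatrix: str = "imatrix.dat") -> list[str]:
    """Command line for one quant of an F16 GGUF, using an importance matrix."""
    out = model_f16.replace("F16", qtype)
    return ["llama-quantize", "--imatrix", imatrix, model_f16, out, qtype]

cmds = [quantize_cmd("model-F16.gguf", q) for q in quantization_options]
```

Each entry in `cmds` can then be passed to `subprocess.run` (one invocation per quant type).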
Truly all refusals have been removed from this model....
Based model. I was gonna recommend you try this one. Given you encountered some refusals in the previous Eris, there is a new merge of these using DPO, ChaoticNeutrals/Eris_Floramix_DPO_7B, that might be worth trying; I'll do quants later. I'm also testing whether different importance matrix data (adding more RP and NSFW RP data to the imatrix.txt, with the usual roleplay formatting) could help with message styling consistency.
https://huggingface.co/Lewdiculous/Layris_9B-GGUF-IQ-Imatrix
You could try this one too; it's a mix of Eris and Layla. The idea when I requested it was to combine the high performance of Eris with the un-alignment/fewer refusals of Layla.
It is slightly bigger, but you can get it to use roughly the same VRAM as a 7B Q5_K_M by running the 9B at Q4_K_S.
A slightly bigger model might also be slightly "smarter". Not guaranteed, but it is technically plausible.
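The back-of-the-envelope reasoning here can be checked with approximate bits-per-weight figures for llama.cpp quants (Q4_K_S is roughly 4.6 bpw, Q5_K_M roughly 5.7 bpw; exact values vary by model, and real VRAM use also includes the KV cache and overhead):

```python
# Rough comparison of weight sizes: 9B at Q4_K_S vs 7B at Q5_K_M.
# The bpw values are approximate llama.cpp figures, not exact for any model;
# actual VRAM also depends on context length (KV cache) and runtime overhead.
def weights_gb(params_billions: float, bpw: float) -> float:
    """Approximate size in GB of the quantized weights alone."""
    return params_billions * 1e9 * bpw / 8 / 1e9

gb_9b_q4ks = weights_gb(9.0, 4.58)  # ~5.2 GB
gb_7b_q5km = weights_gb(7.0, 5.69)  # ~5.0 GB
```

So the two end up within a few hundred MB of each other, which is why the 9B Q4_K_S can fit where a 7B Q5_K_M does.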
I'm working with 6 GB of VRAM, so a 7B Q4_K_M with high context is more or less the limit for CUDA-only :(
@Morktastic Totally understandable. I also prefer higher context, especially since I use Context Shifting in KoboldCpp to speed things up a lot; I'd rather keep more memory of the conversation than rely on Lorebooks, etc.