Work of art!

#1
by nixudos - opened

Thank you for this model! I was looking for a mix exactly like this and thought a Pygmalion LoRA might be the only solution, but this blend seems to do what I was looking for.
I only have 12 GB of VRAM, so it runs a bit slower than the 4bit 128g models I have played with, but the result is much more coherent so far.
Do you know if a 4bit 128g model would lose a lot of its fidelity?

I've seen people mention a noticeable improvement in response quality depending on the quantization settings. I plan to quantize these mixes to 4bit with a group size of 32, which peaks a little below 10GB of VRAM usage at max context, so hopefully it keeps more of its big brother's coherence while still being usable on most cards.
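For anyone curious what that looks like in practice, here is a minimal sketch using AutoGPTQ; the model paths and the calibration sentence are placeholders, and a real run should use a few hundred varied calibration samples:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Placeholder paths; substitute the actual merged-model repo/directory.
pretrained_model_dir = "path/to/the-merged-7b-model"
quantized_model_dir = "the-merged-7b-model-4bit-32g"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

# Calibration data: a single sentence here only for illustration.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(
    bits=4,        # 4-bit weights
    group_size=32, # smaller groups than the usual 128 track fp16 more closely
)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)  # runs GPTQ layer by layer over the calibration set
model.save_quantized(quantized_model_dir, use_safetensors=True)
```

The tradeoff with a group size of 32 is that the model stores more quantization scales than at 128g, so the files are slightly larger and inference a touch slower, in exchange for less quality loss.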

By the way, any feedback on how this model feels compared to base Pygmalion-7b, or even Metharme-7b? I haven't had the chance to really test it myself yet.

I tested it with a storywriting experiment (NSFW), where I gave it a 258-token prompt: a story title, five lines of character and situation description, and a couple of lines to start the story off.
It behaved much more coherently than regular Pygmalion-7b and was more "colorful" than the free Vicuna. I have only run a couple of the scenarios that I have tried with other uncensored models and Pygmalion, but I felt there was a marked difference in quality!
If you get a 4bit version done that I can run exclusively on GPU, I'll be happy to test cards and RP as well!

The mix is very successful. I would like to try a GGML version, with q5_1 quantization or higher. Is this possible?
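For reference, a q5_1 GGML file is usually produced with llama.cpp's own tools rather than anything model-specific. A minimal sketch, assuming a 2023-era llama.cpp checkout with its convert.py script and quantize binary (all paths here are placeholders):

```python
import subprocess

LLAMA_CPP = "path/to/llama.cpp"               # local llama.cpp checkout (placeholder)
HF_MODEL_DIR = "path/to/the-merged-7b-model"  # fp16 Hugging Face checkpoint (placeholder)
F16_BIN = "ggml-model-f16.bin"
Q5_1_BIN = "ggml-model-q5_1.bin"

# 1) Convert the HF checkpoint to an fp16 GGML file.
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert.py", HF_MODEL_DIR, "--outfile", F16_BIN],
    check=True,
)

# 2) Quantize the fp16 file down to q5_1.
subprocess.run(
    [f"{LLAMA_CPP}/quantize", F16_BIN, Q5_1_BIN, "q5_1"],
    check=True,
)
```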
