Kunoichi with long legs!

#1
by Spacellary - opened

Keep up the good work! I feel like it handles longer contexts better than the original. Not sure what I'm sacrificing for it yet, but it's a positive first impression.

Looking forward to it, especially if you're cooking with Kunoichi again.

Honestly, I need to graph out perplexity between the original and the extended version. But thank you for the feedback! It was a bit weird to get these models to merge at all, but I think it can be improved further.
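For anyone wanting to run the same comparison, here's a minimal sliding-window perplexity sketch using transformers; the model ids and the evaluation text are placeholders, and the non-overlapping windows are a simplification of a proper strided evaluation:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str, window: int = 4096) -> float:
    """Mean perplexity over non-overlapping windows of `text`."""
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    total_nll, n_tokens = 0.0, 0
    for start in range(0, ids.size(1), window):
        chunk = ids[:, start : start + window]
        if chunk.size(1) < 2:  # need at least one shifted target token
            break
        with torch.no_grad():
            # labels == input_ids makes the model return mean cross-entropy
            out = model(chunk, labels=chunk)
        total_nll += out.loss.item() * chunk.size(1)
        n_tokens += chunk.size(1)
    return math.exp(total_nll / n_tokens)

# Placeholder ids -- substitute the original model and the extended merge.
text = open("long_eval_text.txt").read()
print("original:", perplexity("org/original-7b", text))
print("extended:", perplexity("org/extended-7b-128k", text))
```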

@Test157t Absolutely, that'd be nice.

@Test157t - Honestly, there's something about this model that just jibes with my character cards so well. It's a special good boy. A couple of other people I recommended it to were also positive on it. It just feels "consistent".

At this rate I might have to do a v2 for this one.

@Test157t -- Booyah! Not sure it's worth testing so late, but I'm uploading the GGUF-Imatrix quants from the F16 for this model here, including the new IQ3_S that should be compatible with the next version of KCPP.
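For anyone curious, this is the rough shape of that workflow, sketched as a tiny Python wrapper around the llama.cpp tools; the binary locations, filenames, and calibration text are assumptions about a local setup, not the exact commands used here:

```python
import subprocess

F16 = "Kunocchini-7b-128k-test.F16.gguf"  # assumed source filename
IMATRIX = "imatrix.dat"

# 1. Build the importance matrix from a calibration text
#    (llama.cpp's `imatrix` tool).
subprocess.run(
    ["./imatrix", "-m", F16, "-f", "calibration.txt", "-o", IMATRIX],
    check=True,
)

# 2. Quantize to each target type, passing the imatrix to `quantize`.
for qtype in ["IQ3_S", "Q4_K_M", "Q5_K_M"]:
    out = F16.replace("F16", qtype)
    subprocess.run(
        ["./quantize", "--imatrix", IMATRIX, F16, out, qtype],
        check=True,
    )
```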

@Test157t -- Already uploading v2 quants from the new config to the same repo. If no major issues are found, then this model should be good on quants; the new ones are tagged with v2 in the file names for easy distinction. If needed I'll do a v3 or whatever as necessary. Love me some longer context :')

@Lewdiculous Thank you! The long-context version (1.2) is still "regarded", so I'm going to have to go back to the drawing board with that one. Been testing Lelantacles v5 for comparison and it kicks ass at 16k still.

> Been testing Lelantacles v5 for comparison and it kicks ass at 16k still

In a way it was progress :)

Will hold off until 1.2, or more realistically the next version entirely, is cleared for testing then. It's all about experimenting, cheers!

Fun times. I look outside and there's already IQ4_NL, and now a new IQ4_XS merged 11 hours ago. GGUF is moving fast xD

Might redo 1.2 without normalizing, and with SLERP instead of DARE-TIES. But I will test it locally along with the quants before I clear it for use.
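For reference, a minimal sketch of what that SLERP recipe might look like as a mergekit config (as far as I know, SLERP doesn't take a normalize parameter, so that knob goes away with the method switch anyway); the model ids, layer range, and t value here are placeholders, not the actual recipe:

```python
# Sketch of a mergekit SLERP config; run with:
#   mergekit-yaml slerp-config.yml ./merged-model
config = """\
merge_method: slerp
base_model: SanjiWatsuki/Kunoichi-DPO-v2-7B
slices:
  - sources:
      - model: SanjiWatsuki/Kunoichi-DPO-v2-7B
        layer_range: [0, 32]
      - model: some-org/long-context-7b   # hypothetical second model
        layer_range: [0, 32]
parameters:
  t: 0.5   # 0 = all base model, 1 = all second model
dtype: float16
"""
with open("slerp-config.yml", "w") as f:
    f.write(config)
```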

Oh, it would appear that I've missed that; I'm falling behind, it would seem.

@Test157t, yeah, things are moving quite fast in GGUF land now.

IQ4_NL: https://github.com/ggerganov/llama.cpp/pull/5590
IQ4_XS: https://github.com/ggerganov/llama.cpp/pull/5747

Honestly, with the very small loss in quality, it's much more interesting for me to just push for higher context: I'm staying around the same VRAM usage, but with 4K more context.
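Back-of-the-envelope numbers for that trade-off, assuming a Mistral-7B-style model (32 layers, 8 KV heads via GQA, head dim 128) and an fp16 KV cache; a minimal sketch:

```python
def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """K and V tensors: 2 * layers * ctx * kv_heads * head_dim * elem size."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

for ctx in (8192, 12288):
    print(f"{ctx:>6} ctx -> {kv_cache_bytes(ctx) / 2**30:.2f} GiB KV cache")
# 8192 -> 1.00 GiB, 12288 -> 1.50 GiB: +4K context is ~0.5 GiB,
# roughly what stepping the weights down one quant size frees up.
```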

Need to see how it goes; maybe you can do real benchmarks.

I've uploaded these two new quant types for Kunocchini-7b-128k-test-GGUF-Imatrix; they are in the v2 list.
