General feedback/Experimental testing.

#1 - opened by Lewdiculous

I'll do quants with the usual imatrix data, and another version with different, experimental data that could have lower PPL (to be uploaded in a folder called "experimental" in this repo). May I ask that anyone interested test both versions against each other, for feedback and maybe further testing?
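
For reference, here's a rough sketch of the workflow I mean, with a local llama.cpp build and placeholder file names; the binary names and flags below are the ones I remember for llama.cpp builds from around this time (newer builds rename them to llama-imatrix / llama-quantize), so double-check them against yours:

```python
import subprocess

# Placeholder file names; substitute your own model and calibration data.
MODEL_F16 = "model-f16.gguf"
DATASETS = {
    "usual": "imatrix-usual.txt",                # the regular calibration data
    "experimental": "imatrix-experimental.txt",  # the more varied experimental data
}

for name, data in DATASETS.items():
    imatrix_out = f"imatrix-{name}.dat"
    # Compute the importance matrix from the calibration text.
    subprocess.run(
        ["./imatrix", "-m", MODEL_F16, "-f", data, "-o", imatrix_out],
        check=True,
    )
    # Quantize using that imatrix (Q4_K_M is just an example target).
    subprocess.run(
        ["./quantize", "--imatrix", imatrix_out,
         MODEL_F16, f"model-{name}-Q4_K_M.gguf", "Q4_K_M"],
        check=True,
    )
```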

For anyone who can test, I'd be happy to hear about it and thankful for your time. I'm mostly looking for usability feedback in conversational and roleplay scenarios.

If you notice anything, even small changes, I'd be glad to hear about it.

@jeiku The first batch of quants is ready; I'll now do the extra experimental ones with @Virt-io's imatrix-with-rp-multi-kaomoji-v1.txt data.

I've got the rpdata quant loaded up for testing. Thanks for all you do!

@jeiku I don't expect the currently uploaded quants to be much different (they should be about as good as usual). The next ones will use the more varied data suggested by Virt. Any better PPL might just come from having more data, or who knows. It will take a while, but they should be up later on.

Just realized some of the kaomojis are tabbed out all the way to the right :|

Also, if anyone has any good datasets with the same phrases in multiple languages, please share.

Another thing I want to test is removing all the extra language material and replacing it with just those languages' alphabets.

@Virt-io This is rather long data as it is. After optimizing the actual contents, condensing it down to about 50k tokens (~335 KB) would be ideal.
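
If it helps, here's a rough way to check how close a trimmed file lands to that target, counting tokens with a Hugging Face tokenizer; using the Mistral-7B tokenizer is just my assumption for these Mistral-based models, so swap in whichever one matches:

```python
import os
from transformers import AutoTokenizer

# Assumption: the Mistral-7B tokenizer approximates what llama.cpp will see
# for these Mistral-based models; use the correct tokenizer if it differs.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

path = "imatrix-with-rp-multi-kaomoji-v1.txt"
with open(path, encoding="utf-8") as f:
    text = f.read()

n_tokens = len(tok(text, add_special_tokens=False)["input_ids"])
size_kb = os.path.getsize(path) / 1024

print(f"{path}: {n_tokens} tokens, {size_kb:.0f} KB")
print("target: ~50k tokens / ~335 KB")
```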

Of course, retaining the same quality, or within some margin of error - "close enough".

That's true; it's one of the reasons I want to replace the multi-language wiki.

The kaomojis are to blame; they take up quite a few lines. However, I don't believe they add that much compute time, as they are quite short.

Compute time for me roughly tripled (3-3.5x) going from the usual 30k tokens of Kalomaze's general data to this kaomoji-v1 data. It's quite a long time.

Are you computing it with FP16?

I guess I didn't notice it much as I use Q8 when making mine.

I'll try removing useless things to shorten compute time.

Are you computing it with FP16?

Ah, yes, I do it with FP16. I just prefer that over a further-quantized Q8_0.
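
For clarity, the only difference between the two setups is which GGUF the imatrix tool reads via -m; a minimal sketch with placeholder file names (flags are as I remember them for llama.cpp's imatrix example, including -ngl for GPU offload, so verify against your build):

```python
import subprocess

# Computing the imatrix against the FP16 model (what I do, slower) versus a
# Q8_0 of the same model (what Virt-io does, faster). File names are
# placeholders; flags are as recalled for llama.cpp's imatrix example.
for base in ("model-f16.gguf", "model-q8_0.gguf"):
    out = f"imatrix-{base.removesuffix('.gguf')}.dat"
    subprocess.run(
        ["./imatrix",
         "-m", base,
         "-f", "imatrix-with-rp-multi-kaomoji-v1.txt",
         "-o", out,
         "-ngl", "33"],  # offload layers to GPU if supported; adjust or drop
        check=True,
    )
```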

Hopefully the importance matrix additions won't take much longer than usual.

Well, @Test157t and I have found that the quotation-mark roleplay issues are just as bad as ever. I never catch them in early tests because I only RP in plaintext, but this model definitely has that problem.

@Test157t @jeiku

https://huggingface.co/Lewdiculous/InfinityRP-v1-7B-GGUF-IQ-Imatrix

Somehow this guy masterfully manages to stay on track with the asterisks+quotations format, and can adapt to not using it, but it prefers the quotations. Interesting base data!

Only it struggles at anything above 8K CTX xD

Let's see if Paradigm Shift quants behave any differently with the larger data.

We are currently working independently to create a solution to the RP problem. We each have ideas on the best way to fix it and will likely merge our best efforts after some testing.

@jeiku -- Experimentals are up. A lot more data went into the imatrix.

I haven't been able to find a difference between the two, but I also haven't run extensive benchmarks.

Yeah, I'm not seeing much difference in use. I could ppl test them, but I'm sure you've already done that. I would say use the one that can generate the imatrix.dat faster.

Well, I'm not an expert and this is my inexpert take, which could be completely wrong and harebrained, but as far as I can tell with the imatrix:

What you're trying to do with imatrix is find a quant that prioritizes activations involving the format you want, correct?

i.e., "dialogue" *action*

https://github.com/ggerganov/llama.cpp/discussions/5263

Based on this, it suggests that the length of the text used to compute the imatrix .dat doesn't matter as much as the specific content of that text.

Maybe by using this big txt full of general use cases instead of a small one of specific uses, you're also smoothing out the particular 'benefit' of the imatrix that you're trying to bring out?

the one that can generate the imatrix.dat faster

The previous one, then, but we'll see. I actually haven't tested for PPL, so if you could run that real quick it would be great, @jeiku.
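
Something like this would do for a quick side-by-side using llama.cpp's perplexity example; the quant file names and the wiki.test.raw test file are placeholders, use whatever you've been testing with:

```python
import subprocess

# Placeholder quant file names for the two imatrix variants being compared.
QUANTS = ["model-rpdata-Q4_K_M.gguf", "model-experimental-Q4_K_M.gguf"]

for q in QUANTS:
    # llama.cpp's perplexity example prints a "Final estimate: PPL = ..." line.
    subprocess.run(
        ["./perplexity", "-m", q, "-f", "wiki.test.raw"],
        check=True,
    )
```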

@Tibbnak

Based on this, it suggests that the length of the text used to compute the imatrix .dat doesn't matter as much as the specific content of that text.

The length was just a consequence of trying to increase diversity; there were also some other assumptions that the more the imatrix data differs from the training data, the better the PPL, but it's all in flux, really.

There's also a comment from Ika (responsible for the implementation) on the math behind it, suggesting that over-fitting isn't as prominent as we think and that the data would only "resonate" with what's already part of the model weights, or something like that, but I'm probably also messing up this interpretation - which is, of course, contradicted by Kalomaze.

Okay, bear in mind this is only one test on one model with one dataset:

experimental | Final estimate: PPL = 39.5802 +/- 2.53358
rpdata | Final estimate: PPL = 39.2282 +/- 2.51130

Well within margin of error as you can see.
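
To put numbers on "well within margin of error": the gap between the two estimates is far smaller than their combined uncertainty. A quick check using the values above (treating the +/- figures as independent standard errors is my own assumption):

```python
import math

# Final estimates reported above: (PPL, +/- error)
experimental = (39.5802, 2.53358)
rpdata = (39.2282, 2.51130)

diff = abs(experimental[0] - rpdata[0])
# Assumption: treat the +/- values as independent standard errors and
# combine them in quadrature.
combined_err = math.hypot(experimental[1], rpdata[1])

print(f"difference:     {diff:.4f}")          # ~0.35
print(f"combined error: {combined_err:.4f}")  # ~3.57
print("within margin of error:", diff < combined_err)
```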

Aye, welp, it's getting late, so we'll see to it tomorrow. It's as I expected then, even if the testing was narrow, considering Kalo's general data already lands within the same margin of error.

Thanks again.

Keep an eye out tomorrow; I'll be uploading a proper RP model to ChaoticNeutrals. No rush, but this one is looking really promising for sticking to the dialogue/markdown combo. It also passes my informal riddle test, is great at sticking to the system prompt, and, best of all, it's compliant and willing to respond to the user as expected.
