General discussion.

#1
by Lewdiculous - opened
    # GGUF quantization levels provided for this model
    quantization_options = [
        "Q4_K_M", "Q4_K_S", "IQ4_XS", "Q5_K_M",
        "Q5_K_S", "Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XS", "IQ3_XXS"
    ]
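
For context, a rough sketch of how a list like this can drive llama.cpp's quantization together with an imatrix file. This is only an assumed workflow: the paths are placeholders and the binary may be named quantize or llama-quantize depending on the llama.cpp version.

    import subprocess

    base_model = "model-f16.gguf"   # placeholder: full-precision GGUF conversion
    imatrix_file = "imatrix.dat"    # placeholder: importance matrix computed beforehand

    for quant in quantization_options:  # the list above
        output = f"model-{quant}-imat.gguf"
        # "quantize" may be "llama-quantize" in newer llama.cpp builds
        subprocess.run(
            ["./quantize", "--imatrix", imatrix_file, base_model, output, quant],
            check=True,
        )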

Thanks for the merge, @Virt-io , and keep experimenting!

I'm more of a 7B-9B person myself but I might start paying more attention to the 11B parameter models now.

@Lewdiculous
Thanks! If you have any interesting models, please tell me about them.

Also, if you can, try it with the new instruct template that I just uploaded:
https://huggingface.co/Virt-io/FuseChat-7B-VaRM-GGUF/blob/main/presets/GPT4-Correct-instruct-RP.json

Awesome, I love to see that more models are coming with recommended presets!

I'll add anything interesting to my Personal Favorites collection.

I'm mostly interested in roleplay models.

@Lewdiculous
I have a question about the imatrix.txt and its effect on RP quantizations.
I took a look at it and there are novel-style excerpts, but they only use quotation marks for speech.
I didn't see any examples that use asterisks for formatting.

I read that the imatrix can overfit on the data used to compute it.

So my question is: is the lack of RP chat-style examples causing the quants to have issues with formatting?

Obviously lower quants are dumber, but I noticed this on the higher quants as well.
They tend to format inconsistently: one paragraph they use asterisks, the next they don't.

Great conjecture.

I would be interested in data that also had good examples with asterisks; I opted for Kalomaze's general data, which was achieving technically "better" results.

https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384

But I was also thinking it's very relevant for our use case: a semi-random sample that also includes clear examples of the expected formatting. Then the over-fitting concerns rise again. I'm not sure how bad that would be for roleplaying, but it could potentially make the quants behave clearly differently from the original model, even if "better" for an RP use case.

About formatting inconsistencies, they can happen of course, and as you mentioned smaller quants will surely suffer more, but most of the time I've experienced it wasn't so bad: out of 50 messages, 45 were formatted as expected (asterisks, quotes) and only 5 missed it. It is definitely annoying, so perhaps considering the possibility of RP-centric calibration data, even at the risk of over-fitting, could be interesting for our use case.

Will tag:
@Test157t @jeiku

To ask:

About Kalomaze's groups_merged.txt versus a more RP-centric imatrix calibration dataset? Or perhaps we can just add our own data to the existing one, which is generally accepted as good at not overfitting and, I believe, the best for PPL?
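
For reference, here's a rough sketch of how a candidate calibration file would be turned into an imatrix with llama.cpp before quantizing. File names are placeholders, and the binary may be called imatrix or llama-imatrix depending on the build.

    import subprocess

    model = "model-f16.gguf"              # placeholder: unquantized GGUF
    calibration = "groups_merged-rp.txt"  # hypothetical: groups_merged.txt plus RP excerpts

    # Computes the importance matrix over the calibration text;
    # the resulting file is later passed to quantize via --imatrix.
    subprocess.run(
        ["./imatrix", "-m", model, "-f", calibration, "-o", "imatrix-rp.dat"],
        check=True,
    )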

@Lewdiculous Will spend some time looking into this as soon as I get the free time.

Who knows, maybe we can use over-fitting in our favor. It would distance the quant from the original, but if the results are positive for a certain use case like roleplaying, it's definitely an interesting thing. I did look for other good pseudo-random data like this but couldn't find any.

Interested to hear from everyone else, and in case you have other candidate data to suggest.

I saw @Virt-io using this earlier:

https://huggingface.co/datasets/jan-hq/ldjnr_capybara_binarized

It is pretty huge, so this would need to be sorted.

Something I remember looking at were the datasets by ChaiML, this one for example:

https://huggingface.co/datasets/ChaiML/20240222_chai_prize_reward_model_data

They do have the structure we expect to see in a roleplay scenario, and the data can of course be adjusted to fix formatting inconsistencies.

It does seem like this would scream over-fitting, so maybe just select some good excerpts from the data to be tastefully added to Kalo's original groups_merged.txt?

Yes, I think adding them to the existing data is probably the best way to do this.

Otherwise it may cause issues with cards that are formatted in JSON or similar.

Makes sense. JSON and Python-like code patterns are often used for character descriptions; in my case I still use Python lists for my characters.

As per the original discussion, coding excerpts seem to be important in general, no matter what.

After also using their chat application and looking at this dataset, it seems they don't do "speech in quotations". Having examples of both would be good, since some people just don't do that, and some cards don't either. Maybe ~15 of each?
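
A quick sketch of what that mixing could look like, sampling roughly 15 excerpts of each style and appending them to the calibration file. The excerpt pool file is hypothetical and would need manual cleaning first.

    import random

    # Hypothetical pool of cleaned RP excerpts, one per line.
    with open("rp_excerpts.txt", encoding="utf-8") as f:
        excerpts = [line.strip() for line in f if line.strip()]

    # Split by whether speech is wrapped in quotation marks.
    quoted = [e for e in excerpts if '"' in e]
    unquoted = [e for e in excerpts if '"' not in e]

    n = 15
    sample = (random.sample(quoted, min(n, len(quoted)))
              + random.sample(unquoted, min(n, len(unquoted))))

    # Append the mixed sample to Kalomaze's general-purpose calibration data.
    with open("groups_merged.txt", "a", encoding="utf-8") as f:
        f.write("\n".join(sample) + "\n")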

Will wait for more opinions on this.

Formal testing and evaluation would need to happen to make sure the quantized model isn't being butchered.

Q4~Q5s
GGUF
vs
GGUF-Imatrix (kalo's data)
vs
GGUF-Imatrix (modified data)
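
As a rough first pass, llama.cpp's perplexity tool could be run over each variant. This is only a sketch with placeholder paths (the binary may be llama-perplexity in newer builds), and perplexity alone won't capture formatting behaviour, so chat testing would still be needed.

    import subprocess

    # Hypothetical quants of the same model, one per calibration setup.
    variants = {
        "static": "model-Q4_K_M.gguf",             # plain GGUF, no imatrix
        "imatrix-kalo": "model-Q4_K_M-imat.gguf",  # imatrix from groups_merged.txt
        "imatrix-rp": "model-Q4_K_M-imat-rp.gguf", # imatrix from the modified data
    }

    for name, path in variants.items():
        print(f"=== {name} ===")
        subprocess.run(
            ["./perplexity", "-m", path, "-f", "wiki.test.raw"],
            check=True,
        )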

We're cooking a DPO right now.

Then we have Ika saying overfitting is very unlikely with how imatrix is implemented in llama.cpp:

https://github.com/ggerganov/llama.cpp/discussions/5006#discussioncomment-8166807

It's kind of all over the place.

I think it's still worth trying.

A couple weeks back, I was messing around with adding 100 lines from https://github.com/apple/ml-mkqa to groups_merged.txt.
It did seem to affect the quantization, as the model would sometimes randomly insert Chinese characters into responses.

It might be interesting to make a toxic dataset and quantize a safe LLM to see how much, if at all, the data affects the quantization.

https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data

https://huggingface.co/Virt-io/FuseChat-7B-VaRM-GGUF/blob/main/tests/RP-test-v3.txt

I didn't clean any of the data, just randomly sampled it.

Added 25 lines from https://github.com/apple/ml-mkqa (this should probably be removed)

Added ~270 lines from https://huggingface.co/datasets/ChaiML/20240222_chai_prize_reward_model_data and a paragraph from bluemoonraw

In my opinion we probably don't need to add any RP examples that aren't formatted, as that is the model's default.

Edit:

https://huggingface.co/Virt-io/FuseChat-7B-VaRM-GGUF/blob/main/tests/RP-test-v4.txt

I've removed the 25 wiki lines; they were giving me random Chinese characters at the end of responses.

I have been testing it with IQ4_XS and the formatting issue is pretty much gone, at least in my limited testing.

@Lewdiculous @Test157t

Interesting, I'll need time to test with my models of choice later.

Kind of busy lately.


Love to see them:

Anonymous user: I want to tell you something but...*nervous and shy*

Anonymous user: what...no way

Anonymous user: ""I give her good food and dresses""

Lol these guys, at least format consistently! I can't say I'm surprised considering where these probably came from.
