Smaller quant for 16GB?

#1
by zappa2005 - opened

Would it be possible to do a smaller quant? I'd really like to try :-)

Yeah. Is there an ideal size for 16GB? 2.7 bpw?
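
For a rough sense of what fits: weight size is roughly parameter count × bpw / 8, and the rest of the card has to cover KV cache and overhead. A quick sketch of that arithmetic (the ~34.4B parameter count is an assumption, and real usage will be somewhat higher):

```python
# Back-of-envelope VRAM estimate for a ~34B-parameter model at different
# EXL2 bitrates. Weights only: KV cache (grows with context length) and
# framework overhead come on top, so treat these as lower bounds.

PARAMS_B = 34.4  # approximate parameter count of Yi-34B (assumption)

def weights_gb(bpw: float, params_b: float = PARAMS_B) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_b * bpw / 8  # billions of params * bits / 8 bits-per-byte = GB

for bpw in (2.4, 2.6, 2.7, 3.1):
    print(f"{bpw} bpw -> ~{weights_gb(bpw):.1f} GB of weights")
```

At 2.4 to 2.7 bpw that leaves a few GB of a 16 GB card for context; around 3.1 bpw it gets tight.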

I think I may have discussed this with you before, can't remember lol

Haha, yes I guess we did - here https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-megamerge-v8-31bpw-exl2-fiction/discussions/1

There you provided 2.6 and 2.4 bpw, which was perfect, but back then both went off the rails right away, and you wanted to look into it and recommended the IQ2_XXS from TheBloke :-)

Not sure if you ever found out why that happened, but maybe that was a different method (without the imatrix?)

I remember now! Should be available here:

https://huggingface.co/brucethemoose/Yi-34B-200K-RPMerge-exl2-2.67bpw

> There you provided 2.6 and 2.4 bpw, which was perfect, but back then both went off the rails right away.

I actually did, sort of! I quantized those models at 32K context. It turns out their low-context perplexity was horrible, but once the context gets bigger (10K+) the performance picks back up. That's why the initial perplexity testing looked so bad. Some discussion/numbers here, though there was more testing as well: https://huggingface.co/DrNicefellow/ChatAllInOne-Yi-34B-200K-V1/discussions/1#65be7f2db7db0ab0959cb859
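
If you want to sanity-check that yourself, measuring perplexity at a few context lengths shows it directly. A minimal sketch, assuming a transformers-loadable checkpoint (the EXL2 files would go through exllamav2's own loader instead, but the measurement logic is the same); the model id and calibration.txt are placeholders:

```python
# Sketch: measure perplexity of a causal LM at several context lengths.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "brucethemoose/Yi-34B-200K-RPMerge"   # placeholder, substitute the model you're testing
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Any long evaluation text (needs at least as many tokens as the largest context below)
text = open("calibration.txt").read()          # placeholder file
ids = tok(text, return_tensors="pt").input_ids.to(model.device)

for ctx in (512, 2048, 8192, 16384):
    chunk = ids[:, :ctx]
    with torch.no_grad():
        # Passing labels=input_ids returns the mean next-token loss
        loss = model(chunk, labels=chunk).loss
    print(f"context {ctx:>6}: perplexity {torch.exp(loss).item():.2f}")
```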

Anyway, I quantized the above file with vanilla exllama settings, so it shouldn't be disastrous at low context. I will make a 2.4 bpw version as well.

Maybe a longer-context version as well? I will see.
