Thanks for the quant

#1 by FlareRebellion - opened

You seem to be the only person doing importance matrix quants and this is a cool model. It works great for me, thanks.

PS: This is NOT a request and too early to tell but cloudyu/Yi-34Bx2-MoE-60B-DPO could be great, considering the strength of the non DPO variant. I'll keep my eyes open for more of your quants in any case.

Thanks!  I'll have a look at it :-)

It's no longer accessible.

Yeah, lol. The model was probably borked or something; wouldn't be the first time DPO training went haywire.

If you have other quant suggestions (70B or fewer) feel free to send :-)

Oh yeah, sure, I'm full of suggestions. I guess iquants of these would be pretty cool.

https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF
https://huggingface.co/jondurbin/bagel-dpo-8x7b-v0.2
https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2

They all have standard GGUF quants by TheBloke already, but I guess modern, state-of-the-art importance matrix quantisations could improve things for the GPU poor.
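For anyone curious what "importance matrix quantisation" involves, a rough sketch of the llama.cpp workflow is below. Binary names and flags have changed between llama.cpp versions, and the file names here (`model-f16.gguf`, `calibration.txt`, `imatrix.dat`) are placeholders, so treat this as an outline rather than exact commands:

```shell
# Sketch of an importance-matrix quantisation workflow with llama.cpp.
# File names are placeholders; check your llama.cpp build for exact flags.

# 1. Compute an importance matrix by running a calibration text file
#    through the full-precision GGUF model.
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantise using that importance matrix, e.g. to IQ2_XS
#    (one of the small "i-quant" types aimed at low-VRAM setups).
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ2_XS.gguf IQ2_XS
```

The calibration text matters: the importance matrix weights which tensors tolerate aggressive quantisation, so a diverse calibration corpus generally gives better low-bit results.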

@Artefact2 just in case you're still up for more quant suggestions :)

https://huggingface.co/codellama/CodeLlama-70b-Instruct-hf
https://huggingface.co/ycros/BagelMIsteryTour-v2-8x7B

Thanks for all your hard work.

Edit: Accidentally linked the BagelMisteryTour v1, oops.

I'm putting myself in timeout for not even pasting the right link when asking for your quants, for shame.

Another new model that might be interesting (no GGUF quantisations yet):

https://huggingface.co/serpdotai/sparsetral-16x7B-v2

Though maybe this weird sparse-type model needs different quantisation methods and won't work with the existing ones?

Sparsetral/Camelidae are new model architectures, so it will take a while for llama.cpp to support them (if ever). You can open a suggestion upstream if you want!
