World's most quantization-sensitive model, congrats 🥳🤩

#8
by unokayish182 - opened

Diffusion + MoE + VLM: this combination is just crazy. This is a joke, sorry Google.

I've managed to quantize it to int4 SVDQ for its diffusion layers and W4A8KV4 for the LLM-based encoder layers with QServe/OmniServe.
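For anyone wondering why the SVDQ part helps, here's a rough NumPy sketch of the weight-side idea only: keep a small low-rank piece of the weight in full precision and quantize just the residual to int4. This is not the actual SVDQuant or QServe code. The function names, the rank, the group size, and the toy weight matrix are all made up for illustration, and the activation smoothing step that the real method uses is left out.

```python
import numpy as np

def quantize_int4(w, group_size=64):
    """Symmetric per-group int4 fake-quantization (quantize, then dequantize)."""
    rows, cols = w.shape
    assert cols % group_size == 0, "cols must be divisible by group_size"
    g = w.reshape(rows, cols // group_size, group_size)
    # One scale per group, mapping the largest magnitude to the int4 level 7.
    scale = np.abs(g).max(axis=-1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(g / scale), -8, 7)
    return (q * scale).reshape(rows, cols)

def lowrank_plus_int4(w, rank=16, group_size=64):
    """Keep a rank-`rank` SVD branch in full precision, quantize the residual."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    residual = w - low_rank
    return low_rank + quantize_int4(residual, group_size)

def rel_error(w, w_hat):
    """Relative Frobenius-norm reconstruction error."""
    return np.linalg.norm(w - w_hat) / np.linalg.norm(w)

# Toy weight (hypothetical): small dense noise plus a strong rank-4 component
# that produces large-magnitude entries, which hurt plain int4.
rng = np.random.default_rng(0)
w = 0.02 * rng.normal(size=(256, 256))
w += rng.normal(size=(256, 4)) @ rng.normal(size=(4, 256))

err_plain = rel_error(w, quantize_int4(w))
err_svd = rel_error(w, lowrank_plus_int4(w))
print(f"plain int4:      {err_plain:.4f}")
print(f"low-rank + int4: {err_svd:.4f}")
```

On this toy matrix the low-rank branch soaks up the large entries, so the residual has a much smaller range and loses less when rounded to 16 levels. Real layers won't be this clean, so treat it as intuition rather than a benchmark.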

I'm just unhappy with the speed, which I'm troubleshooting. But it outputs fine... 🤷

Maybe the standard quantization you're used to isn't working because it's a diffusion-based model, not just a typical LLM?

This is a joke; the joke is aimed at standard quantization.

Unsloth has done their job and released a 4-bit GGUF (https://unsloth.ai/docs/models/diffusiongemma).
