I once again ask for your support (if possible)

#2
by InfernalDread - opened

Hello,

Thank you for releasing these post-trained models! I humbly ask whether @paragon-of-brah would be willing to create GGUFs for this model so that its claims can be tested. In my case, IQ2_M would be perfect, as this model seems to hold up very well under quantization.

Thank you.

All right, downloading rn

Thank you very much! Take all the time you need, no rush at all!

@paragon-of-brah, and some type of Q4 (q4-k-l would be great) too, pretty please. Thank you!

Looks like it's better than the nex pro?

That's what we will be testing soon. I am very excited to see.

It looks like it is better than the nex mini. I just tested it with Claude Code, and it started fixing the error that the nex mini didn't fix. One thing is clear: a 35B model with 3B active parameters is going to give Sonnet a really hard time.

Edit: I used Q8.

Fantastic! Hopefully they release the 31B dense version as well!

Testing it now. I will be adding SVGs below. Right off the bat it seems okay; it hasn't looped yet. On the hard tests it's not on par with Kimi or GLM, but here is what I got for the SVG tests:
Q8 MLX

image

image

image

Could you share the hard tests where it underperforms?

Try with BF16; something is not right with Q8.

All right, so the MTP graft strategy just doesn't really work for ik_llama. The MTP head itself works, but it's trained to predict the output of base Qwen 3.5, so when it's used with other models such as Ornith it gives a low acceptance rate and low token generation (TG) speed, at least on my setup.

So now I'm looking into DFLASH instead, a novel diffusion-based, MTP-like approach that might give better TG. Of course, this means that things are going to be slightly delayed. I'll keep you updated.
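
For anyone wondering why a low acceptance rate hurts TG so much, here is a rough back-of-the-envelope sketch. It is not taken from ik_llama or from any MTP implementation: it assumes the simplified textbook model of speculative decoding, where each drafted token is accepted independently with the same probability and the target model always contributes one token per verification pass. The function name and the example numbers are made up for illustration.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens produced per verification pass.

    Assumes (hypothetically) that every drafted token is accepted
    independently with probability `acceptance_rate`, and that drafting
    stops at the first rejection. The target model always supplies one
    token itself, so the result is never below 1.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance_rate == 1.0:
        # Every drafted token is kept, plus the one from the target model.
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)


if __name__ == "__main__":
    # Illustrative acceptance rates only, not measurements from this thread.
    for rate in (0.2, 0.5, 0.8):
        print(rate, round(expected_tokens_per_step(rate, draft_len=3), 3))
```

Under these assumptions, a draft head with a poor acceptance rate produces barely more than one token per pass, so the extra drafting cost can easily cancel out any gain.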

Not a problem, thank you for taking the time to do this!
