UD-Q4_K_XL lower precision than Q4_K_M?

#12
by zhuangzhoumengdie - opened

I also noticed that certain weights are quantized at Q5_K in Q4_K_M but only at Q4_K in UD-Q4_K_XL:

q4_k_m:
blk.0.ffn_gate_shexp.weight [2048, 512] Q5_K

ud-q4_k_xl:
blk.0.ffn_gate_shexp.weight [2048, 512] Q4_K

The shared expert weights show the same pattern in your Qwen3 Next Instruct quants (in that repo, Q4_K_M is actually larger than UD-Q4_K_XL). Is this intentional?
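
For anyone who wants to check this across every tensor rather than one at a time, here is a minimal sketch of how the two listings could be diffed. It assumes the tensor listings have been copied out as plain text in the `name [shape] TYPE` form shown above; `parse_listing` and `diff_quant_types` are hypothetical helper names, not part of any library.

```python
import re

# Assumed line format: "<tensor name> [<dims>] <quant type>"
LINE_RE = re.compile(r"^(\S+)\s+\[([^\]]*)\]\s+(\S+)$")

def parse_listing(text):
    """Map tensor name -> quantization type; lines that do not match are skipped."""
    types = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            types[m.group(1)] = m.group(3)
    return types

def diff_quant_types(a, b):
    """Return {name: (type_a, type_b)} for tensors in both listings whose types differ."""
    return {n: (a[n], b[n]) for n in sorted(a.keys() & b.keys()) if a[n] != b[n]}

q4_k_m = parse_listing("blk.0.ffn_gate_shexp.weight [2048, 512] Q5_K")
ud_q4_k_xl = parse_listing("blk.0.ffn_gate_shexp.weight [2048, 512] Q4_K")

print(diff_quant_types(q4_k_m, ud_q4_k_xl))
# → {'blk.0.ffn_gate_shexp.weight': ('Q5_K', 'Q4_K')}
```

Pasting the full listings for both files into `parse_listing` would show every tensor where the two quants disagree, not just the shared expert gate.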
