Looping

#2
by YannickGaspar - opened

Hi Unsloth team! Has there been any progress on addressing the looping/repetition issues with this model? They seem to be inherent to the model itself. A Discord community of RTX Pro 6000 owners has been extensively testing the lukealonso nvfp4 version, but we have had limited success mitigating the looping on our end. Any insights, updates, or recommended parameters would be hugely appreciated!

It can't answer simple questions, even at Q8, because of thought loops. Is something wrong with llama.cpp, or do vLLM and SGLang loop too?


No, it's the model itself, not the inference engines. The RTX Pro 6000 Discord has partly worked around it with some parameter tuning on a custom SGLang fork with b12x, but it's not perfect.

--repeat-penalty 1.2 ^
--reasoning-budget 4096 ^
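As a rough illustration of what `--repeat-penalty 1.2` does, here is a minimal Python sketch of the CTRL-style repetition penalty that, as far as I know, llama.cpp applies to tokens in its recent-token window (`--repeat-last-n`). Positive logits of recently seen tokens are divided by the penalty and negative ones are multiplied by it, so either way the token becomes less likely. The function name and toy logits are hypothetical; real samplers work on full vocabulary arrays.

```python
from typing import List


def apply_repeat_penalty(logits: List[float], recent_tokens: List[int],
                         penalty: float = 1.2) -> List[float]:
    """Hypothetical helper: CTRL-style repetition penalty on a logit list.

    Every token id that appears in `recent_tokens` is pushed toward
    "less likely", once, no matter how many times it repeated.
    """
    out = list(logits)
    for tok in set(recent_tokens):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink a positive logit toward 0
        else:
            out[tok] *= penalty   # push a non-positive logit further down
    return out


# Toy example: token 0 keeps winning greedy decoding (a "loop").
logits = [2.0, 1.8, -1.0]
recent = [0, 0, 2]  # token 0 was just emitted twice, token 2 once
penalized = apply_repeat_penalty(logits, recent, penalty=1.2)
best = max(range(len(penalized)), key=lambda i: penalized[i])
print(penalized, best)  # token 1 now wins: 2.0 / 1.2 < 1.8
```

One side effect to keep in mind: the penalty applies to every recent token, including ones a reasoning trace legitimately needs to repeat (variable names, numbers), so a value like 1.2 may trade looping for degraded answers.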

I've run into looping in the thinking block too.
