Abruptly stopping with <tool call>
I'm running Qwen3.6 27B with llama.cpp, pi mono and the v8 chat template. It works really well most of the time, but occasionally it just stops, apparently after trying and failing to make a tool call. I'm honestly not sure what the cause could be. Maybe it's just a quirk of the model. Has anyone else seen this happen?
See discussion #1 as linked - this is definitely a quirk of Qwen3.6 so far, but the best speculation that people using these modified templates have come up with is:
- Model quantization can impact the likelihood of this happening.
- The template may tend to fail more in chained tool calls, i.e. `<tool_call> edit<x></tool_call>\nLet me also edit .\n...`
- Depending on whether you're worrying about cache invalidation (`{%- set preserve_thinking = true %}` at the top of your template), the previous reasoning done by the model can teach the model the bad pattern of a "thinking" tool call.
- From AI review, the template teaches the model to always start its turn with `<think>`, even if the model may have otherwise not been thinking.
These are our leading hypotheses, and none of them even take into account the harness or runtime you might be using. That is why some templates may work better or be more stable in different contexts.
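If you want to see which of these patterns your own failures match, something like the following might help when run over saved raw model output. It is only a rough sketch: `diagnose` and its labels are made-up names for illustration, not part of llama.cpp or any harness, and it assumes the raw text uses the `<tool_call>...</tool_call>` and `<think>...</think>` tags quoted above.

```python
import re

# Tag names follow the <tool_call> / <think> convention quoted in this thread.
OPEN, CLOSE = "<tool_call>", "</tool_call>"
# Matches a think block, including one that is never closed.
THINK = re.compile(r"<think>(.*?)(?:</think>|\Z)", re.DOTALL)


def diagnose(text: str) -> str:
    """Classify one raw assistant turn (hypothetical helper for log triage)."""
    if OPEN not in text:
        return "no_tool_call"
    # A tool call emitted inside the reasoning block ("thinking" tool call).
    if any(OPEN in m.group(1) for m in THINK.finditer(text)):
        return "tool_call_inside_think"
    # Generation stopped before the closing tag was produced.
    if text.count(OPEN) > text.count(CLOSE):
        return "unclosed_tool_call"
    # Prose following a finished tool call (the chained pattern).
    tail = text[text.rfind(CLOSE) + len(CLOSE):]
    if tail.strip():
        return "text_after_tool_call"
    return "ok"


if __name__ == "__main__":
    sample = "<tool_call>edit</tool_call>\nLet me also edit.\n"
    print(diagnose(sample))
```

Counting how often each label shows up across a batch of failed turns should at least tell you whether you are hitting the chained tool call case or the "thinking" tool call case, before you start swapping templates or quants.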
Please try with v15. I think I have finally managed to fix the overthinking/indecision, and the repeated loops on tool errors among other things. So far, in my testing it is holding up. I am hoping this will be the final version.