Instructions to use froggeric/Qwen-Fixed-Chat-Templates with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use froggeric/Qwen-Fixed-Chat-Templates with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir Qwen-Fixed-Chat-Templates froggeric/Qwen-Fixed-Chat-Templates
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
Fix: preserve_thinking default regressed to false in v20 (contradicts v19 changelog & README)
Problem
The v19 changelog states preserve_thinking defaults to true ("mathematically guaranteeing 100% KV Cache prefix matching out-of-the-box", curing "amnesia stalls"). The v20 rewrite regressed this:
{%- set _preserve_thinking = preserve_thinking if preserve_thinking is defined else false %}
v19's actual logic was preserve-unless-explicitly-disabled (preserve_thinking is defined and preserve_thinking == false → strip). No inference engine passes this kwarg by default, so v20 users silently get false: past <think> blocks are stripped as last_query_index advances each turn, mutating already-rendered history and invalidating the KV cache prefix every single turn — exactly the problem described in Discussion #1.
Fix
One word: else false → else true, restoring the documented v19 behavior. Users who want stripping can still pass preserve_thinking: false.
Each fix in this series is an independent PR based on current main (v20); they touch overlapping regions of the same file, so merging one may require the others to be rebased — happy to update them.
Hey @Moore2877 ! Nevermind about rebasing—I went ahead and manually resolved the merge conflicts and integrated your excellent fixes directly into the new v21 release on main. Thank you so much for this incredible series of PRs! Closing this as the code is now officially merged.