Fix: preserve_thinking default regressed to false in v20 (contradicts v19 changelog & README)

#46

Problem

The v19 changelog states preserve_thinking defaults to true ("mathematically guaranteeing 100% KV Cache prefix matching out-of-the-box", curing "amnesia stalls"). The v20 rewrite regressed this:

{%- set _preserve_thinking = preserve_thinking if preserve_thinking is defined else false %}

v19's actual logic was preserve-unless-explicitly-disabled (preserve_thinking is defined and preserve_thinking == false → strip). No inference engine passes this kwarg by default, so v20 users silently get false: past <think> blocks are stripped as last_query_index advances each turn, mutating already-rendered history and invalidating the KV cache prefix every single turn — exactly the problem described in Discussion #1.

Fix

One word: else false → else true, restoring the documented v19 behavior. Users who want stripping can still pass preserve_thinking: false.
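For reference, here is the line quoted in the Problem section with the one-word change applied. This is a sketch, not a copy of the merged template: the surrounding code may differ in the release on main, and the comments only describe how this single expression evaluates.

```jinja
{#- Default to preserving <think> blocks when the caller does not pass the kwarg. -#}
{#- preserve_thinking undefined -> _preserve_thinking = true  (history is left intact) -#}
{#- preserve_thinking = true    -> _preserve_thinking = true -#}
{#- preserve_thinking = false   -> _preserve_thinking = false (explicit opt-in to stripping) -#}
{%- set _preserve_thinking = preserve_thinking if preserve_thinking is defined else true %}
```

With this default, stripping only happens when a caller explicitly passes preserve_thinking: false, so an engine that never sets the kwarg renders earlier turns the same way on every request.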


Each fix in this series is an independent PR based on current main (v20); they touch overlapping regions of the same file, so merging one may require the others to be rebased — happy to update them.

Hi @Moore2877, thanks for this great series of fixes! I just merged PR #42. Could you please rebase your PRs (starting with this one, #46, then #45, #49, #48, #47, and #50) so we can merge them sequentially? Thanks again for your excellent work on these!

Hey @Moore2877! Never mind about rebasing: I went ahead and manually resolved the merge conflicts and integrated your excellent fixes directly into the new v21 release on main. Thank you so much for this incredible series of PRs! Closing this as the code is now officially merged.

froggeric changed pull request status to closed

@froggeric Happy to help. Thank you for looking at the PRs. You have a cool project here.
