buun-Qwen3.6-chat_template
A hardened Jinja chat template for Qwen3.5 and Qwen3.6 models on llama.cpp, Open WebUI, vLLM, and any OAI-compatible endpoint.
Built on top of the official Qwen3.6-27B chat template, incorporating fixes from froggeric/Qwen-Fixed-Chat-Templates (v19) and our own production fixes from the buun-llama-cpp fork.
Usage
llama.cpp / llama-server
llama-server -m model.gguf --chat-template-file chat_template.jinja
With config:
llama-server -m model.gguf --chat-template-file chat_template.jinja \
--chat-template-kwargs '{"enable_thinking": true, "preserve_thinking": false}'
Open WebUI / vLLM
Set the chat template file or paste the contents into your template configuration.
Configuration
Pass via --chat-template-kwargs (llama.cpp) or your framework's equivalent:
| Variable | Type | Default | Description |
|---|---|---|---|
enable_thinking |
bool | true |
Controls <think> mode. Set false for speed-critical sessions. |
auto_disable_thinking_with_tools |
bool | false |
Auto-disables thinking when tools are provided. Prevents <tool_call> leaking into <think> blocks. |
preserve_thinking |
bool | false |
Keeps reasoning in all prior assistant turns. Set true for stateless API servers that benefit from KV cache prefix matching. Set false (default) for persistent sessions — strips reasoning from earlier turns for smaller context. |
add_vision_id |
bool | false |
Prefix images/videos with "Picture N:" / "Video N:". |
max_tool_arg_chars |
int | 0 |
Truncate tool argument values exceeding this length (0 = unlimited). |
max_tool_response_chars |
int | 0 |
Truncate tool response content exceeding this length (0 = unlimited). |
Fixes
25 fixes over the official template. The table below shows which issues each template addresses:
| # | Issue | Official | Froggeric v19 | This template |
|---|---|---|---|---|
| 1 | add_vision_id / enable_thinking crash when undefined |
Bug | Fixed | Fixed |
| 2 | namespace() needs precomputed values in some engines |
Bug | — | Fixed |
| 3 | developer role not handled (Claude Code, Codex, OpenCode) |
Bug | Fixed | Fixed |
| 4 | System/developer extraction before main loop | — | Similar | Fixed |
| 5 | item.type checked before 'in item' key test (safer) |
Bug | — | Fixed |
| 6 | arguments.items() replaces bare |items filter |
Bug | Fixed | Fixed |
| 7 | | safe filter removed (llama.cpp minja compat) |
Bug | Fixed | Fixed |
| 8 | tojson/string explicit if/else (no chained filters) |
Bug | Fixed | Fixed |
| 9 | String arguments pass-through for OAI-compat proxies | Bug | Fixed | Fixed |
| 10 | tc alias avoids shadowing tool_call loop variable |
Bug | Fixed | Fixed |
| 11 | ns2 namespace replaces loop.previtem/loop.nextitem |
Bug | Fixed | Fixed |
| 12 | enable_thinking applied to in-context assistant turns |
— | — | Fixed |
| 13 | reasoning_content is defined and not none guard |
Bug | Fixed | Fixed |
| 14 | loop.index0 > (not >=) for assistant thinking scope |
Bug | — | Fixed |
| 15 | Parallel tool calls: \n\n delimiter between blocks |
Missing | — | Fixed |
| 16 | Tool arg/response truncation (max_tool_arg_chars, etc.) |
Missing | — | Fixed |
| 17 | Deep agent loops: fallback to index 0 instead of crash | Crash | — | Fixed |
| 18 | Streaming compat: clean newline boundaries on XML tags | — | — | Fixed |
| 19 | Auto-disable thinking with tools (configurable) | — | — | Fixed |
| 20 | Unknown roles: graceful fallback to user role | Crash | Crash | Fixed |
| 21 | Flattened nesting depth for llama.cpp minja stability | — | — | Fixed |
| 22 | Empty-think poisoning: no interior whitespace | Bug | Fixed | Fixed |
| 23 | preserve_thinking: opt-in KV cache prefix preservation |
Partial | Fixed | Fixed |
| 24 | Retry stall escalation for agentic tool loops | — | Fixed | Fixed |
| 25 | Fuzzy </think> parsing (</thinking>, </ think>, etc.) |
— | Fixed | Fixed |
Key fixes explained
Mid-conversation system messages — The official template crashes with raise_exception('System message must be at the beginning.'). This template renders them normally, supporting system reminders injected mid-conversation by agent frameworks.
Empty-think poisoning (fix 22) — The official template's disabled-thinking generation prompt uses <think>\n\n</think> (with interior whitespace), which causes the model to see a "non-empty" think block and attempt to continue it, leading to ~80% premature turn aborts. Fixed by removing the interior whitespace.
Retry stall escalation (fix 24) — Detects consecutive tool call failures via a heuristic (short responses containing error keywords) and injects escalating warnings. After 2+ failures, tells the model to use a fundamentally different approach. Breaks infinite retry loops in agentic tool-use sessions.
Preserve thinking (fix 23) — Default false strips reasoning from earlier turns, saving context at 200K+ windows. Set true for stateless API servers where KV cache prefix matching matters.
Deep agent loops (fix 17) — When all user messages are <tool_response> wrappers (common in deep agentic loops), the official template crashes with "No user query found." This template falls back gracefully.