buun-Qwen3.6-chat_template

A hardened Jinja chat template for Qwen3.5 and Qwen3.6 models on llama.cpp, Open WebUI, vLLM, and any OAI-compatible endpoint.

Built on top of the official Qwen3.6-27B chat template, incorporating fixes from froggeric/Qwen-Fixed-Chat-Templates (v19) and our own production fixes from the buun-llama-cpp fork.

Usage

llama.cpp / llama-server

llama-server -m model.gguf --chat-template-file chat_template.jinja

With config:

llama-server -m model.gguf --chat-template-file chat_template.jinja \
  --chat-template-kwargs '{"enable_thinking": true, "preserve_thinking": false}'

Open WebUI / vLLM

Set the chat template file or paste the contents into your template configuration.

Configuration

Pass via --chat-template-kwargs (llama.cpp) or your framework's equivalent:

Variable Type Default Description
enable_thinking bool true Controls <think> mode. Set false for speed-critical sessions.
auto_disable_thinking_with_tools bool false Auto-disables thinking when tools are provided. Prevents <tool_call> leaking into <think> blocks.
preserve_thinking bool false Keeps reasoning in all prior assistant turns. Set true for stateless API servers that benefit from KV cache prefix matching. Set false (default) for persistent sessions — strips reasoning from earlier turns for smaller context.
add_vision_id bool false Prefix images/videos with "Picture N:" / "Video N:".
max_tool_arg_chars int 0 Truncate tool argument values exceeding this length (0 = unlimited).
max_tool_response_chars int 0 Truncate tool response content exceeding this length (0 = unlimited).

Fixes

25 fixes over the official template. The table below shows which issues each template addresses:

# Issue Official Froggeric v19 This template
1 add_vision_id / enable_thinking crash when undefined Bug Fixed Fixed
2 namespace() needs precomputed values in some engines Bug Fixed
3 developer role not handled (Claude Code, Codex, OpenCode) Bug Fixed Fixed
4 System/developer extraction before main loop Similar Fixed
5 item.type checked before 'in item' key test (safer) Bug Fixed
6 arguments.items() replaces bare |items filter Bug Fixed Fixed
7 | safe filter removed (llama.cpp minja compat) Bug Fixed Fixed
8 tojson/string explicit if/else (no chained filters) Bug Fixed Fixed
9 String arguments pass-through for OAI-compat proxies Bug Fixed Fixed
10 tc alias avoids shadowing tool_call loop variable Bug Fixed Fixed
11 ns2 namespace replaces loop.previtem/loop.nextitem Bug Fixed Fixed
12 enable_thinking applied to in-context assistant turns Fixed
13 reasoning_content is defined and not none guard Bug Fixed Fixed
14 loop.index0 > (not >=) for assistant thinking scope Bug Fixed
15 Parallel tool calls: \n\n delimiter between blocks Missing Fixed
16 Tool arg/response truncation (max_tool_arg_chars, etc.) Missing Fixed
17 Deep agent loops: fallback to index 0 instead of crash Crash Fixed
18 Streaming compat: clean newline boundaries on XML tags Fixed
19 Auto-disable thinking with tools (configurable) Fixed
20 Unknown roles: graceful fallback to user role Crash Crash Fixed
21 Flattened nesting depth for llama.cpp minja stability Fixed
22 Empty-think poisoning: no interior whitespace Bug Fixed Fixed
23 preserve_thinking: opt-in KV cache prefix preservation Partial Fixed Fixed
24 Retry stall escalation for agentic tool loops Fixed Fixed
25 Fuzzy </think> parsing (</thinking>, </ think>, etc.) Fixed Fixed

Key fixes explained

Mid-conversation system messages — The official template crashes with raise_exception('System message must be at the beginning.'). This template renders them normally, supporting system reminders injected mid-conversation by agent frameworks.

Empty-think poisoning (fix 22) — The official template's disabled-thinking generation prompt uses <think>\n\n</think> (with interior whitespace), which causes the model to see a "non-empty" think block and attempt to continue it, leading to ~80% premature turn aborts. Fixed by removing the interior whitespace.

Retry stall escalation (fix 24) — Detects consecutive tool call failures via a heuristic (short responses containing error keywords) and injects escalating warnings. After 2+ failures, tells the model to use a fundamentally different approach. Breaks infinite retry loops in agentic tool-use sessions.

Preserve thinking (fix 23) — Default false strips reasoning from earlier turns, saving context at 200K+ windows. Set true for stateless API servers where KV cache prefix matching matters.

Deep agent loops (fix 17) — When all user messages are <tool_response> wrappers (common in deep agentic loops), the official template crashes with "No user query found." This template falls back gracefully.

Credits

  • Qwen team — official template
  • froggeric — empty-think fix, fuzzy think parsing, retry escalation, preserve_thinking
  • barubary — minja compat fixes, tool truncation, agent loop handling, streaming compat, unknown role fallback, mid-conversation system messages
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support