buun-Qwen3.6-chat_template

A hardened Jinja chat template for Qwen3.5 and Qwen3.6 models on llama.cpp, Open WebUI, vLLM, and any OAI-compatible endpoint.

Built on top of the official Qwen3.6-27B chat template, incorporating fixes from froggeric/Qwen-Fixed-Chat-Templates (v19) and our own production fixes from the buun-llama-cpp fork.

Usage

llama.cpp / llama-server

llama-server -m model.gguf --chat-template-file chat_template.jinja

With config:

llama-server -m model.gguf --chat-template-file chat_template.jinja \
  --chat-template-kwargs '{"enable_thinking": true, "preserve_thinking": false}'

Open WebUI / vLLM

Set the chat template file or paste the contents into your template configuration.

Configuration

Pass via --chat-template-kwargs (llama.cpp) or your framework's equivalent:

Variable	Type	Default	Description
`enable_thinking`	bool	`true`	Controls `<think>` mode. Set `false` for speed-critical sessions.
`auto_disable_thinking_with_tools`	bool	`false`	Auto-disables thinking when tools are provided. Prevents `<tool_call>` leaking into `<think>` blocks.
`preserve_thinking`	bool	`false`	Keeps reasoning in all prior assistant turns. Set `true` for stateless API servers that benefit from KV cache prefix matching. Set `false` (default) for persistent sessions — strips reasoning from earlier turns for smaller context.
`add_vision_id`	bool	`false`	Prefix images/videos with "Picture N:" / "Video N:".
`max_tool_arg_chars`	int	`0`	Truncate tool argument values exceeding this length (0 = unlimited).
`max_tool_response_chars`	int	`0`	Truncate tool response content exceeding this length (0 = unlimited).

Fixes

25 fixes over the official template. The table below shows which issues each template addresses:

#	Issue	Official	Froggeric v19	This template
1	`add_vision_id` / `enable_thinking` crash when undefined	Bug	Fixed	Fixed
2	`namespace()` needs precomputed values in some engines	Bug	—	Fixed
3	`developer` role not handled (Claude Code, Codex, OpenCode)	Bug	Fixed	Fixed
4	System/developer extraction before main loop	—	Similar	Fixed
5	`item.type` checked before `'in item'` key test (safer)	Bug	—	Fixed
6	`arguments.items()` replaces bare `\|items` filter	Bug	Fixed	Fixed
7	`\| safe` filter removed (llama.cpp minja compat)	Bug	Fixed	Fixed
8	`tojson`/`string` explicit if/else (no chained filters)	Bug	Fixed	Fixed
9	String arguments pass-through for OAI-compat proxies	Bug	Fixed	Fixed
10	`tc` alias avoids shadowing `tool_call` loop variable	Bug	Fixed	Fixed
11	`ns2` namespace replaces `loop.previtem`/`loop.nextitem`	Bug	Fixed	Fixed
12	`enable_thinking` applied to in-context assistant turns	—	—	Fixed
13	`reasoning_content is defined and not none` guard	Bug	Fixed	Fixed
14	`loop.index0 >` (not `>=`) for assistant thinking scope	Bug	—	Fixed
15	Parallel tool calls: `\n\n` delimiter between blocks	Missing	—	Fixed
16	Tool arg/response truncation (`max_tool_arg_chars`, etc.)	Missing	—	Fixed
17	Deep agent loops: fallback to index 0 instead of crash	Crash	—	Fixed
18	Streaming compat: clean newline boundaries on XML tags	—	—	Fixed
19	Auto-disable thinking with tools (configurable)	—	—	Fixed
20	Unknown roles: graceful fallback to user role	Crash	Crash	Fixed
21	Flattened nesting depth for llama.cpp minja stability	—	—	Fixed
22	Empty-think poisoning: no interior whitespace	Bug	Fixed	Fixed
23	`preserve_thinking`: opt-in KV cache prefix preservation	Partial	Fixed	Fixed
24	Retry stall escalation for agentic tool loops	—	Fixed	Fixed
25	Fuzzy `</think>` parsing (`</thinking>`, `</ think>`, etc.)	—	Fixed	Fixed

Key fixes explained

Mid-conversation system messages — The official template crashes with raise_exception('System message must be at the beginning.'). This template renders them normally, supporting system reminders injected mid-conversation by agent frameworks.

Empty-think poisoning (fix 22) — The official template's disabled-thinking generation prompt uses <think>\n\n</think> (with interior whitespace), which causes the model to see a "non-empty" think block and attempt to continue it, leading to ~80% premature turn aborts. Fixed by removing the interior whitespace.

Retry stall escalation (fix 24) — Detects consecutive tool call failures via a heuristic (short responses containing error keywords) and injects escalating warnings. After 2+ failures, tells the model to use a fundamentally different approach. Breaks infinite retry loops in agentic tool-use sessions.

Preserve thinking (fix 23) — Default false strips reasoning from earlier turns, saving context at 200K+ windows. Set true for stateless API servers where KV cache prefix matching matters.

Deep agent loops (fix 17) — When all user messages are <tool_response> wrappers (common in deep agentic loops), the official template crashes with "No user query found." This template falls back gracefully.

Credits

Qwen team — official template
froggeric — empty-think fix, fuzzy think parsing, retry escalation, preserve_thinking
barubary — minja compat fixes, tool truncation, agent loop handling, streaming compat, unknown role fallback, mid-conversation system messages

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support