Fix thinking bug in jinja template

#7
by huaj1ng - opened

Without the \n after <think>, the thinking content gets mixed into the regular response text.

This matches how comparable projects write their Jinja templates, for example:

This should resolve https://github.com/nex-agi/Nex-N2/issues/2

Nex AGI org

Hi @huaj1ng, thanks a lot for the contribution and for digging into the chat template! 🙏

To help us verify the fix, could you share a bit more detail?

A repro: the original request (rendered prompt / messages) and the raw response where the thinking content blended into the regular text.
Serving stack: were you using our recommended sglang branch, upstream sglang, or another engine, and was --reasoning-parser qwen3 enabled? The rendering of <think> can differ across stacks.
This will help us confirm the change matches the training-time format before merging. Thanks again!

Nex AGI org

Hi @huaj1ng, thanks for the report.

After investigation, the root cause turned out to be in llama.cpp's reasoning parser, not the template.

Adding \n after <think> does work around it, but the model was trained strictly on the current template, so deviating from it at inference time may hurt output quality. We'd rather keep the template as-is.
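
For readers following along, the kind of split a reasoning parser performs can be sketched roughly as below. This is only an illustrative Python sketch, not llama.cpp's actual parser and not the patch mentioned in this thread; the function name `split_reasoning` and the `think_prefilled` flag are hypothetical. It assumes the chat template ends the prompt with <think>, so the model output usually starts inside the reasoning block, and it behaves the same whether or not a newline follows the tag.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(raw: str, think_prefilled: bool = True) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Hypothetical helper for illustration only.
    think_prefilled: assume the prompt already ended with <think>,
    so the output begins inside the reasoning block.
    """
    text = raw
    stripped = raw.lstrip()
    # Some stacks echo the opening tag in the output; drop it if present.
    if stripped.startswith(THINK_OPEN):
        text = stripped[len(THINK_OPEN):]
        think_prefilled = True
    if not think_prefilled:
        # No reasoning block expected: everything is the answer.
        return "", raw
    reasoning, sep, answer = text.partition(THINK_CLOSE)
    if not sep:
        # Closing tag never arrived (e.g. truncated output):
        # treat everything so far as reasoning.
        return text.strip(), ""
    # Surrounding whitespace, including a leading "\n", is not significant.
    return reasoning.strip(), answer.strip()


if __name__ == "__main__":
    print(split_reasoning("plan the reply</think>Hello!"))
    print(split_reasoning("\nplan the reply\n</think>\n\nHello!"))
```

Because the split does not depend on whitespace around the tags, a parser written this way would not need the template to change, which is the direction taken here.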

We've patched llama.cpp and verified the fix with the unmodified GGUF. Builds are available now:

Binaries: https://github.com/nex-agi/llama.cpp/releases/tag/nex-b9596-fix-b9599-9cd1771
Docker: docker pull ghcr.io/nex-agi/llama.cpp:server-cuda-nex-b9596-fix-b9598-8c0d5c9 (more variants at https://github.com/orgs/nex-agi/packages)

We'll submit the patch upstream to llama.cpp shortly. Once it is merged, stock llama.cpp will work out of the box. We'll update this thread with the PR link.

00index changed pull request status to closed
