Fix: tool calls silently lost on llama.cpp/ik_llama — switch XML format to Hermes JSON

#45

Problem

The v16+ XML tool-call format (<function=name><parameter=...> inside <tool_call>) is not parseable by llama.cpp's (and ik_llama's) OpenAI-compatible server. The server's tool-call grammar triggers on the <tool_call> opener but its parser expects a JSON object (Hermes format) inside. When the model follows this template's instructions:

  1. the <tool_call> opener is consumed by the grammar trigger,
  2. the inner XML body fails to parse,
  3. the entire call is dumped into content with zero tool_calls in the response.

Agentic frontends (OpenCode, etc.) then see no tool call at all — the turn just ends. The failure is intermittent and misleading: tool calls only succeed when the model ignores the template's instructions and falls back to its trained native JSON format. Instruction-faithful models/finetunes break ~100% of the time. Short curl tests often pass (model uses native format at low context), masking the bug.

Diagnosed by capturing an agentic frontend's exact failing request via a logging proxy and replaying/bisecting it offline against ik_llama. The raw failing output looked like <function=skill>...</function></tool_call> appearing as plain text (opener eaten by the grammar trigger).

Fix

  • Instruction block now teaches the native Hermes JSON format: <tool_call> {"name": ..., "arguments": {...}} </tool_call> — which is both what Qwen is trained on and what llama.cpp/ik_llama/vLLM parsers actually parse.
  • Assistant-history rendering of past tool_calls emits the same JSON form (in-context consistency).
  • max_tool_arg_chars truncation is preserved in the JSON path.

Verified end-to-end on ik_llama (streaming + non-streaming): with this change the exact previously-failing captured request returns a structured tool_calls array with finish_reason: tool_calls.


Each fix in this series is an independent PR based on current main (v20); they touch overlapping regions of the same file, so merging one may require the others to be rebased — happy to update them.

froggeric changed pull request status to closed

Hey @Moore2877 ! Nevermind about rebasing—I went ahead and manually resolved the merge conflicts and integrated your excellent fixes directly into the new v21 release on main. Thank you so much for this incredible series of PRs! Closing this as the code is now officially merged.

Sign up or log in to comment