v19 released with major improvements

#22
by froggeric - opened

I think I have finally solved the frequent stops in v19. So far it has been flawless in 3 long agentic tests in a row. Previously, I had it happen in around 80% of my sessions.

This has been a tough one to crack. To fix it I had to resort to better prompt engineering:

<IMPORTANT>
Reminder:
- You can use the <think></think> block to plan your next tool call OR to synthesize data and formulate your final response to the user.
- ALL explanation and reasoning MUST be placed strictly inside the <think></think> block.
- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags.
- If you choose to call a tool, you MUST output the <tool_call> block IMMEDIATELY after closing </think>. Do NOT output any conversational text before the tool call.
- The <tool_call> and <function> tags MUST be at the very beginning of a new line, with NO spaces or indentation before them.
- To call multiple functions, output a separate, completely closed <tool_call></tool_call> block for EACH function. Do NOT nest <tool_call> blocks.
- If you have gathered all necessary data and do not need to call a tool, answer the question like normal and provide your final response to the user IMMEDIATELY after closing </think>.
</IMPORTANT>

It helped a bit, but did not solve it. What I think finally did it, was a complete rewrite of the KV cache handling, by setting preserve_thinking to true as default, and abolishing the empty think injection, which was poisoning the model's in-context learning.

Will give it a shot.

Thank you so much for your time and effort 🌹

Hey, I did check, and it works well, but issue I observed since v18, still persists with opencode and tool calls bleeding into the message.

Exploring the input and move control system now.
<function=aft_outline>
<parameter=target>
src/features/input
</parameter>
</function>
</tool_call>

Leaks into the prompt, and LLM stops...

Strange, I have 0 instances of it across hundreds of tools calls since then. I am using F16, so maybe this is related to loss of intelligence in lower quants.

It happens ony with specific plugins..

Ones who modify messages (dcp, magic context).

And it happens with first message...

Without those plugins, it works amazing.

EDIT: Ok, this is for sure some context shananigans... As asking same question, with same prompt, just without context manipulation plugin - it works fine.

With plugins on - it bleeds either tools, thinking or both... And dies after first message, if second is a tool call...

I could provide you messages to compare - one with plugin enabled, and other without - if that is of any help?

In that case, I suspect those plugins manipulate the context history and KV cache, which is confusing the model on how to think, how to use tools, how to transition between states, etc. I would recommend not using any such plugin with any kind of model. It's akin to us, humans, have suddenly thoughts and memories disappearing throughout the day as we are working...

Yea...

They manipulate messages a lot.

But that's "a" way to manage context

I'd definitely pin that on OpenCode - I've had decent success with version 19. https://github.com/NousResearch/hermes-agent/issues/27339 is one such related issue (but for hermes-agent as opposed to OpenCode), so that's why I'm leaning towards the issue being the harness. Everyone is still learning quirks and "dos and don't"s.

I was using either v15 or v18 before with Hermes-Agent 27b (Opus 4.7 distilled) and it was losing its train of thought and getting into reasoning loops. I've been working through the kanban and it has been mostly reliable - no evidence of bad tool calls and it seems to behave well. Keep an eye on your harness's issue trackers!

That might be. There are so many moving parts in this world..

Claude Code is still unusable as of v19. Continuous break ups and stalls

I switched in opencode from openai to anthropic provider... And with magic context its stable. Still bleeds with dcp. .

It works perfectly for me in Claude Code, using llama-server as an anthropic provider. I am using the Qwen 3.6 27b F16 gguf I published, with the chat template v19.

image

I think anthropic provider is important.

Became very stable (minus dcp injections).

im using vllm. I tried with and without on cyburn/Qwopus3.6-35B-A3B-v1-PrismaSCOUT-Blackwell-NVFP4-BF16-vllm-4.75bits with MTP and 256k context.
--default-chat-template-kwargs '{"enable_thinking": false, "auto_disable_thinking_with_tools": true, "max_tool_response_chars": 8192}'
While looking for a solution for my hangs, i've came accross settings default-chat-template-kwargs.

This is just an example of may hangs:
image

image

image

the comes up. However,

UPDATED: another random stop.

I'm actually getting a weird model stop in claude code on ls (running this in windows and powershell)
It happened both in native powershell and default ls)

image

and I'm not even sure if it fixes the thing where it reads contents of a PE / binary file and then permacrashes because its thinking has unicode tags.... still running that and hoping for the best.

image

image

Please post issues in separate threads.

I strongly recommend leaveing preserve_thinking to true (default in this template), and leaving thinking on (default as well). The models perform and reason a lot better.

Sign up or log in to comment