Tool Call Incompatibility With Claude Code

#9
by abrar360 - opened

I'm running with the following setup:

  • llama.cpp -m /home/--/Documents/GLM-4.6-UD-Q2_K_XL/GLM-4.6-UD-Q2_K_XL-00001-of-00003.gguf -ngl 999 --host 0.0.0.0 --slots --jinja --ctx-size 60000 --flash-attn on --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.0 -ot ".ffn_.*_exps.=CPU" --reasoning-format auto --cache-type-k q8_0 --cache-type-v q8_0 --no-warmup

  • 1 x 4090

  • It works great when I tested it in the llama-server webUI and llama-cli

However, tool calls seem to mess up and cause the agentic loop to halt when I use it with claude code via claude-code-proxy: (https://github.com/1rgs/claude-code-proxy) (https://www.reddit.com/r/LocalLLaMA/comments/1m118is/use_claudecode_with_local_models/)

image

Not sure if this is an issue with the chat template, the model losing function calling accuracy due to quant, or some sort of unique formatting needed for claude code.
Help would be appreciated.

abrar360 changed discussion status to closed

Sign up or log in to comment