Gemma-4 Coder — tool-call recovery shim

A tiny, framework-free post-processor that recovers structured tool_calls from gemma-4 12B coder GGUFs served on llama.cpp.

Why you need it

llama.cpp --jinja does not recognise gemma-4's native tool-call format, so the bare parser under-reports tool calls. The model is fine — the parser is blind to the format. The model emits a call as text in content:

<|tool_call>call:get_weather{"city": "Paris", "units": "celsius"}

This shim re-parses that (and the leaked variants different builds produce — <|tool>NAME{…}, <call:NAME{…}>, NAME(k=v)<|/tool|>) back into the OpenAI shape your client expects. It never touches the weights and adds nothing beyond a regex scan.

Files

File	What
`gemma_tool_parse.py`	the pure-stdlib core: call detection, tolerant arg parsing, content cleanup, and the serve-side tools-in-prompt block. No dependencies.
`standalone.py`	use it without any framework — call `recovered_tool_calls(text)` on a completion. Runnable demo.
`litellm_shim.py`	a drop-in litellm `CustomLogger` callback (pre-call folds tool defs + stop-strings; post-call recovers `tool_calls`).

The algorithm (re-implement in any language)

Scan the assistant content for the native marker <|tool_call> and the leaked text forms (<|tool>NAME{…}, <call:NAME{…}>, a bare NAME{…}/NAME(k=v) anchored on a tool marker). Names may be dotted (weather.get_weather).
Extract per match the tool name and the arguments object (balanced-brace scan from the first {; tolerate trailing prose, fancy quotes, bare keys, = or :).
Coerce scalar arg values ("3"→3, "true"→true) so the downstream tool validates; leave unknown fields untouched.
Emit OpenAI tool_calls [{"id","type":"function","function":{"name","arguments":<json-string>}}], set finish_reason: "tool_calls", and strip the recovered markup from content.
Pass through unchanged when no marker is found (plain answers are unaffected).

A second concern, also handled: the model was fine-tuned with tool definitions folded into the prompt, so it only behaves if it sees the same block at serve time. Call fold_tools_prompt(messages, tools) (in gemma_tool_parse.py) before sending.

Usage

Standalone (no litellm)

from standalone import recovered_tool_calls

tool_calls, content = recovered_tool_calls(assistant_text)

litellm

# config.yaml
litellm_settings:
  callbacks: ["litellm_shim.shim"]

Set GEMMA_SHIM_MODELS=my-deploy-name to gate the shim to specific deployments (default: all). litellm hook signatures vary across releases — litellm_shim.py is a reference adapter; adjust the method names to your version if needed.

Models that use this

Apache-2.0 — use it freely.

Downloads last month: -; Downloads are not tracked for this model. How to track