FunctionGemma 270M Mobile Exports
FunctionGemma is a Gemma 3 270M variant trained for local function calling. It is intended to translate user text into structured tool calls, then optionally turn the tool result into a short user-facing response.
Setup
Accept the google/functiongemma-270m-it license on Hugging Face, then
authenticate before running conversion:
```bash
export HF_TOKEN=hf_...
cd models/functiongemma/export
poetry env use /opt/homebrew/bin/python3.11
poetry install --with convert
```
LiteRT-LM Export
```bash
poetry run python convert.py --output-dir ./functiongemma-litert
```
The default export uses LiteRT Torch's dynamic_wi8_afp32 quantization recipe,
prefill lengths of 128, 512, and 1024 tokens, and a 1024-token KV cache. For a
larger mobile prompt budget, raise both the cache length and the prefill lengths:
```bash
poetry run python convert.py \
  --output-dir ./functiongemma-litert \
  --cache-length 2048 \
  --prefill-lengths 128,512,1024,2048
```
Use --quantize none only for debugging.
With the default quantization, model.litertlm is about 283 MB. At runtime, LiteRT
may also create a local XNNPACK cache file next to it.
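Because the prompt and every generated token share the KV cache, the cache length caps the whole sequence, not just the prompt. A minimal pre-flight check, assuming a single fixed-size cache for the full context; parse_lengths and check_budget are hypothetical helpers (not part of convert.py), and rejecting prefill lengths larger than the cache is a sanity check, not documented converter behaviour.

```python
from typing import List


def parse_lengths(spec: str) -> List[int]:
    """Parse a comma-separated value such as "128,512,1024,2048"."""
    return sorted(int(part) for part in spec.split(",") if part.strip())


def check_budget(prompt_tokens: int, max_new_tokens: int,
                 cache_length: int, prefill_spec: str) -> int:
    """Raise if a request cannot fit the KV cache; return the unused cache slots."""
    oversized = [n for n in parse_lengths(prefill_spec) if n > cache_length]
    if oversized:
        # Sanity check (assumption): a prefill length longer than the cache cannot be filled.
        raise ValueError(f"prefill lengths {oversized} exceed cache length {cache_length}")
    # Prompt tokens and generated tokens occupy the same KV cache.
    total = prompt_tokens + max_new_tokens
    if total > cache_length:
        raise ValueError(f"request needs {total} tokens but the cache holds {cache_length}")
    return cache_length - total
```

With the default 1024-token cache, a 900-token prompt leaves room for at most 124 generated tokens; with --cache-length 2048, a 1500-token prompt plus 256 new tokens still leaves 292 slots free.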
Validate
```bash
poetry run pytest test_function_calls.py test_litert.py -q
poetry run python smoke_litert.py
```
CoreML Export
```bash
poetry run python convert_coreml.py \
  --output-dir ./functiongemma-coreml \
  --compute-precision float32 \
  --quantize int8
```
The CoreML artifact is a fixed-length model: it takes a 128-token input and returns only the logits for the last position. It uses int8 weights with float32 compute because the float16-compute export produced NaN logits in local validation. This CoreML path recomputes the full context for each generated token; LiteRT-LM remains the preferred production path for tool-calling latency.
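Full-context recompute can be pictured as the greedy loop below: every new token re-runs the entire 128-token window. This is a minimal sketch under stated assumptions: predict_last_logits is a hypothetical wrapper around the .mlpackage, and the left-padding plus attention-mask convention and pad_id=0 are guesses, since this card does not document the model's input names or padding scheme.

```python
from typing import Callable, List, Sequence, Set

WINDOW = 128  # fixed sequence length of the exported CoreML model

# Hypothetical wrapper: (input_ids, attention_mask) -> logits for the last position.
LastLogitsFn = Callable[[List[int], List[int]], Sequence[float]]


def greedy_decode(predict_last_logits: LastLogitsFn, prompt_ids: List[int],
                  max_new_tokens: int, stop_ids: Set[int], pad_id: int = 0) -> List[int]:
    if not prompt_ids or len(prompt_ids) > WINDOW:
        raise ValueError(f"prompt must have 1..{WINDOW} tokens")
    ids = list(prompt_ids)
    generated: List[int] = []
    # Stop once the context no longer fits the fixed window.
    while len(generated) < max_new_tokens and len(ids) <= WINDOW:
        pad = WINDOW - len(ids)
        # Assumed convention: left-pad so the last real token is in the final slot.
        input_ids = [pad_id] * pad + ids
        mask = [0] * pad + [1] * len(ids)
        # Full-context recompute: the whole window is evaluated for every token.
        logits = predict_last_logits(input_ids, mask)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id in stop_ids:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated
```

Each step costs a full 128-token forward pass regardless of how much of the context is unchanged, whereas a KV-cache runtime such as LiteRT-LM only processes the new token.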
```bash
poetry run pytest test_coreml.py -q
```
Validated local bundle:
- functiongemma-coreml/FunctionGemmaLastLogits.mlpackage
- functiongemma-coreml/config.json
- tokenizer files in functiongemma-coreml/
Benchmarks
```bash
poetry run python benchmark.py --backend litert --runs 5 --warmup 1
poetry run python benchmark.py --backend coreml --coreml-compute-units cpu --runs 5 --warmup 1
poetry run python benchmark.py --backend coreml --coreml-compute-units cpu_and_ne --runs 5 --warmup 1
```
Local results from a single machine:
| Backend | Quantization | Load RSS Δ | Peak RSS Δ | Mean tok/s |
|---|---|---|---|---|
| LiteRT-LM CPU | dynamic int8 | 551.1 MB | 865.3 MB | 148.54 |
| CoreML CPU | int8 weights, fp32 compute | 658.0 MB | 1690.4 MB | 31.49 |
| CoreML CPU+NE | int8 weights, fp32 compute | 86.7 MB | 1129.8 MB | 32.82 |
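Converting mean throughput into per-token time makes the latency gap concrete. This snippet only re-derives numbers from the table above (ms per token = 1000 / tok/s); latency_summary is an illustrative helper.

```python
from typing import Dict, Tuple

# Mean tok/s values copied from the benchmark table.
MEAN_TOK_S = {
    "LiteRT-LM CPU": 148.54,
    "CoreML CPU": 31.49,
    "CoreML CPU+NE": 32.82,
}


def latency_summary(results: Dict[str, float], baseline: str) -> Dict[str, Tuple[float, float]]:
    """Return {backend: (ms per token, per-token time relative to the baseline)}."""
    base = results[baseline]
    return {
        name: (round(1000.0 / tok_s, 2), round(base / tok_s, 2))
        for name, tok_s in results.items()
    }


for name, (ms, ratio) in latency_summary(MEAN_TOK_S, "LiteRT-LM CPU").items():
    print(f"{name}: {ms:.2f} ms/token ({ratio:.2f}x the LiteRT-LM per-token time)")
```

That works out to about 6.73 ms per token for LiteRT-LM versus 31.76 ms (CoreML CPU) and 30.47 ms (CoreML CPU+NE), i.e. roughly 4.5 to 4.7 times longer per token on the CoreML path.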
Runtime Loop
The model should be used in two passes:
- Build a prompt with `format_tool_call_prompt(...)` and stop on `<end_function_call>` or `<start_function_response>`.
- Parse the returned call with `parse_function_calls(...)`, validate it against an allowlist, and execute the tool.
- Build a second prompt with `format_final_response_prompt(...)` and stop on `<end_of_turn>` to get the final user-facing answer.
For command-only actions, the app can skip the second pass and present its own deterministic UI response after the tool succeeds.
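The two-pass loop, including the command-only shortcut, can be sketched as follows. The repo's format_tool_call_prompt, parse_function_calls, and format_final_response_prompt are passed in as callables because their exact signatures are not shown in this card; run_turn, the (name, args) call shape, and the per-tool argument allowlist are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set, Tuple, Union

# Hypothetical stand-in types; the repo's real helpers may have different signatures.
Generate = Callable[[str, List[str]], str]   # (prompt, stop sequences) -> generated text
Call = Tuple[str, Dict[str, Any]]            # (tool name, keyword arguments)


def run_turn(
    user_text: str,
    generate: Generate,
    build_call_prompt: Callable[[str], str],
    parse_calls: Callable[[str], List[Call]],
    build_final_prompt: Callable[[str, List[dict]], str],
    tools: Dict[str, Callable[..., Any]],
    allowed_args: Dict[str, Set[str]],
    command_only: bool = False,
) -> Union[str, List[dict]]:
    # Pass 1: ask for tool calls and stop at the end-of-call markers.
    raw = generate(build_call_prompt(user_text),
                   ["<end_function_call>", "<start_function_response>"])
    results: List[dict] = []
    for name, args in parse_calls(raw):
        # Only execute tools and arguments the app explicitly allows.
        if name not in tools:
            raise PermissionError(f"tool not allowlisted: {name}")
        unexpected = set(args) - allowed_args.get(name, set())
        if unexpected:
            raise ValueError(f"unexpected arguments for {name}: {sorted(unexpected)}")
        results.append({"name": name, "response": tools[name](**args)})
    if command_only:
        # Command-only action: the app renders its own deterministic UI response.
        return results
    # Pass 2: send the tool results back and stop at the end of the model turn.
    return generate(build_final_prompt(user_text, results), ["<end_of_turn>"])
```

Injecting the model and prompt helpers keeps the allowlist logic testable with fakes, and raising on a disallowed call (rather than silently dropping it) lets the app decide how to recover.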
FunctionGemma is trained for single-turn and parallel tool calls. Do not rely on it for multi-step dependency chains, where one call needs the result of an earlier call, without app-side orchestration or fine-tuning.
The LiteRT-LM Python runtime currently returns FunctionGemma calls as raw text, for example:
```text
<start_function_call>call:get_current_weather{location:<escape>Tokyo<escape>}<end_function_call>
```
Use parse_function_calls(...) to validate and dispatch the call. After the
tool response is sent back as a tool_response turn, the same exported model can
produce the final user-facing answer.
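For illustration, here is a simplified parser for the raw call format shown above. It is not the repo's parse_function_calls(...): parse_calls_sketch is hypothetical, and it assumes string arguments are wrapped in <escape> markers and other values are flat JSON-style literals without commas; nested objects and lists are not handled.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# One call: <start_function_call>call:NAME{ARGS}<end_function_call>
CALL_RE = re.compile(
    r"<start_function_call>call:([A-Za-z_][\w.]*)\{(.*?)\}<end_function_call>", re.DOTALL
)
# One argument: key:<escape>string<escape> or key:bare_literal
ARG_RE = re.compile(r"(\w+):(?:<escape>(.*?)<escape>|([^,]*))", re.DOTALL)


def _coerce(bare: str) -> Any:
    # Assumption: unescaped values are JSON-style literals (numbers, true/false, null).
    try:
        return json.loads(bare)
    except ValueError:
        return bare


def _parse_args(body: str) -> Dict[str, Any]:
    args: Dict[str, Any] = {}
    pos = 0
    while pos < len(body):
        m = ARG_RE.match(body, pos)
        if m is None:
            raise ValueError(f"malformed arguments: {body[pos:]!r}")
        key, escaped, bare = m.groups()
        args[key] = escaped if escaped is not None else _coerce(bare)
        pos = m.end()
        if pos < len(body):
            if body[pos] != ",":
                raise ValueError(f"expected ',' at {pos} in {body!r}")
            pos += 1
    return args


def parse_calls_sketch(text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Extract every call block, so parallel calls yield several tuples."""
    return [(m.group(1), _parse_args(m.group(2))) for m in CALL_RE.finditer(text)]
```

Applied to the get_current_weather output above, parse_calls_sketch returns [("get_current_weather", {"location": "Tokyo"})]; back-to-back call blocks yield one tuple each, which can then go through the allowlist check before dispatch.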
Mobile Artifacts
Ship these files:
- functiongemma-litert/model.litertlm
- functiongemma-litert/config.json
Do not ship local runtime caches such as
model.litertlm.xnnpack_cache_*; they are regenerated by LiteRT.
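A small packaging check can enforce that list: return only the allowlisted files and flag any XNNPACK cache that might otherwise slip into the app bundle. SHIP_FILES, CACHE_PATTERN, collect_bundle, and stray_caches are hypothetical names for this sketch.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List

SHIP_FILES = ["model.litertlm", "config.json"]   # the files listed above
CACHE_PATTERN = "*.xnnpack_cache_*"              # local runtime caches, never shipped


def collect_bundle(export_dir: str) -> List[Path]:
    """Return the files to ship, failing loudly if one is missing."""
    root = Path(export_dir)
    missing = [name for name in SHIP_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {root}: {missing}")
    return [root / name for name in SHIP_FILES]


def stray_caches(export_dir: str) -> List[str]:
    """List XNNPACK cache files that should stay out of the app bundle."""
    return sorted(p.name for p in Path(export_dir).iterdir() if fnmatch(p.name, CACHE_PATTERN))
```

Shipping from an explicit allowlist means a regenerated cache can never be bundled by accident; stray_caches is only useful as a warning in build logs.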