FunctionGemma 270M: LiteRT-LM (Android)

A LiteRT-LM export of google/functiongemma-270m-it, packaged for on-device function calling on Android via Google's LiteRT-LM runtime. Weights are dynamic-range quantized to int8 while activations stay fp32, a format that LiteRT's XNNPACK CPU path can accelerate.
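To make "int8 weights, fp32 activations" concrete, here is a minimal pure-Python sketch of symmetric per-row int8 weight quantization followed by an fp32 matrix-vector product. It is an illustration under assumed choices (one scale per output row, round-to-nearest, range ±127), not LiteRT's converter or the XNNPACK kernels, which may also quantize activations on the fly at run time.

```python
# Sketch: symmetric per-row int8 weight quantization with fp32 activations.
# Assumptions (not taken from LiteRT): one scale per output row, range [-127, 127].

def quantize_row(row: list[float]) -> tuple[list[int], float]:
    """Map one fp32 weight row to int8 codes and a scale, so w ≈ code * scale."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in row]
    return codes, scale


def matvec_int8_weights(qrows: list[list[int]], scales: list[float],
                        x: list[float]) -> list[float]:
    """y = W @ x with W stored as int8 codes; x and y stay fp32."""
    return [scale * sum(q * xi for q, xi in zip(codes, x))
            for codes, scale in zip(qrows, scales)]


weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
quantized = [quantize_row(r) for r in weights]
y = matvec_int8_weights([q for q, _ in quantized], [s for _, s in quantized], [1.0, 2.0, 3.0])
print(y)  # close to the exact fp32 result [-0.75, -4.0]
```

Storing int8 codes plus one fp32 scale per row is what shrinks the weight footprint to roughly a quarter of fp32.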

The repository ships a single .litertlm artifact containing the transformer graph, tokenizer, and chat template, plus a small config.json that documents the workflow contract this build supports: single-turn and parallel function calls. Multi-step chaining and long multi-turn slot-filling are out of scope for this version.

Model

Parameters: 270M
Architecture: Gemma 3 (18 layers, 4 query heads, 1 KV head, head_dim 256, hidden 640)
Quantization: dynamic_wi8_afp32 (int8 weights, fp32 activations)
Format: LiteRT-LM .litertlm
Context length: 32,768 tokens (model) / 1,024 tokens (export)
Prefill chunk sizes: 128, 512, 1024 tokens
File size: ~283 MB
Backend: LiteRT CPU + XNNPACK
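The architecture numbers above are enough for a back-of-the-envelope KV-cache estimate at the 1,024-token export context. This sketch assumes keys and values are cached for every layer at 4 bytes per element (fp32); the runtime may store the cache in a different dtype or layout, so treat the result as an order of magnitude. kv_cache_bytes is a hypothetical helper.

```python
# Rough KV-cache size from the architecture table.
# Assumption: K and V cached for every layer, 4 bytes per element (fp32).

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: int = 4) -> int:
    """Bytes needed to hold keys and values for `tokens` positions."""
    per_token = 2 * layers * kv_heads * head_dim  # 2 = one K and one V vector
    return per_token * tokens * bytes_per_elem


size = kv_cache_bytes(layers=18, kv_heads=1, head_dim=256, tokens=1024)
print(size, "bytes =", size / 2**20, "MiB")  # 37748736 bytes = 36.0 MiB
```

Because the 4 query heads share a single KV head, the cache is a quarter of what 4 separate KV heads would need.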

Files

model.litertlm (283 MB): LiteRT-LM bundle (weights + tokenizer + chat template). Pass its path to litert_lm.Engine.
config.json (~2 KB): workflow contract covering stop tokens, supported workflows, and the function-call output format.
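A client can read config.json and refuse workflows the export does not cover before sending a prompt. This is a minimal sketch that assumes the dotted names listed under "Workflow contract" (supportedWorkflows, unsupportedWorkflows, and so on) are top-level keys in the JSON file; load_contract and workflow_supported are hypothetical helpers.

```python
import json

# Sketch: read config.json and check a workflow before relying on it.
# Assumption: supportedWorkflows / unsupportedWorkflows are top-level keys.

def load_contract(path: str = "config.json") -> dict:
    """Parse the workflow contract shipped next to model.litertlm."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def workflow_supported(contract: dict, workflow: str) -> bool:
    """True only if the workflow is listed as supported and not as unsupported."""
    return (workflow in contract.get("supportedWorkflows", [])
            and workflow not in contract.get("unsupportedWorkflows", []))
```

If the assumed layout holds, workflow_supported(load_contract(), "multi_step_chaining") should return False for this export.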

Workflow contract

The export documents stop-token profiles and supported workflows in config.json. Key fields:

  • outputFormat: <start_function_call>call:name{arg:<escape>value<escape>}<end_function_call>
  • responseFormat: <start_function_response>response:name{arg:<escape>value<escape>}<end_function_response>
  • supportedWorkflows: single_turn_function_call, parallel_function_call
  • unsupportedWorkflows: multi_step_chaining, long_multi_turn_slot_filling
  • stopTokenProfiles.toolCall: <end_function_call>, <start_function_response>, <end_of_turn>
  • stopTokenProfiles.finalResponse: <end_of_turn>, <end_function_call>
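If a generation path returns text that runs past a stop token, the profiles above can be applied on the client side. A minimal sketch, assuming plain substring matching on decoded text; the runtime may already stop on these tokens, in which case this is only a safety net. STOP_PROFILES and truncate_at_stop are hypothetical names.

```python
# Stop-token profiles as listed under stopTokenProfiles in config.json.
STOP_PROFILES = {
    "toolCall": ["<end_function_call>", "<start_function_response>", "<end_of_turn>"],
    "finalResponse": ["<end_of_turn>", "<end_function_call>"],
}


def truncate_at_stop(text: str, profile: str, keep_stop: bool = False) -> str:
    """Cut `text` at the earliest stop token from `profile`.

    keep_stop=True keeps the matched token itself, e.g. so a call parser
    still sees the closing <end_function_call> marker.
    """
    hits = [(text.find(tok), tok) for tok in STOP_PROFILES[profile] if tok in text]
    if not hits:
        return text  # no stop token: return the text unchanged
    index, tok = min(hits)  # earliest occurrence wins
    return text[: index + len(tok)] if keep_stop else text[:index]
```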

Verification

The validation script smoke_litert.py in the speech-android SDK runs a two-pass tool call end to end (the model emits a call, the tool runs, and its result is sent back for a final reply) and checks the parsed call:

prompt:        "What is the current weather in Tokyo?"
raw call:      <start_function_call>call:get_current_weather{location:<escape>Tokyo<escape>}<end_function_call>
parsed call:   {"name": "get_current_weather", "arguments": {"location": "Tokyo"}}
tool result:   {"location": "Tokyo", "temperature": 15, "unit": "celsius", "condition": "sunny"}
final reply:   "The current weather in Tokyo is sunny with a temperature of 15.0 degrees Celsius."
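The raw-call to parsed-call step in the log above can be sketched with two regular expressions. This is a minimal illustration, not the SDK's parser: parse_function_calls is a hypothetical name, arguments are assumed to be comma-separated, and values written without <escape> markers (for example a number) are kept as raw strings rather than converted. Because a parallel call emits several call blocks, it returns a list.

```python
import re

# One call block, per outputFormat in config.json:
#   <start_function_call>call:name{arg:<escape>value<escape>}<end_function_call>
CALL_RE = re.compile(
    r"<start_function_call>call:([\w.-]+)\{(.*?)\}<end_function_call>", re.DOTALL
)
# One argument: either key:<escape>value<escape> or an unescaped key:value.
ARG_RE = re.compile(r"(\w+):(?:<escape>(.*?)<escape>|([^,{}]*))", re.DOTALL)


def parse_function_calls(text: str) -> list[dict]:
    """Return every call in `text` as {"name": ..., "arguments": {...}}."""
    calls = []
    for name, body in CALL_RE.findall(text):
        arguments = {}
        for m in ARG_RE.finditer(body):
            key, escaped, raw = m.group(1), m.group(2), m.group(3)
            # group(2) is None when the value had no <escape> markers
            arguments[key] = escaped if escaped is not None else raw.strip()
        calls.append({"name": name, "arguments": arguments})
    return calls


raw_call = ("<start_function_call>call:get_current_weather"
            "{location:<escape>Tokyo<escape>}<end_function_call>")
print(parse_function_calls(raw_call))
# [{'name': 'get_current_weather', 'arguments': {'location': 'Tokyo'}}]
```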

A four-test pytest suite (test_litert.py in the speech-android SDK) also passes, covering model loading, the weather call, a 300-second timer call, and a two-pass call with a final response.

Usage

Python (via litert_lm)

import litert_lm

def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Gets the current weather in a given location."""
    # Stub tool: returns fixed data for illustration.
    return {"location": location, "temperature": 15, "unit": unit, "condition": "sunny"}

# Tools the model may call, keyed by the name it emits in call:<name>{...}
TOOLS = {"get_current_weather": get_current_weather}

engine = litert_lm.Engine(
    model_path="model.litertlm",
    backend=litert_lm.Backend.CPU(),
    max_num_tokens=256,  # kept below the 1,024-token export context
)

# automatic_tool_calling=False: the caller parses and runs tool calls itself
with engine.create_conversation(
    tools=list(TOOLS.values()),
    automatic_tool_calling=False,
) as conv:
    # Pass 1: the model should answer with a function call.
    first = conv.send_message("What is the current weather in Tokyo?")
    # Parse `<start_function_call>...<end_function_call>` from `first`,
    # call the matching tool from TOOLS, and send the result back.
    # Pass 2: the tool_response payload is elided here.
    final = conv.send_message({"role": "tool", "content": [{"type": "tool_response", ...}]})

The full driver (parsing, tool dispatch, and the two-pass loop) is published as part of the speech-android SDK.
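The tool-dispatch step of such a driver can be sketched as follows. dispatch and format_response are hypothetical helpers, not SDK functions; format_response renders a tool result in config.json's responseFormat under the assumption that every value is stringified, wrapped in <escape> markers, and comma-separated. Whether the second pass should send this string or a structured tool message, as in the Python example, depends on the runtime API.

```python
# Hypothetical dispatch step: run a parsed call against a local tool table
# and render the result in config.json's responseFormat.

def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Stub tool returning fixed data, mirroring the Python example."""
    return {"location": location, "temperature": 15, "unit": unit, "condition": "sunny"}


TOOLS = {"get_current_weather": get_current_weather}


def dispatch(call: dict) -> dict:
    """Look up call["name"] in TOOLS and invoke it with the parsed arguments.

    Unexpected argument names from the model raise TypeError from the tool call.
    """
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise KeyError(f"model requested unknown tool {call['name']!r}")
    return tool(**call["arguments"])


def format_response(name: str, result: dict) -> str:
    """Render `result` as <start_function_response>response:name{...}<end_function_response>."""
    # Assumption: values are stringified and wrapped in <escape>, joined by commas.
    args = ",".join(f"{k}:<escape>{v}<escape>" for k, v in result.items())
    return f"<start_function_response>response:{name}{{{args}}}<end_function_response>"


call = {"name": "get_current_weather", "arguments": {"location": "Tokyo"}}
print(format_response(call["name"], dispatch(call)))
```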

Android (Kotlin / Java)

Bundle model.litertlm in app/src/main/assets/ and load it with the LiteRT-LM Android runtime; see the speech-android SDK for the ready-to-use wrapper.

Source

Upstream model: google/functiongemma-270m-it, Gemma 3 270M instruction-tuned for structured function calls.

Links

Repository: soniqo/FunctionGemma-270M-LiteRT-LM