FunctionGemma 270M - LiteRT-LM (Android)
A LiteRT-LM export of google/functiongemma-270m-it, packaged for on-device function-calling on Android via Google's LiteRT-LM runtime. Weights are dynamic-range quantized to int8 (activations stay fp32), the format LiteRT expects for XNNPACK acceleration.
The bundle ships a single .litertlm artifact that includes the
transformer graph, tokenizer, and chat template, plus a small
config.json that documents the workflow contract this build supports
(single-turn calls and parallel calls; multi-step chaining and
long multi-turn slot-filling are out of scope for this version).
Model
| Property | Value |
|---|---|
| Parameters | 270M |
| Architecture | Gemma 3 (18 layers, 4 query heads, 1 KV head, head_dim 256, hidden 640) |
| Quantization | dynamic_wi8_afp32 (int8 weights, fp32 activations) |
| Format | LiteRT-LM .litertlm |
| Context length | 32,768 (model) / 1,024 (export) |
| Prefill chunk sizes | 128, 512, 1024 |
| File size | ~283 MB |
| Backend | LiteRT CPU + XNNPACK |
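As a rough sanity check on the table, the sizes can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes one byte per int8 weight, an fp32 KV cache, and that every layer caches the full 1,024-token export context. None of these cache details are stated by the export (sliding-window layers, for example, would cache less), so treat the result as an upper-bound estimate rather than a measured figure.

```python
# Back-of-the-envelope size estimates from the model table.
# Assumptions (not confirmed by the export): 1 byte per int8 weight,
# fp32 (4-byte) KV cache entries, every layer caching the full context.

PARAMS = 270_000_000      # nominal parameter count ("270M")
LAYERS = 18
KV_HEADS = 1
HEAD_DIM = 256
EXPORT_CONTEXT = 1024     # tokens available in this export
BYTES_PER_FP32 = 4


def weight_bytes(params: int = PARAMS) -> int:
    """int8 weights: one byte per parameter (scales and metadata ignored)."""
    return params


def kv_cache_bytes(tokens: int = EXPORT_CONTEXT) -> int:
    """Keys plus values for every layer, KV head, and cached token."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_FP32
    return per_token * tokens


print(f"weights  ~ {weight_bytes() / 1e6:.0f} MB")
print(f"KV cache ~ {kv_cache_bytes() / 2**20:.0f} MiB at {EXPORT_CONTEXT} tokens")
```

Under these assumptions the weights account for roughly 270 MB of the ~283 MB file, and a full-context cache would add about 36 MiB at runtime. The single KV head is what keeps the cache this small.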
Files
| File | Size | Description |
|---|---|---|
| model.litertlm | 283 MB | LiteRT-LM bundle (weights + tokenizer + chat template). Pass the path to litert_lm.Engine. |
| config.json | ~2 KB | Workflow contract: stop tokens, supported workflows, function-call output format. |
Workflow contract
The export documents stop-token profiles and supported workflows in
config.json. Key fields:
- outputFormat: <start_function_call>call:name{arg:<escape>value<escape>}<end_function_call>
- responseFormat: <start_function_response>response:name{arg:<escape>value<escape>}<end_function_response>
- supportedWorkflows: single_turn_function_call, parallel_function_call
- unsupportedWorkflows: multi_step_chaining, long_multi_turn_slot_filling
- stopTokenProfiles.toolCall: <end_function_call>, <start_function_response>, <end_of_turn>
- stopTokenProfiles.finalResponse: <end_of_turn>, <end_function_call>
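The outputFormat string is regular enough to parse with a pair of regular expressions. The following is a minimal sketch, not the SDK's parser: `parse_function_calls` is a hypothetical helper, and two details are assumptions because the contract does not state them: multiple arguments are taken to be comma-separated, and values without `<escape>` delimiters are kept as raw strings.

```python
import re

# One call in the documented outputFormat:
#   <start_function_call>call:name{arg:<escape>value<escape>}<end_function_call>
CALL_RE = re.compile(
    r"<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>",
    re.DOTALL,
)
# One argument: either key:<escape>value<escape>, or (assumption) a bare
# key:value that runs up to the next comma or closing brace.
ARG_RE = re.compile(r"(\w+):(?:<escape>(.*?)<escape>|([^,}]*))", re.DOTALL)


def parse_function_calls(text: str) -> list[dict]:
    """Return every call found in `text` as {"name": ..., "arguments": {...}}."""
    calls = []
    for call in CALL_RE.finditer(text):
        arguments = {}
        for arg in ARG_RE.finditer(call.group(2)):
            escaped, bare = arg.group(2), arg.group(3)
            # Escaped values are used verbatim; bare values stay raw strings.
            arguments[arg.group(1)] = escaped if escaped is not None else bare
        calls.append({"name": call.group(1), "arguments": arguments})
    return calls


raw = (
    "<start_function_call>call:get_current_weather"
    "{location:<escape>Tokyo<escape>}<end_function_call>"
)
print(parse_function_calls(raw))
# [{'name': 'get_current_weather', 'arguments': {'location': 'Tokyo'}}]
```

Because the function returns a list, parallel calls emitted back to back in one turn come out as separate entries in emission order.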
Verification
The bundled validation (smoke_litert.py in the speech-android SDK)
runs a two-pass tool call end-to-end and confirms the parsed call:
prompt: "What is the current weather in Tokyo?"
raw call: <start_function_call>call:get_current_weather{location:<escape>Tokyo<escape>}<end_function_call>
parsed call: {"name": "get_current_weather", "arguments": {"location": "Tokyo"}}
tool result: {"location": "Tokyo", "temperature": 15, "unit": "celsius", "condition": "sunny"}
final reply: "The current weather in Tokyo is sunny with a temperature of 15.0 degrees Celsius."
A 4-test pytest suite (test_litert.py in the speech-android SDK) also
passes (load + weather + timer-300-seconds + two-pass-with-final-response).
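To make the second pass of the trace concrete, the tool result can be rendered into the responseFormat string from config.json. This is an illustrative sketch only: `format_function_response` is a hypothetical helper, the runtime or chat template may already perform this rendering, and writing non-string values bare with comma separators is an assumption that the contract does not spell out.

```python
def format_function_response(name: str, result: dict) -> str:
    """Render a tool result in the documented responseFormat:
    <start_function_response>response:name{arg:<escape>value<escape>}<end_function_response>
    """
    parts = []
    for key, value in result.items():
        if isinstance(value, str):
            # Strings are wrapped in <escape> delimiters, as in the contract.
            parts.append(f"{key}:<escape>{value}<escape>")
        else:
            # Assumption: non-string values (numbers here) are written bare.
            parts.append(f"{key}:{value}")
    body = ",".join(parts)  # assumption: comma-separated arguments
    return f"<start_function_response>response:{name}{{{body}}}<end_function_response>"


tool_result = {"location": "Tokyo", "temperature": 15, "unit": "celsius", "condition": "sunny"}
print(format_function_response("get_current_weather", tool_result))
```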
Usage
Python (via litert_lm)
import litert_lm
from functools import partial

def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Gets the current weather in a given location."""
    return {"location": location, "temperature": 15, "unit": unit, "condition": "sunny"}

TOOLS = {"get_current_weather": get_current_weather}

engine = litert_lm.Engine(
    model_path="model.litertlm",
    backend=litert_lm.Backend.CPU(),
    max_num_tokens=256,
)

with engine.create_conversation(
    tools=list(TOOLS.values()),
    automatic_tool_calling=False,
) as conv:
    first = conv.send_message("What is the current weather in Tokyo?")
    # parse `<start_function_call>...<end_function_call>` from `first`,
    # call the matching tool from TOOLS, send the result back:
    final = conv.send_message({"role": "tool", "content": [{"type": "tool_response", ...}]})
The full driver (parsing, tool dispatch, and the two-pass loop) is published as part of the speech-android SDK.
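The dispatch step that sits between the two send_message calls can be sketched without the runtime. `dispatch_call` below is a hypothetical helper, not part of litert_lm or the speech-android SDK; it assumes the model output has already been parsed into the {"name": ..., "arguments": {...}} shape shown in the Verification section, and it returns errors as data so the caller can decide what to send back to the model.

```python
from typing import Any, Callable


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Gets the current weather in a given location (stubbed result)."""
    return {"location": location, "temperature": 15, "unit": unit, "condition": "sunny"}


TOOLS = {"get_current_weather": get_current_weather}


def dispatch_call(call: dict, tools: dict[str, Callable[..., Any]]) -> dict:
    """Look up the requested tool by name and invoke it with the parsed arguments."""
    name = call["name"]
    tool = tools.get(name)
    if tool is None:
        # The model asked for a function that was never registered.
        return {"name": name, "error": f"unknown tool: {name}"}
    try:
        return {"name": name, "result": tool(**call.get("arguments", {}))}
    except TypeError as exc:
        # Python raises TypeError for missing or unexpected keyword arguments
        # (a tool could also raise it internally).
        return {"name": name, "error": str(exc)}


parsed = {"name": "get_current_weather", "arguments": {"location": "Tokyo"}}
print(dispatch_call(parsed, TOOLS))
```

For parallel calls, the same helper could be applied to each parsed call in turn, collecting one result per call before the second pass.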
Android (Kotlin / Java)
Bundle model.litertlm in app/src/main/assets/ and load it with the
LiteRT-LM Android runtime; see the
speech-android SDK for the
ready-to-use wrapper.
Source
Upstream model: google/functiongemma-270m-it (Gemma 3 270M instruction-tuned for structured function calls).
Links
- speech-android: Android SDK
- soniqo.audio: website
- blog