flowcast-v3-lite · sub-1GB voice agent stack

Flowcast v3-lite is a 809 MB hot-path stack for macOS voice agents: a LFM2.5-1.2B writer handles dictation polish and intent routing; a FunctionGemma 270M compact-IR planner classifies automation requests; a deterministic transcript-aware compiler expands them into full DesktopAutomationPlan JSON — including deep links into native apps (Codex threads, Cursor agents, Claude chats) and web apps (Gmail tabs, Calendar views).

Under 1 GB day one. v3-class accuracy. 2.4× faster.

Supersedes lazy-loading flowcast-sota-v3 (2.5 GB) for size-constrained installs.

Benchmarks (vs v3)

Gate	v3	v3-lite	Δ
Core overall (128)	95.3%	100%	+4.7pp
Core held-out (37)	94.6%	100%	+5.4pp
Automation	84.6%	100%	+15.4pp
Dictation	97.5%	100%	+2.5pp
Core p50 latency	~987ms	~407ms	2.4× faster
Hot download	~2.5 GB	~809 MB	3× smaller

Deep-link cases (Codex threads, Gmail compose/tabs, Calendar views) pass via the transcript compiler — no v3 lazy load required.

Architecture

spoken command
    → FunctionGemma IR  (~270M + 5MB adapter)
    → compact JSON intent
    → transcript-aware compiler  (calendar views, Gmail tabs, native app workflows)
    → DesktopAutomationPlan JSON

spoken dictation / intent
    → LFM2.5 writer  (~1.2B + 42MB adapter)
    → polished text or intent label

Quick start

pip install mlx-lm huggingface_hub gemmaflow-tune

from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from gemmaflow_tune.compact_ir import compact_ir_prompt, parse_compact_ir, repair_compact_ir, compile_compact_ir

bundle = snapshot_download("nsalerni/flowcast-v3-lite")
writer_base = snapshot_download("mlx-community/LFM2.5-1.2B-Instruct-4bit")
ir_base = snapshot_download("mlx-community/functiongemma-270m-it-4bit")

writer_model, writer_tok = load(writer_base, adapter_path=f"{bundle}/writer")
ir_model, ir_tok = load(ir_base, adapter_path=f"{bundle}/ir")

# Automation: IR → compiler
transcript = "open codex and create a new thread in my loudink project"
ir_prompt = compact_ir_prompt(transcript)
ir_out = generate(ir_model, ir_tok, prompt=ir_prompt, max_tokens=96)
ir = repair_compact_ir(parse_compact_ir(ir_out), transcript)
plan = compile_compact_ir(ir, transcript=transcript)
print(plan)

# Dictation: writer
# (use writer_model with dictation prompt — see gemmaflow-tune prompts)

Files

File	Description
`writer/adapters.safetensors`	LFM2.5 writer LoRA (checkpoint `0000006`)
`writer/adapter_config.json`	Writer LoRA config
`ir/adapters.safetensors`	FunctionGemma compact-IR LoRA
`ir/adapter_config.json`	IR LoRA config
`inference_config.json`	Runtime settings + benchmark scores
`manifest.json`	Production manifest for GemmaFlow integration

Recommended inference settings

{
  "runner_kind": "compact_ir",
  "prompt_mode": "verbose",
  "json_early_stop": true,
  "temperature": 0.0,
  "top_p": 1.0,
  "ir_max_tokens": 96,
  "dictation_max_tokens": 192
}

Training lineage

Fine-tuned with gemmaflow-tune:

Writer base: mlx-community/LFM2.5-1.2B-Instruct-4bit
IR base: mlx-community/functiongemma-270m-it-4bit
Method: LoRA on both models + deterministic transcript compiler
Predecessor: nsalerni/gemma-4-e2b-flowcast-v3 (teacher + benchmark reference)

Citation

@misc{flowcastv3lite2026,
  title  = {flowcast-v3-lite: Sub-1GB Voice Desktop Automation for GemmaFlow},
  author = {Salerni, Nicola},
  year   = {2026},
  url    = {https://huggingface.co/nsalerni/flowcast-v3-lite}
}

License

Apache 2.0. Base models subject to their respective licenses (LFM2.5, FunctionGemma/Gemma).

Downloads last month: -; Downloads are not tracked for this model. How to track

MLX

Hardware compatibility

Quantized

Model tree for nsalerni/flowcast-v3-lite

Base model

LiquidAI/LFM2.5-1.2B-Base

Finetuned

LiquidAI/LFM2.5-1.2B-Instruct

Quantized

mlx-community/LFM2.5-1.2B-Instruct-4bit

Adapter

(2)

this model