flowcast-v3-lite Β· sub-1GB voice agent stack

Flowcast v3-lite is a 809 MB hot-path stack for macOS voice agents: a LFM2.5-1.2B writer handles dictation polish and intent routing; a FunctionGemma 270M compact-IR planner classifies automation requests; a deterministic transcript-aware compiler expands them into full DesktopAutomationPlan JSON β€” including deep links into native apps (Codex threads, Cursor agents, Claude chats) and web apps (Gmail tabs, Calendar views).

Under 1 GB day one. v3-class accuracy. 2.4Γ— faster.

Supersedes lazy-loading flowcast-sota-v3 (2.5 GB) for size-constrained installs.

Benchmarks (vs v3)

Gate v3 v3-lite Ξ”
Core overall (128) 95.3% 100% +4.7pp
Core held-out (37) 94.6% 100% +5.4pp
Automation 84.6% 100% +15.4pp
Dictation 97.5% 100% +2.5pp
Core p50 latency ~987ms ~407ms 2.4Γ— faster
Hot download ~2.5 GB ~809 MB 3Γ— smaller

Deep-link cases (Codex threads, Gmail compose/tabs, Calendar views) pass via the transcript compiler β€” no v3 lazy load required.

Architecture

spoken command
    β†’ FunctionGemma IR  (~270M + 5MB adapter)
    β†’ compact JSON intent
    β†’ transcript-aware compiler  (calendar views, Gmail tabs, native app workflows)
    β†’ DesktopAutomationPlan JSON

spoken dictation / intent
    β†’ LFM2.5 writer  (~1.2B + 42MB adapter)
    β†’ polished text or intent label

Quick start

pip install mlx-lm huggingface_hub gemmaflow-tune
from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from gemmaflow_tune.compact_ir import compact_ir_prompt, parse_compact_ir, repair_compact_ir, compile_compact_ir

bundle = snapshot_download("nsalerni/flowcast-v3-lite")
writer_base = snapshot_download("mlx-community/LFM2.5-1.2B-Instruct-4bit")
ir_base = snapshot_download("mlx-community/functiongemma-270m-it-4bit")

writer_model, writer_tok = load(writer_base, adapter_path=f"{bundle}/writer")
ir_model, ir_tok = load(ir_base, adapter_path=f"{bundle}/ir")

# Automation: IR β†’ compiler
transcript = "open codex and create a new thread in my loudink project"
ir_prompt = compact_ir_prompt(transcript)
ir_out = generate(ir_model, ir_tok, prompt=ir_prompt, max_tokens=96)
ir = repair_compact_ir(parse_compact_ir(ir_out), transcript)
plan = compile_compact_ir(ir, transcript=transcript)
print(plan)

# Dictation: writer
# (use writer_model with dictation prompt β€” see gemmaflow-tune prompts)

Files

File Description
writer/adapters.safetensors LFM2.5 writer LoRA (checkpoint 0000006)
writer/adapter_config.json Writer LoRA config
ir/adapters.safetensors FunctionGemma compact-IR LoRA
ir/adapter_config.json IR LoRA config
inference_config.json Runtime settings + benchmark scores
manifest.json Production manifest for GemmaFlow integration

Recommended inference settings

{
  "runner_kind": "compact_ir",
  "prompt_mode": "verbose",
  "json_early_stop": true,
  "temperature": 0.0,
  "top_p": 1.0,
  "ir_max_tokens": 96,
  "dictation_max_tokens": 192
}

Training lineage

Fine-tuned with gemmaflow-tune:

  • Writer base: mlx-community/LFM2.5-1.2B-Instruct-4bit
  • IR base: mlx-community/functiongemma-270m-it-4bit
  • Method: LoRA on both models + deterministic transcript compiler
  • Predecessor: nsalerni/gemma-4-e2b-flowcast-v3 (teacher + benchmark reference)

Citation

@misc{flowcastv3lite2026,
  title  = {flowcast-v3-lite: Sub-1GB Voice Desktop Automation for GemmaFlow},
  author = {Salerni, Nicola},
  year   = {2026},
  url    = {https://huggingface.co/nsalerni/flowcast-v3-lite}
}

License

Apache 2.0. Base models subject to their respective licenses (LFM2.5, FunctionGemma/Gemma).

Downloads last month

-

Downloads are not tracked for this model. How to track
MLX
Hardware compatibility
Log In to add your hardware

Quantized

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for nsalerni/flowcast-v3-lite