flowcast-v4-lite · sub-1GB voice agent stack

Flowcast v4-lite is an 809 MB hot-path stack for macOS voice agents. It builds on flowcast-v3-lite and adds a promoted LFM2.5 writer, an IR v5 push5 planner, transcript-first automation repairs, and intent fast-path routing.

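100% accuracy on the core, expanded, and held-out benchmark gates, matching v3-lite. Dictation p50 drops from 360 ms to 208 ms. IR v5 serves as the fallback when transcript repairs miss.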

v1.0.1

  • Microphone settings fix: "open microphone settings" now resolves to system_setting / microphone. IR v5 was retrained; previously the command resolved to launch_app / Mikrofon when the wrong chat template was applied.
  • Chat template routing: use ir_chat_template_family: functiongemma for the IR model and writer_chat_template_family: lfm25 for the writer (see inference_config.json).
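
The routing described in these notes can be sketched as a small lookup against inference_config.json. This is a minimal sketch, assuming the two keys named above sit at the top level of that file. The helper names template_family_for and load_config and the role strings "ir" and "writer" are hypothetical and not part of any published API.

```python
import json
from pathlib import Path

# Key names are taken from the release notes; their position at the top
# level of inference_config.json is an assumption.
ROLE_TO_KEY = {
    "ir": "ir_chat_template_family",
    "writer": "writer_chat_template_family",
}


def template_family_for(config: dict, role: str) -> str:
    """Return the chat template family configured for a model role.

    Hypothetical helper: it fails loudly instead of silently falling back,
    since applying the wrong template changes the IR output.
    """
    key = ROLE_TO_KEY[role]
    if key not in config:
        raise KeyError(f"{key} missing from inference config")
    return config[key]


def load_config(bundle_dir: str) -> dict:
    """Read inference_config.json from a downloaded bundle directory."""
    return json.loads(Path(bundle_dir, "inference_config.json").read_text())


# Example with an in-memory config mirroring the documented values.
example = {
    "ir_chat_template_family": "functiongemma",
    "writer_chat_template_family": "lfm25",
}
print(template_family_for(example, "ir"))
print(template_family_for(example, "writer"))
```

Raising on a missing key is a deliberate choice here: the microphone regression above came from a template mismatch, so a silent default seems riskier than an early error.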

Benchmarks (vs v3-lite)

Gate                 v3-lite    v4-lite    Δ
Core overall         100%       100%       tie
Expanded overall     100%       100%       tie
Held-out overall     100%       100%       tie
Dictation p50        360 ms     208 ms     -152 ms
Core avg latency     116 ms     78 ms      -38 ms
IR model path p50    793 ms     453 ms     -340 ms
Hot download         ~809 MB    ~809 MB    same

Production automation resolves via deterministic transcript repairs and intent fast-path (0 IR calls on the hot path). IR v5 is exercised on the fallback model path.
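
The latency deltas in the benchmark table can be rechecked, and expressed as relative reductions, with a few lines of arithmetic. The figures below are copied from the table. The helper name delta_and_reduction is illustrative only.

```python
# Latency figures copied from the benchmark table (milliseconds).
latency_ms = {
    "Dictation p50": (360, 208),
    "Core avg latency": (116, 78),
    "IR model path p50": (793, 453),
}


def delta_and_reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute delta, percent reduction relative to `before`)."""
    delta = after - before
    reduction_pct = 100.0 * (before - after) / before
    return delta, reduction_pct


for gate, (v3, v4) in latency_ms.items():
    delta, pct = delta_and_reduction(v3, v4)
    print(f"{gate}: {delta:+.0f} ms ({pct:.1f}% lower)")
```

In relative terms the three latency rows come out to reductions of roughly 42%, 33%, and 43%.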

Architecture

spoken command
    → transcript-first repairs + intent fast-path  (dominant, ~0 ms)
    → FunctionGemma IR v5  (~270M + 5MB adapter, fallback)
    → compact JSON intent
    → transcript-aware compiler
    → DesktopAutomationPlan JSON

spoken dictation / intent
    → LFM2.5 writer  (~1.2B + 42MB adapter, KV-cached prefix)
    → polished text or intent label
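
The automation branch above amounts to a two-stage dispatch: try deterministic rules first, and call the model only on a miss. The following is a minimal sketch of that pattern, not the gemmaflow-tune implementation. The rule table, the intent dictionary shape, and the names normalize, fast_path, route, and stub_ir are all hypothetical. The single rule mirrors the v1.0.1 note about "open microphone settings".

```python
from typing import Callable, Optional

Intent = dict  # assumed shape: {"action": ..., "target": ...}

# Hypothetical rule table; the single entry mirrors the v1.0.1 note.
FAST_PATH_RULES: dict[str, Intent] = {
    "open microphone settings": {"action": "system_setting", "target": "microphone"},
}


def normalize(transcript: str) -> str:
    """Lowercase, trim, and collapse whitespace so lookups are stable."""
    return " ".join(transcript.lower().split())


def fast_path(transcript: str) -> Optional[Intent]:
    """Deterministic lookup: no model call, returns None on a miss."""
    return FAST_PATH_RULES.get(normalize(transcript))


def route(transcript: str, ir_fallback: Callable[[str], Intent]) -> tuple[Intent, str]:
    """Try the fast path first; invoke the IR model only when it misses."""
    intent = fast_path(transcript)
    if intent is not None:
        return intent, "fast_path"
    return ir_fallback(transcript), "ir_fallback"


def stub_ir(transcript: str) -> Intent:
    """Stand-in for the FunctionGemma IR call, so the sketch runs offline."""
    return {"action": "unknown", "target": transcript}


print(route("Open  Microphone Settings", stub_ir))
print(route("open codex", stub_ir))
```

Returning the path label alongside the intent makes it easy to count how often the fallback model is actually exercised.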

Quick start

pip install mlx-lm huggingface_hub gemmaflow-tune

from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from gemmaflow_tune.compact_ir import compact_ir_prompt, parse_compact_ir, repair_compact_ir, compile_compact_ir

bundle = snapshot_download("nsalerni/flowcast-v4-lite")
writer_base = snapshot_download("mlx-community/LFM2.5-1.2B-Instruct-4bit")
ir_base = snapshot_download("mlx-community/functiongemma-270m-it-4bit")

writer_model, writer_tok = load(writer_base, adapter_path=f"{bundle}/writer")
ir_model, ir_tok = load(ir_base, adapter_path=f"{bundle}/ir")

# Automation: IR → compiler (fallback path)
transcript = "open codex and create a new thread in my loudink project"
ir_prompt = compact_ir_prompt(transcript)
ir_out = generate(ir_model, ir_tok, prompt=ir_prompt, max_tokens=96)
ir = repair_compact_ir(parse_compact_ir(ir_out), transcript)
plan = compile_compact_ir(ir, transcript=transcript)
print(plan)

Files

File                          Description
writer/adapters.safetensors   LFM2.5 writer LoRA (promoted_core_100)
writer/adapter_config.json    Writer LoRA config
ir/adapters.safetensors       FunctionGemma IR v5 push5 LoRA
ir/adapter_config.json        IR LoRA config
inference_config.json         Runtime settings + benchmark scores
manifest.json                 Production manifest for GemmaFlow integration

Recommended inference settings

{
  "runner_kind": "compact_ir",
  "prompt_mode": "verbose",
  "json_early_stop": true,
  "writer_use_prompt_kv_cache": true,
  "ir_use_prompt_kv_cache": true,
  "transcript_first_ir": true,
  "intent_fast_path": true,
  "temperature": 0.0,
  "top_p": 1.0,
  "ir_max_tokens": 96,
  "dictation_max_tokens": 192
}
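
The json_early_stop flag suggests that generation is cut off once the first complete JSON object has been emitted, though its exact behavior is not spelled out here. Under that assumption, the idea can be sketched as a brace-depth scan over the generated text. The function name first_json_object is hypothetical and this is not the runtime's actual stopping logic.

```python
import json
from typing import Optional


def first_json_object(text: str) -> Optional[str]:
    """Return the first balanced {...} span in `text`, or None if incomplete.

    Tracks brace depth while skipping braces inside string literals.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None


raw = '{"action": "launch_app", "target": "a } b"} trailing tokens'
obj = first_json_object(raw)
print(json.loads(obj))
```

With a streaming generator, the same check could run on the accumulated text after each token, so decoding stops well before ir_max_tokens when the object closes early.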

Training lineage

Fine-tuned with gemmaflow-tune:

  • Writer base: mlx-community/LFM2.5-1.2B-Instruct-4bit
  • IR base: mlx-community/functiongemma-270m-it-4bit
  • Method: LoRA on both models + deterministic transcript compiler + push5 IR corpus
  • Predecessor: nsalerni/flowcast-v3-lite

Citation

@misc{flowcastv4lite2026,
  title  = {flowcast-v4-lite: Sub-1GB Voice Desktop Automation for GemmaFlow},
  author = {Salerni, Nicola},
  year   = {2026},
  url    = {https://huggingface.co/nsalerni/flowcast-v4-lite}
}

License

Apache 2.0. Base models subject to their respective licenses (LFM2.5, FunctionGemma/Gemma).
