flowcast-v4-lite · sub-1GB voice agent stack

Flowcast v4-lite is an 809 MB hot-path stack for macOS voice agents. It builds on flowcast-v3-lite with a promoted LFM2.5 writer, IR v5 push5 planner, transcript-first automation repairs, and intent fast-path routing.

100% accuracy on the core, expanded, and held-out benchmark gates. Dictation p50 drops from 360 ms to 208 ms. IR v5 serves as the fallback when transcript repairs miss.

v1.0.1

Microphone settings fix: "open microphone settings" now resolves to system_setting / microphone after retraining IR v5. Previously it resolved to launch_app / Mikrofon when the wrong chat template was applied.
Chat template routing: use ir_chat_template_family: functiongemma for the IR model and writer_chat_template_family: lfm25 for the writer (see inference_config.json).
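
A minimal sketch of how a caller might read those two keys. It assumes inference_config.json is a JSON object that holds both keys at the top level; that layout is an assumption, and `template_family_for` is a hypothetical helper, not part of gemmaflow-tune.

```python
import json

# Assumed shape of the relevant part of inference_config.json.
# Only the two key names and their values come from the release notes above.
EXAMPLE_CONFIG = json.loads("""
{
  "ir_chat_template_family": "functiongemma",
  "writer_chat_template_family": "lfm25"
}
""")


def template_family_for(role: str, config: dict) -> str:
    """Return the chat template family configured for 'ir' or 'writer'."""
    key = f"{role}_chat_template_family"
    if key not in config:
        # Fail loudly: a wrong template is what caused the
        # launch_app / Mikrofon misroute described above.
        raise KeyError(f"missing {key} in inference config")
    return config[key]


print(template_family_for("ir", EXAMPLE_CONFIG))      # functiongemma
print(template_family_for("writer", EXAMPLE_CONFIG))  # lfm25
```

Raising on a missing key rather than falling back to a default keeps a misconfigured template from silently changing routing results.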

Benchmarks (vs v3-lite)

Gate	v3-lite	v4-lite	Δ
Core overall	100%	100%	tie
Expanded overall	100%	100%	tie
Held-out overall	100%	100%	tie
Dictation p50	360 ms	208 ms	-152 ms
Core avg latency	116 ms	78 ms	-38 ms
IR model path p50	793 ms	453 ms	-340 ms
Hot download	~809 MB	~809 MB	same
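
The Δ column reports absolute differences. As a quick check, the relative change for each latency gate can be derived from the same figures; the numbers below are copied from the table, and `relative_change` is only an illustrative helper.

```python
# Latency figures copied from the table above, in milliseconds: (v3-lite, v4-lite).
latencies_ms = {
    "Dictation p50": (360, 208),
    "Core avg latency": (116, 78),
    "IR model path p50": (793, 453),
}


def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means faster)."""
    return (after - before) / before * 100


for gate, (v3, v4) in latencies_ms.items():
    print(f"{gate}: {v4 - v3:+d} ms ({relative_change(v3, v4):+.1f}%)")

# Dictation p50: -152 ms (-42.2%)
# Core avg latency: -38 ms (-32.8%)
# IR model path p50: -340 ms (-42.9%)
```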

Production automation resolves via deterministic transcript repairs and intent fast-path (0 IR calls on the hot path). IR v5 is exercised on the fallback model path.

Architecture

spoken command
    → transcript-first repairs + intent fast-path  (dominant, ~0 ms)
    → FunctionGemma IR v5  (~270M + 5MB adapter, fallback)
    → compact JSON intent
    → transcript-aware compiler
    → DesktopAutomationPlan JSON

spoken dictation / intent
    → LFM2.5 writer  (~1.2B + 42MB adapter, KV-cached prefix)
    → polished text or intent label
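
The automation flow above tries a deterministic path first and calls the IR model only on a miss. The sketch below illustrates that ordering with stand-ins: `FAST_PATH`, `fast_path`, `route`, and the `action`/`target` field names are all hypothetical, and the real repair rules and intent router are not shown in this card.

```python
from typing import Callable, Optional, Tuple

# Hypothetical fast-path table. The system_setting / microphone pairing comes
# from the v1.0.1 notes; the dict layout is an assumption for illustration.
FAST_PATH = {
    "open microphone settings": {"action": "system_setting", "target": "microphone"},
}


def fast_path(transcript: str) -> Optional[dict]:
    """Deterministic lookup on the normalized transcript; None on a miss."""
    return FAST_PATH.get(transcript.strip().lower())


def route(transcript: str, ir_fallback: Callable[[str], dict]) -> Tuple[dict, str]:
    """Try the deterministic path first; call the IR model only on a miss."""
    intent = fast_path(transcript)
    if intent is not None:
        return intent, "fast_path"
    return ir_fallback(transcript), "ir_fallback"


# Stub standing in for the FunctionGemma IR call, so the sketch runs offline.
ir_calls = []


def stub_ir(transcript: str) -> dict:
    ir_calls.append(transcript)
    return {"action": "unknown", "transcript": transcript}


print(route("Open microphone settings", stub_ir)[1])            # fast_path
print(route("open codex and create a new thread", stub_ir)[1])  # ir_fallback
print(len(ir_calls))                                            # 1
```

Keeping the fallback behind a callable means the IR model is only invoked when the deterministic lookup fails, which is how a hot path can complete with 0 IR calls.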

Quick start

pip install mlx-lm huggingface_hub gemmaflow-tune

from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from gemmaflow_tune.compact_ir import compact_ir_prompt, parse_compact_ir, repair_compact_ir, compile_compact_ir

# Download the adapter bundle and the two 4-bit base models.
bundle = snapshot_download("nsalerni/flowcast-v4-lite")
writer_base = snapshot_download("mlx-community/LFM2.5-1.2B-Instruct-4bit")
ir_base = snapshot_download("mlx-community/functiongemma-270m-it-4bit")

# Attach each LoRA adapter to its base model.
writer_model, writer_tok = load(writer_base, adapter_path=f"{bundle}/writer")
ir_model, ir_tok = load(ir_base, adapter_path=f"{bundle}/ir")

# Automation: IR → compiler (fallback path)
transcript = "open codex and create a new thread in my loudink project"
ir_prompt = compact_ir_prompt(transcript)
# max_tokens=96 matches ir_max_tokens in the recommended inference settings.
ir_out = generate(ir_model, ir_tok, prompt=ir_prompt, max_tokens=96)
# Parse the compact IR, repair it against the transcript, then compile the plan.
ir = repair_compact_ir(parse_compact_ir(ir_out), transcript)
plan = compile_compact_ir(ir, transcript=transcript)
print(plan)

Files

File	Description
`writer/adapters.safetensors`	LFM2.5 writer LoRA (`promoted_core_100`)
`writer/adapter_config.json`	Writer LoRA config
`ir/adapters.safetensors`	FunctionGemma IR v5 push5 LoRA
`ir/adapter_config.json`	IR LoRA config
`inference_config.json`	Runtime settings + benchmark scores
`manifest.json`	Production manifest for GemmaFlow integration

Recommended inference settings

{
  "runner_kind": "compact_ir",
  "prompt_mode": "verbose",
  "json_early_stop": true,
  "writer_use_prompt_kv_cache": true,
  "ir_use_prompt_kv_cache": true,
  "transcript_first_ir": true,
  "intent_fast_path": true,
  "temperature": 0.0,
  "top_p": 1.0,
  "ir_max_tokens": 96,
  "dictation_max_tokens": 192
}
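
This card does not document how `json_early_stop` is implemented. One plausible reading is that decoding halts as soon as the first complete top-level JSON object has been emitted, instead of running to `ir_max_tokens`. The sketch below illustrates that idea only: `json_object_end` and `generate_with_early_stop` are hypothetical helpers, and the chunk list stands in for streamed model output.

```python
import json


def json_object_end(text: str) -> int:
    """Index just past the first complete top-level JSON object, or -1."""
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Braces inside string literals must not affect nesting depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


def generate_with_early_stop(chunks, max_tokens=96):
    """Accumulate streamed chunks; stop once a full JSON object is present."""
    buffer = ""
    for n, chunk in enumerate(chunks, start=1):
        buffer += chunk
        end = json_object_end(buffer)
        if end != -1:
            return buffer[:end]
        if n >= max_tokens:
            break
    return buffer


# Simulated stream; the last chunk is never consumed.
chunks = ['{"action":', ' "launch_app",', ' "args": {"name": "a}b"}', "}", " trailing tokens"]
result = generate_with_early_stop(chunks)
print(json.loads(result)["args"]["name"])  # a}b
```

Rescanning the whole buffer on every chunk is quadratic in output length, which is acceptable for a 96-token budget; a production version would more likely track depth incrementally.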

Training lineage

Fine-tuned with gemmaflow-tune:

Writer base: mlx-community/LFM2.5-1.2B-Instruct-4bit
IR base: mlx-community/functiongemma-270m-it-4bit
Method: LoRA on both models + deterministic transcript compiler + push5 IR corpus
Predecessor: nsalerni/flowcast-v3-lite

Citation

@misc{flowcastv4lite2026,
  title  = {flowcast-v4-lite: Sub-1GB Voice Desktop Automation for GemmaFlow},
  author = {Salerni, Nicola},
  year   = {2026},
  url    = {https://huggingface.co/nsalerni/flowcast-v4-lite}
}

License

Apache 2.0. The base models are subject to their respective licenses (LFM2.5, FunctionGemma/Gemma).
