flowcast-v3-lite · sub-1GB voice agent stack
Flowcast v3-lite is an 809 MB hot-path stack for macOS voice agents. An LFM2.5-1.2B writer handles dictation polish and intent routing; a FunctionGemma 270M compact-IR planner classifies automation requests; and a deterministic, transcript-aware compiler expands them into full DesktopAutomationPlan JSON, including deep links into native apps (Codex threads, Cursor agents, Claude chats) and web apps (Gmail tabs, Calendar views).
Under 1 GB day one. v3-class accuracy. 2.4× faster.
Supersedes lazy-loading flowcast-sota-v3 (2.5 GB) for size-constrained installs.
Benchmarks (vs v3)
| Gate | v3 | v3-lite | Δ |
|---|---|---|---|
| Core overall (128) | 95.3% | 100% | +4.7pp |
| Core held-out (37) | 94.6% | 100% | +5.4pp |
| Automation | 84.6% | 100% | +15.4pp |
| Dictation | 97.5% | 100% | +2.5pp |
| Core p50 latency | ~987 ms | ~407 ms | 2.4× faster |
| Hot download | ~2.5 GB | ~809 MB | 3× smaller |
Deep-link cases (Codex threads, Gmail compose/tabs, Calendar views) pass via the transcript compiler; no v3 lazy load is required.
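The delta and ratio columns follow from the raw figures in the table. A quick check in Python, using only the numbers above (treating 2.5 GB as 2500 MB is an assumption about how the size was rounded):

```python
# Figures copied from the benchmark table above.
v3 = {"core_overall": 95.3, "core_heldout": 94.6, "automation": 84.6, "dictation": 97.5}
lite = {"core_overall": 100.0, "core_heldout": 100.0, "automation": 100.0, "dictation": 100.0}

# Accuracy deltas in percentage points.
deltas = {gate: round(lite[gate] - v3[gate], 1) for gate in v3}

# Latency and download ratios (2.5 GB taken as 2500 MB; an assumption about rounding).
speedup = round(987 / 407, 1)
size_ratio = round(2500 / 809, 1)

print(deltas)
print(speedup, size_ratio)
```

The size ratio comes out slightly above 3, which the table rounds to "3× smaller".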
Architecture
spoken command
  → FunctionGemma IR (~270M + 5 MB adapter)
  → compact JSON intent
  → transcript-aware compiler (calendar views, Gmail tabs, native app workflows)
  → DesktopAutomationPlan JSON

spoken dictation / intent
  → LFM2.5 writer (~1.2B + 42 MB adapter)
  → polished text or intent label
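To make the split between the model and the compiler concrete, here is a toy sketch of the automation path. Everything in it is hypothetical: the compact-IR fields (`app`, `action`), the plan fields, the view keywords, and the `toy_compile` helper are invented for illustration and are not the gemmaflow-tune schema or the real DesktopAutomationPlan format. The only point is that the model emits a small intent and a deterministic step recovers the remaining details from the transcript.

```python
import json
import re

# Hypothetical view keywords; the real compiler's rules are not documented here.
CALENDAR_VIEWS = ("day", "week", "month")

def toy_compile(ir: dict, transcript: str) -> dict:
    """Expand a compact intent into a fuller plan using the transcript (toy example)."""
    plan = {"app": ir["app"], "steps": [{"op": ir["action"]}]}
    words = re.findall(r"[a-z]+", transcript.lower())
    if ir["app"] == "calendar":
        # Detail the compact IR omits: which view the user asked for.
        # Falls back to "week" when the transcript names no view.
        view = next((w for w in words if w in CALENDAR_VIEWS), "week")
        plan["steps"][0]["view"] = view
    return plan

# Stand-in for the planner's output; a real run would parse the model's text.
compact_ir = json.loads('{"app": "calendar", "action": "open_view"}')
plan = toy_compile(compact_ir, "show me my calendar for the month")
print(json.dumps(plan))
```

Keeping the expansion deterministic means the small model only has to get the intent right, while details such as view names or deep-link targets come from rules applied to the transcript.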
Quick start
pip install mlx-lm huggingface_hub gemmaflow-tune
from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from gemmaflow_tune.compact_ir import compact_ir_prompt, parse_compact_ir, repair_compact_ir, compile_compact_ir

# Download the LoRA bundle and the two 4-bit base models.
bundle = snapshot_download("nsalerni/flowcast-v3-lite")
writer_base = snapshot_download("mlx-community/LFM2.5-1.2B-Instruct-4bit")
ir_base = snapshot_download("mlx-community/functiongemma-270m-it-4bit")

# Load each base model with its adapter from the bundle.
writer_model, writer_tok = load(writer_base, adapter_path=f"{bundle}/writer")
ir_model, ir_tok = load(ir_base, adapter_path=f"{bundle}/ir")

# Automation: IR → compiler
transcript = "open codex and create a new thread in my loudink project"
ir_prompt = compact_ir_prompt(transcript)
ir_out = generate(ir_model, ir_tok, prompt=ir_prompt, max_tokens=96)
ir = repair_compact_ir(parse_compact_ir(ir_out), transcript)
plan = compile_compact_ir(ir, transcript=transcript)
print(plan)

# Dictation: writer
# (use writer_model with a dictation prompt; see the gemmaflow-tune prompts)
Files
| File | Description |
|---|---|
| writer/adapters.safetensors | LFM2.5 writer LoRA (checkpoint 0000006) |
| writer/adapter_config.json | Writer LoRA config |
| ir/adapters.safetensors | FunctionGemma compact-IR LoRA |
| ir/adapter_config.json | IR LoRA config |
| inference_config.json | Runtime settings + benchmark scores |
| manifest.json | Production manifest for GemmaFlow integration |
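After downloading, it can be useful to confirm the bundle has the layout listed above before loading adapters. A minimal sketch; `missing_files` is a hypothetical helper written for this example, not part of gemmaflow-tune:

```python
from pathlib import Path

# Relative paths taken from the file table above.
EXPECTED_FILES = [
    "writer/adapters.safetensors",
    "writer/adapter_config.json",
    "ir/adapters.safetensors",
    "ir/adapter_config.json",
    "inference_config.json",
    "manifest.json",
]

def missing_files(bundle_dir: str) -> list[str]:
    """Return the expected files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files(bundle)` on the path returned by `snapshot_download` should give an empty list for a complete download.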
Recommended inference settings
{
"runner_kind": "compact_ir",
"prompt_mode": "verbose",
"json_early_stop": true,
"temperature": 0.0,
"top_p": 1.0,
"ir_max_tokens": 96,
"dictation_max_tokens": 192
}
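A minimal sketch of how these settings might be consumed. `max_tokens_for` and `first_json_object` are hypothetical helpers, and the reading of `json_early_stop` used here (keep the first complete JSON object and ignore anything after it) is an assumption, not documented behavior.

```python
import json

# Settings copied from the block above.
SETTINGS = json.loads("""
{
  "runner_kind": "compact_ir",
  "prompt_mode": "verbose",
  "json_early_stop": true,
  "temperature": 0.0,
  "top_p": 1.0,
  "ir_max_tokens": 96,
  "dictation_max_tokens": 192
}
""")

def max_tokens_for(path: str) -> int:
    """Pick the token budget for the 'ir' or 'dictation' path."""
    return SETTINGS[f"{path}_max_tokens"]

def first_json_object(text: str):
    """Parse the first JSON object in text and ignore anything after it."""
    start = text.index("{")  # raises ValueError if no object is present
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj

print(max_tokens_for("ir"))
print(first_json_object('{"app": "gmail"} trailing tokens'))
```

A temperature of 0.0 with `top_p` at 1.0 amounts to greedy decoding, so the same transcript should yield the same IR on every run.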
Training lineage
Fine-tuned with gemmaflow-tune:
- Writer base: mlx-community/LFM2.5-1.2B-Instruct-4bit
- IR base: mlx-community/functiongemma-270m-it-4bit
- Method: LoRA on both models + deterministic transcript compiler
- Predecessor: nsalerni/gemma-4-e2b-flowcast-v3 (teacher + benchmark reference)
Citation
@misc{flowcastv3lite2026,
title = {flowcast-v3-lite: Sub-1GB Voice Desktop Automation for GemmaFlow},
author = {Salerni, Nicola},
year = {2026},
url = {https://huggingface.co/nsalerni/flowcast-v3-lite}
}
License
Apache 2.0. Base models subject to their respective licenses (LFM2.5, FunctionGemma/Gemma).