flowcast-v4-lite · sub-1GB voice agent stack
Flowcast v4-lite is an 809 MB hot-path stack for macOS voice agents. It builds on flowcast-v3-lite with a promoted LFM2.5 writer, an IR v5 push5 planner, transcript-first automation repairs, and intent fast-path routing.
100% accuracy on the core, expanded, and held-out gates. Faster dictation. IR v5 fallback when repairs miss.
v1.0.1
- Microphone settings fix: "open microphone settings" now resolves to `system_setting/microphone` (retrained IR v5; it was `launch_app/Mikrofon` when the wrong chat template was applied).
- Chat template routing: use `ir_chat_template_family: functiongemma` for the IR model and `writer_chat_template_family: lfm25` for the writer (see `inference_config.json`).
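The v1.0.1 fix depends on pairing each model with its own chat template. A minimal guard, assuming `inference_config.json` is a flat JSON object that carries the two keys named above at the top level (the helper `template_family_problems` is hypothetical, not part of gemmaflow-tune), could fail fast on a mismatch instead of silently producing outputs like `launch_app/Mikrofon`:

```python
import json

# Template family each model must be paired with, per the v1.0.1 notes.
# Assumption: both keys live at the top level of inference_config.json.
EXPECTED_FAMILIES = {
    "ir_chat_template_family": "functiongemma",
    "writer_chat_template_family": "lfm25",
}


def template_family_problems(config_text: str) -> list[str]:
    """Hypothetical check: list every template-family key that is missing
    or differs from the expected value (an empty list means OK)."""
    config = json.loads(config_text)
    problems = []
    for key, expected in EXPECTED_FAMILIES.items():
        actual = config.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual!r}")
    return problems
```

With the bundle from the quick start, it could be called as `template_family_problems(open(f"{bundle}/inference_config.json").read())` before loading either model.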
Benchmarks (vs v3-lite)
| Gate | v3-lite | v4-lite | Δ |
|---|---|---|---|
| Core overall | 100% | 100% | tie |
| Expanded overall | 100% | 100% | tie |
| Held-out overall | 100% | 100% | tie |
| Dictation p50 | 360 ms | 208 ms | -152 ms |
| Core avg latency | 116 ms | 78 ms | -38 ms |
| IR model path p50 | 793 ms | 453 ms | -340 ms |
| Hot download | ~809 MB | ~809 MB | same |
Production automation commands are resolved by deterministic transcript repairs and the intent fast-path, with 0 IR calls on the hot path. IR v5 runs only on the fallback model path.
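The Δ column is v4-lite minus v3-lite. The short script below reproduces it from the three latency rows and also expresses each change as a relative reduction and a speedup factor; these derived figures are plain arithmetic on the table, not separately measured results:

```python
# Latency rows from the benchmark table, in ms: gate -> (v3-lite, v4-lite).
LATENCIES_MS = {
    "Dictation p50": (360, 208),
    "Core avg latency": (116, 78),
    "IR model path p50": (793, 453),
}


def compare(old_ms: float, new_ms: float) -> tuple[float, float, float]:
    """Return (delta in ms, percent reduction, speedup factor) for one row."""
    delta = new_ms - old_ms                      # the Δ column: v4-lite minus v3-lite
    reduction = 100 * (old_ms - new_ms) / old_ms  # share of v3-lite latency removed
    speedup = old_ms / new_ms                     # how many times faster v4-lite is
    return delta, reduction, speedup


for gate, (old, new) in LATENCIES_MS.items():
    delta, reduction, speedup = compare(old, new)
    print(f"{gate}: {delta:+} ms, {reduction:.1f}% lower, {speedup:.2f}x faster")
```

By this arithmetic, dictation p50 drops by about 42% (1.73x), core average latency by about 33% (1.49x), and the IR model path p50 by about 43% (1.75x).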
Architecture
```text
spoken command
  ↓ transcript-first repairs + intent fast-path (dominant, ~0 ms)
  ↓ FunctionGemma IR v5 (~270M + 5MB adapter, fallback)
  ↓ compact JSON intent
  ↓ transcript-aware compiler
  ↓ DesktopAutomationPlan JSON

spoken dictation / intent
  ↓ LFM2.5 writer (~1.2B + 42MB adapter, KV-cached prefix)
  ↓ polished text or intent label
```
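The dispatch order in the automation path (deterministic repairs and the intent fast-path first, the IR model only when they miss) can be sketched in plain Python. Everything here is hypothetical: the rule table, its dict shape, `fast_path`, `route_command`, and the `fake_ir` stand-in are illustrations of the control flow, not gemmaflow-tune internals, which this card does not document.

```python
from typing import Callable, Optional

# Hypothetical repair table: normalized transcript -> compact intent.
# The single entry mirrors the v1.0.1 microphone example; the dict shape is
# an illustration, not the real compact IR schema.
FAST_PATH_RULES: dict[str, dict] = {
    "open microphone settings": {"action": "system_setting", "target": "microphone"},
}


def fast_path(transcript: str) -> Optional[dict]:
    """Deterministic lookup: return an intent on a rule hit, else None."""
    key = " ".join(transcript.lower().split())  # normalize case and spacing
    return FAST_PATH_RULES.get(key)


def route_command(transcript: str, ir_fallback: Callable[[str], dict]) -> tuple[dict, str]:
    """Hot path first; call the IR model only when no rule matches."""
    intent = fast_path(transcript)
    if intent is not None:
        return intent, "fast_path"  # no IR call on this branch
    return ir_fallback(transcript), "ir_fallback"


ir_calls: list[str] = []


def fake_ir(transcript: str) -> dict:
    """Stand-in for FunctionGemma IR v5 generation plus parse/repair."""
    ir_calls.append(transcript)
    return {"action": "unknown", "target": transcript}


print(route_command("Open  Microphone settings", fake_ir))
print(route_command("open codex and create a new thread in my loudink project", fake_ir))
print(len(ir_calls))  # 1: only the second command reached the IR model
```

Because the fast path is a dictionary lookup, its cost is negligible next to a model call, which is the property the "~0 ms" label in the diagram describes.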
Quick start
```bash
pip install mlx-lm huggingface_hub gemmaflow-tune
```

```python
from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from gemmaflow_tune.compact_ir import compact_ir_prompt, parse_compact_ir, repair_compact_ir, compile_compact_ir

# Download the adapter bundle and the two 4-bit base models.
bundle = snapshot_download("nsalerni/flowcast-v4-lite")
writer_base = snapshot_download("mlx-community/LFM2.5-1.2B-Instruct-4bit")
ir_base = snapshot_download("mlx-community/functiongemma-270m-it-4bit")

# Attach each LoRA adapter from the bundle to its base model.
# The writer handles dictation and intent labels; it is not used in the automation example below.
writer_model, writer_tok = load(writer_base, adapter_path=f"{bundle}/writer")
ir_model, ir_tok = load(ir_base, adapter_path=f"{bundle}/ir")

# Automation: IR → compiler (fallback path)
transcript = "open codex and create a new thread in my loudink project"
ir_prompt = compact_ir_prompt(transcript)
ir_out = generate(ir_model, ir_tok, prompt=ir_prompt, max_tokens=96)  # matches ir_max_tokens
# Parse the compact JSON intent, apply transcript-aware repairs, then compile the plan.
ir = repair_compact_ir(parse_compact_ir(ir_out), transcript)
plan = compile_compact_ir(ir, transcript=transcript)
print(plan)
```
Files
| File | Description |
|---|---|
| `writer/adapters.safetensors` | LFM2.5 writer LoRA (promoted_core_100) |
| `writer/adapter_config.json` | Writer LoRA config |
| `ir/adapters.safetensors` | FunctionGemma IR v5 push5 LoRA |
| `ir/adapter_config.json` | IR LoRA config |
| `inference_config.json` | Runtime settings + benchmark scores |
| `manifest.json` | Production manifest for GemmaFlow integration |
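Before loading adapters, it can be worth confirming that a downloaded bundle contains every file in the table. A minimal standard-library sketch; `missing_bundle_files` is a hypothetical helper, and the path list is copied from the table above:

```python
from pathlib import Path

# Relative paths listed in the Files table.
EXPECTED_FILES = [
    "writer/adapters.safetensors",
    "writer/adapter_config.json",
    "ir/adapters.safetensors",
    "ir/adapter_config.json",
    "inference_config.json",
    "manifest.json",
]


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the expected files that are absent from a local bundle directory."""
    root = Path(bundle_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Called on the directory returned by `snapshot_download("nsalerni/flowcast-v4-lite")`, an empty list means every listed file is present; it does not validate file contents.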
Recommended inference settings
```json
{
  "runner_kind": "compact_ir",
  "prompt_mode": "verbose",
  "json_early_stop": true,
  "writer_use_prompt_kv_cache": true,
  "ir_use_prompt_kv_cache": true,
  "transcript_first_ir": true,
  "intent_fast_path": true,
  "temperature": 0.0,
  "top_p": 1.0,
  "ir_max_tokens": 96,
  "dictation_max_tokens": 192
}
```
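The `json_early_stop` flag suggests that IR generation halts as soon as the output forms a complete JSON object, rather than always running to `ir_max_tokens`. How gemmaflow-tune implements this is not documented here. A minimal sketch of one way to find the stopping point, assuming the IR emits a single top-level JSON object, tracks brace depth while ignoring braces inside strings; the function names and the example keys are hypothetical:

```python
import json
from typing import Iterable, Optional


def complete_json_end(text: str) -> Optional[int]:
    """Index just past the first complete top-level JSON object, or None."""
    depth = 0
    started = False
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if not started:
            if ch == "{":  # skip any text before the object opens
                started, depth = True, 1
            continue
        if in_string:
            if escaped:
                escaped = False  # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True  # braces inside strings are ignored
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def stream_until_json(chunks: Iterable[str]) -> Optional[dict]:
    """Consume streamed text chunks and stop as soon as a JSON object closes."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        end = complete_json_end(buffer)
        if end is not None:
            start = buffer.index("{")
            return json.loads(buffer[start:end])
    return None  # stream ended (e.g. max tokens) before the object closed


chunks = ['{"action": "launch', '_app", "args": {"app": "Co', 'dex"}}', " trailing tokens"]
print(stream_until_json(chunks))
```

Rescanning the whole buffer on each chunk is quadratic in principle but cheap at a 96-token budget. In recent mlx-lm releases, the chunks could come from `(r.text for r in stream_generate(ir_model, ir_tok, prompt=ir_prompt, max_tokens=96))`; returning early stops pulling tokens from that generator.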
Training lineage
Fine-tuned with gemmaflow-tune:
- Writer base: mlx-community/LFM2.5-1.2B-Instruct-4bit
- IR base: mlx-community/functiongemma-270m-it-4bit
- Method: LoRA on both models + deterministic transcript compiler + push5 IR corpus
- Predecessor: nsalerni/flowcast-v3-lite
Citation
```bibtex
@misc{flowcastv4lite2026,
  title  = {flowcast-v4-lite: Sub-1GB Voice Desktop Automation for GemmaFlow},
  author = {Salerni, Nicola},
  year   = {2026},
  url    = {https://huggingface.co/nsalerni/flowcast-v4-lite}
}
```
License
Apache 2.0. Base models subject to their respective licenses (LFM2.5, FunctionGemma/Gemma).