gemma-4-e2b-flowcast · flowcast-sota-v1

Flowcast is a production LoRA fine-tune of Gemma 4 E2B for macOS voice-agent desktop automation. It powers spoken-command planning, intent routing, and dictation polish in GemmaFlow.

Say it. Plan it. Do it.

What it does

| Task | Example | Output |
| --- | --- | --- |
| Desktop automation | "go to gmail", "launch claude and refactor this function" | JSON DesktopAutomationPlan |
| Web vs native routing | Gmail → browser; Slack → native app | Platform-aware step lists |
| Intent classification | Dictation vs automation vs editing | {intent, confidence, reason} |
| Dictation polish | "um can you send me the report…" | Clean prose |
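
The intent classification output listed above has the shape {intent, confidence, reason}. The following is a minimal sketch of validating such a result before acting on it. It assumes the model returns a JSON object with exactly those three keys, that the intent labels are the lowercase strings "dictation", "automation", and "editing" (inferred from the example column, not confirmed), and that confidence is a number in [0, 1]. `IntentResult` and `parse_intent` are hypothetical names, not part of gemmaflow-tune.

```python
import json
from dataclasses import dataclass

# Assumed label set, taken from the "Dictation vs automation vs editing" row;
# the exact strings the model emits are not documented here.
ASSUMED_INTENTS = {"dictation", "automation", "editing"}


@dataclass
class IntentResult:
    intent: str
    confidence: float
    reason: str


def parse_intent(raw: str) -> IntentResult:
    """Parse and validate a {intent, confidence, reason} JSON object."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"intent", "confidence", "reason"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    intent = str(data["intent"]).strip().lower()
    if intent not in ASSUMED_INTENTS:
        raise ValueError(f"unexpected intent: {intent!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return IntentResult(intent, confidence, str(data["reason"]))
```

Rejecting malformed output early lets a caller fall back to plain dictation instead of executing an automation plan on a low-quality classification.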

Benchmarks (Phase 3 production gate)

| Metric | Score |
| --- | --- |
| Hard eval quality | 100% |
| Hard eval overall (incl. latency SLA) | 100% |
| Held-out OOD | 100% |
| Median latency (automation, KV cache) | ~1.05 s |

Inference stack: hybrid_slim prompts · JSON early-stop · MLX prefix KV cache.
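
The JSON early-stop idea can be sketched as follows: stop consuming generated text as soon as the first top-level JSON object closes, so no tokens are spent on trailing output. This is an illustrative stand-in, not the gemmaflow-tune implementation, which is not shown in this card. `json_object_end` and `stop_at_json` are hypothetical helper names.

```python
from typing import Iterable, Optional


def json_object_end(text: str) -> Optional[int]:
    """Return the index just past the first complete top-level JSON object, or None."""
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if depth == 0:
            # Skip any preamble until the object opens.
            if ch == "{":
                depth = 1
            continue
        if in_string:
            # Braces inside strings must not affect nesting depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def stop_at_json(chunks: Iterable[str]) -> str:
    """Consume streamed text chunks and stop once one JSON object is complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        end = json_object_end(buffer)
        if end is not None:
            return buffer[buffer.index("{"):end]
    # The stream ended before the object closed; return what was received.
    return buffer
```

With mlx-lm, the chunks could come from the `.text` field of the responses yielded by `mlx_lm.stream_generate` (available in recent releases); breaking out of that loop ends generation. Rescanning the buffer on every chunk is quadratic, which is acceptable for plans capped at a few hundred tokens.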

Quick start (MLX, Apple Silicon)

Install the dependencies:

pip install mlx-lm huggingface_hub

Then load the base model with the adapter and generate:

from huggingface_hub import snapshot_download
from mlx_lm import load, generate

base = snapshot_download("mlx-community/Gemma4-E2B-IT-Text-int4")
adapter = snapshot_download("nsalerni/gemma-4-e2b-flowcast")

model, tokenizer = load(base, adapter_path=adapter)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Classify: go to gmail"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))

GemmaFlow / flowcast integration

from huggingface_hub import snapshot_download
from gemmaflow_tune import create_production_runner
from gemmaflow_tune.prompts import build_automation_planning_for_case

# Point runner at the published adapter (override local manifest path)
adapter = snapshot_download("nsalerni/gemma-4-e2b-flowcast")
runner = create_production_runner(adapter_path=adapter)

case = {"transcript": "go to gmail", "tags": ["web_routing"]}
prompt = build_automation_planning_for_case(case, "hybrid_slim")
result = runner.generate("", user_content=prompt, max_tokens=256)
print(result.text)

Note: create_production_runner(adapter_path=...) requires gemmaflow-tune to be installed from the training repo. For standalone MLX usage, call mlx_lm.load as shown in the quick start above.

Files

| File | Description |
| --- | --- |
| adapters.safetensors | LoRA weights (checkpoint 0000012) |
| adapter_config.json | LoRA config + base model reference |
| inference_config.json | Recommended runtime settings |

Recommended inference settings

{
  "prompt_mode": "hybrid_slim",
  "json_early_stop": true,
  "use_prompt_kv_cache": true,
  "temperature": 0.02,
  "top_p": 0.85,
  "max_tokens": 256
}
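
For standalone use, the settings above need to be split between the sampler and the surrounding runtime. The sketch below shows one way to do that in plain Python. Treating `temperature` and `top_p` as sampler parameters and everything else as runtime switches is an assumption made for illustration, and `split_inference_config` is a hypothetical helper.

```python
import json

# Keys treated as sampling parameters; all other keys are treated as
# runtime-level switches. This split is an assumption for illustration.
SAMPLING_KEYS = ("temperature", "top_p")


def split_inference_config(config: dict) -> tuple[dict, dict]:
    """Separate sampler parameters from runtime switches."""
    sampling = {k: config[k] for k in SAMPLING_KEYS if k in config}
    runtime = {k: v for k, v in config.items() if k not in SAMPLING_KEYS}
    return sampling, runtime


# The recommended settings, inlined so the example is self-contained.
RECOMMENDED = json.loads("""
{
  "prompt_mode": "hybrid_slim",
  "json_early_stop": true,
  "use_prompt_kv_cache": true,
  "temperature": 0.02,
  "top_p": 0.85,
  "max_tokens": 256
}
""")

sampling, runtime = split_inference_config(RECOMMENDED)
print(sampling)
print(runtime)
```

In practice the dictionary would likely be read from inference_config.json in the downloaded adapter directory. With recent mlx-lm releases, the sampling values can be passed to `make_sampler(temp=..., top_p=...)` from `mlx_lm.sample_utils`, and the resulting sampler handed to `generate` together with `max_tokens`. The remaining keys (`prompt_mode`, `json_early_stop`, `use_prompt_kv_cache`) have no direct mlx-lm equivalent and would need to be handled by the caller.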

Training

Fine-tuned with gemmaflow-tune:

  • Base: mlx-community/Gemma4-E2B-IT-Text-int4
  • Method: LoRA (rank 16, 16 layers, attn projections)
  • Pipeline: SFT surgical null-plan refinement → Phase 2/3 prompt engineering
  • Not included: fast-path retrain (hybrid_slim_fast, ~89% — rejected)

Citation

@misc{gemma4e2bflowcast2026,
  title  = {gemma-4-e2b-flowcast: Voice Desktop Automation for GemmaFlow},
  author = {Salerni, Nicola},
  year   = {2026},
  url    = {https://huggingface.co/nsalerni/gemma-4-e2b-flowcast}
}

License

Apache 2.0. The base model is subject to the Google Gemma license.
