gemma-4-e2b-flowcast · `flowcast-sota-v1`

Flowcast is a production LoRA fine-tune of Gemma 4 E2B for macOS voice-agent desktop automation. It powers spoken-command planning, intent routing, and dictation polish in GemmaFlow.

Say it. Plan it. Do it.

What it does

| Task | Example | Output |
| --- | --- | --- |
| Desktop automation | "go to gmail", "launch claude and refactor this function" | JSON `DesktopAutomationPlan` |
| Web vs. native routing | Gmail → browser; Slack → native app | Platform-aware step lists |
| Intent classification | Dictation vs. automation vs. editing | `{intent, confidence, reason}` |
| Dictation polish | "um can you send me the report…" | Clean prose |
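The card describes the intent classifier's output only as `{intent, confidence, reason}`, so a caller will usually validate that object before routing on it. The sketch below assumes lowercase labels `dictation`, `automation`, and `editing` (taken from the table above) and a confidence in [0, 1]. Those label strings and ranges are assumptions, and `parse_intent` and `IntentResult` are hypothetical helpers, not part of GemmaFlow.

```python
import json
from dataclasses import dataclass

# Assumed label set, inferred from the task table; the real labels may differ.
ALLOWED_INTENTS = {"dictation", "automation", "editing"}


@dataclass(frozen=True)
class IntentResult:
    intent: str
    confidence: float
    reason: str


def parse_intent(raw: str) -> IntentResult:
    """Parse and validate a {intent, confidence, reason} JSON object."""
    data = json.loads(raw)
    intent = str(data["intent"]).lower()
    if intent not in ALLOWED_INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    # A missing reason is tolerated and stored as an empty string.
    return IntentResult(intent, confidence, str(data.get("reason", "")))


# Illustrative input, not real model output.
example = '{"intent": "automation", "confidence": 0.97, "reason": "navigation command"}'
print(parse_intent(example))
```

Rejecting unknown labels at this boundary keeps a malformed classification from reaching the automation planner.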

Benchmarks (Phase 3 production gate)

| Metric | Score |
| --- | --- |
| Hard eval quality | 100% |
| Hard eval overall (quality + latency SLA) | 100% |
| Held-out out-of-distribution (OOD) | 100% |
| Median latency (automation, KV cache) | ~1.05 s |
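The two hard-eval rows differ only in whether latency counts. Below is a minimal sketch of that kind of scoring, assuming a case passes "overall" only when it is both correct and within a latency SLA. The 2.0 s threshold, `CaseResult`, and `summarize` are hypothetical, and the demo numbers are illustrative, not Flowcast benchmark data.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical SLA threshold; the card does not state the actual value.
LATENCY_SLA_S = 2.0


@dataclass
class CaseResult:
    quality_pass: bool  # output judged correct (e.g. valid plan, correct routing)
    latency_s: float    # wall-clock generation time in seconds


def summarize(results: list, sla_s: float = LATENCY_SLA_S) -> dict:
    n = len(results)
    quality = sum(r.quality_pass for r in results) / n
    # "Overall" counts a case only if it is correct AND within the latency SLA.
    overall = sum(r.quality_pass and r.latency_s <= sla_s for r in results) / n
    return {
        "quality": quality,
        "overall": overall,
        "median_latency_s": median(r.latency_s for r in results),
    }


demo = [
    CaseResult(True, 0.8),
    CaseResult(True, 1.2),
    CaseResult(False, 1.0),
    CaseResult(True, 2.6),  # correct but too slow: counts for quality only
]
print(summarize(demo))
```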

Inference stack: hybrid_slim prompts · JSON early-stop · MLX prefix KV cache.
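JSON early stop means generation can end as soon as the planner's top-level JSON object closes, instead of running on to `max_tokens`. A minimal sketch of the completion check over a streamed text buffer follows, assuming the model emits a single JSON object, possibly after some preamble. The helper names are hypothetical, and the token stream is simulated with a list of strings rather than a real MLX generator.

```python
from typing import Iterable, Optional


def json_object_end(text: str) -> Optional[int]:
    """Return the index just past the first complete top-level JSON object
    in text, or None if that object has not closed yet."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Braces inside string literals must not change the depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def generate_until_json_complete(chunks: Iterable[str]) -> str:
    """Accumulate streamed chunks and stop once the JSON object is closed."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        end = json_object_end(buffer)
        if end is not None:
            # A real loop would stop requesting tokens from the model here.
            return buffer[:end]
    return buffer


stream = ['{"intent": "auto', 'mation", "reason": "a } in a string"', "}", " extra tokens"]
print(generate_until_json_complete(stream))
```

Rescanning the whole buffer on every chunk keeps the sketch short; an incremental scanner that carries its depth and string state between chunks avoids the quadratic cost on long outputs.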

Quick start (MLX, Apple Silicon)

pip install mlx-lm huggingface_hub

from huggingface_hub import snapshot_download
from mlx_lm import load, generate

# Download the int4 MLX base model and this LoRA adapter into the local HF cache.
base = snapshot_download("mlx-community/Gemma4-E2B-IT-Text-int4")
adapter = snapshot_download("nsalerni/gemma-4-e2b-flowcast")

# Load the base weights and apply the LoRA adapter on top of them.
model, tokenizer = load(base, adapter_path=adapter)

# Wrap the request in the chat template and open the assistant turn.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Classify: go to gmail"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))

GemmaFlow / flowcast integration

from huggingface_hub import snapshot_download
from gemmaflow_tune import create_production_runner
from gemmaflow_tune.prompts import build_automation_planning_for_case

# Point runner at the published adapter (override local manifest path)
adapter = snapshot_download("nsalerni/gemma-4-e2b-flowcast")
runner = create_production_runner(adapter_path=adapter)

case = {"transcript": "go to gmail", "tags": ["web_routing"]}
prompt = build_automation_planning_for_case(case, "hybrid_slim")
result = runner.generate("", user_content=prompt, max_tokens=256)
print(result.text)

Note: create_production_runner(adapter_path=...) requires gemmaflow-tune installed from the training repo. For standalone MLX usage, use mlx_lm.load as shown above.

Files

| File | Description |
| --- | --- |
| `adapters.safetensors` | LoRA weights (checkpoint `0000012`) |
| `adapter_config.json` | LoRA config and base-model reference |
| `inference_config.json` | Recommended runtime settings |

Recommended inference settings

{
  "prompt_mode": "hybrid_slim",
  "json_early_stop": true,
  "use_prompt_kv_cache": true,
  "temperature": 0.02,
  "top_p": 0.85,
  "max_tokens": 256
}
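A minimal sketch for reading these settings at runtime, assuming `inference_config.json` uses the same keys as the JSON above (the card does not spell out its schema). `load_inference_config` is a hypothetical helper; keys missing from the file fall back to the recommended values. In recent mlx-lm releases, `temperature` and `top_p` are typically applied through a sampler built with `mlx_lm.sample_utils.make_sampler(temp=..., top_p=...)` rather than passed to `generate` directly; confirm against the installed version.

```python
import json
from pathlib import Path

# Recommended values from the model card; used when a key is missing from the file.
DEFAULTS = {
    "prompt_mode": "hybrid_slim",
    "json_early_stop": True,
    "use_prompt_kv_cache": True,
    "temperature": 0.02,
    "top_p": 0.85,
    "max_tokens": 256,
}


def load_inference_config(path: Path) -> dict:
    """Merge a JSON settings file (if present) over DEFAULTS and sanity-check it."""
    cfg = dict(DEFAULTS)  # copy so DEFAULTS is never mutated
    if path.exists():
        cfg.update(json.loads(path.read_text(encoding="utf-8")))
    if cfg["temperature"] < 0:
        raise ValueError("temperature must be >= 0")
    if not 0.0 < cfg["top_p"] <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    if int(cfg["max_tokens"]) <= 0:
        raise ValueError("max_tokens must be positive")
    return cfg


# Hypothetical location; in practice, the directory returned by snapshot_download().
adapter_dir = Path("gemma-4-e2b-flowcast")
settings = load_inference_config(adapter_dir / "inference_config.json")
print(settings["prompt_mode"], settings["temperature"], settings["max_tokens"])
```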

Training

Fine-tuned with gemmaflow-tune:

Base: mlx-community/Gemma4-E2B-IT-Text-int4
Method: LoRA (rank 16, 16 layers, attention projections)
Pipeline: SFT surgical null-plan refinement → Phase 2/3 prompt engineering
Not included: fast-path retrain (`hybrid_slim_fast`, scored ~89%; rejected)
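For a sense of scale: LoRA keeps each frozen projection W unchanged and learns a low-rank update s·BA for a scaling constant s, with A of shape r × d_in and B of shape d_out × r, so each adapted projection adds r·(d_in + d_out) trainable parameters. The sketch below counts them for rank 16 across 16 layers. The projection names and shapes are placeholders, not the real Gemma 4 E2B dimensions, and adapting exactly q/k/v/o is an assumption about what "attention projections" covers here.

```python
# Placeholder (d_in, d_out) shapes; NOT the actual Gemma 4 E2B dimensions.
ATTN_PROJ_SHAPES = {
    "q_proj": (1536, 2048),
    "k_proj": (1536, 256),
    "v_proj": (1536, 256),
    "o_proj": (2048, 1536),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted projection: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


def total_lora_params(shapes: dict, rank: int, num_layers: int) -> int:
    """Sum over the adapted projections in one layer, times the number of adapted layers."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in shapes.values())
    return per_layer * num_layers


print(total_lora_params(ATTN_PROJ_SHAPES, rank=16, num_layers=16))
```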

Citation

@misc{gemma4e2bflowcast2026,
  title  = {gemma-4-e2b-flowcast: Voice Desktop Automation for GemmaFlow},
  author = {Salerni, Nicola},
  year   = {2026},
  url    = {https://huggingface.co/nsalerni/gemma-4-e2b-flowcast}
}

License

Apache 2.0. The base model is subject to the Google Gemma license.


Model tree for nsalerni/gemma-4-e2b-flowcast: base model google/gemma-4-E2B; fine-tuned google/gemma-4-E2B-it.