gemma-4-e2b-flowcast-v3 · flowcast-sota-v3

Flowcast v3 is the production LoRA fine-tune of Gemma 4 E2B Text-int4 for macOS voice-agent desktop automation. It supersedes flowcast-sota-v1 with improved expanded-benchmark coverage while maintaining 100% on the core production gate.

Say it. Plan it. Do it.

What changed in v3

v3 is a surgical refinement targeting expanded-benchmark failures (browser commands, out-of-distribution (OOD) retail/dev requests, paraphrases). Training resumes from the gen-refine adapter with targeted repair examples; it is not a full retrain.

| Gate | v1 | v3 | Δ |
| --- | --- | --- | --- |
| Core hard eval (117) | 100% | 100% | no change |
| Core held-out (27) | 100% | 100% | no change |
| Expanded hard quality (170) | 98.2% | 99.4% | +1.2% |
| Expanded held-out (39) | 97.4% | 100% | +2.6% |
| Generalization suite | 91.4% | 97.1% | +5.7% |
| Core p50 latency | ~1028 ms | ~1002 ms | ~same |

Hard quality = task accuracy excluding latency SLA. v3 is the only variant that passes both production gates (100% core, ≥99% expanded quality).
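The two gate thresholds can be expressed as a small check. This is a minimal sketch, not the project's actual gate code: the function name `passes_production_gates` is hypothetical, and treating the core hard eval and expanded hard quality rows as the gate inputs is an assumption. The scores are copied from the table above.

```python
# Thresholds taken from the text: 100% core, >=99% expanded quality.
CORE_THRESHOLD = 100.0
EXPANDED_THRESHOLD = 99.0


def passes_production_gates(core_pct: float, expanded_quality_pct: float) -> bool:
    """Hypothetical helper: True only when both gates are met."""
    return core_pct >= CORE_THRESHOLD and expanded_quality_pct >= EXPANDED_THRESHOLD


# (core hard eval, expanded hard quality) per variant, from the benchmark table.
scores = {"v1": (100.0, 98.2), "v3": (100.0, 99.4)}
results = {name: passes_production_gates(*pair) for name, pair in scores.items()}
print(results)  # {'v1': False, 'v3': True}
```

v1 clears the core gate but falls short of the 99% expanded-quality threshold, which is the gap the v3 refinement closes.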

Quick start (MLX, Apple Silicon)

pip install mlx-lm huggingface_hub

from huggingface_hub import snapshot_download
from mlx_lm import load, generate

# Fetch the int4 base model and the Flowcast v3 LoRA adapter from the Hub.
base = snapshot_download("mlx-community/Gemma4-E2B-IT-Text-int4")
adapter = snapshot_download("nsalerni/gemma-4-e2b-flowcast-v3")

# Load the base model with the adapter applied on top.
model, tokenizer = load(base, adapter_path=adapter)

# Format the request with the model's chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Classify: go to gmail"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))

GemmaFlow / flowcast integration

from huggingface_hub import snapshot_download
from gemmaflow_tune.production import create_production_runner
from gemmaflow_tune.prompts import build_automation_planning_for_case

adapter = snapshot_download("nsalerni/gemma-4-e2b-flowcast-v3")
runner = create_production_runner(adapter_path=adapter)

case = {"transcript": "go to gmail", "tags": ["web_routing"]}
prompt = build_automation_planning_for_case(case, "hybrid_slim")
result = runner.generate("", user_content=prompt, max_tokens=256)
print(result.text)
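Because the recommended settings enable `json_early_stop`, the generated text is presumably a JSON object, possibly surrounded by whitespace or stray tokens. A minimal sketch of defensive parsing follows. The helper name `extract_first_json` and the sample payload (including its `action` and `target` keys) are hypothetical and are not taken from GemmaFlow; the real output schema may differ.

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first decodable JSON object in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object here; try the next opening brace.
            start = text.find("{", start + 1)
    return None


# Hypothetical model output, for illustration only.
sample = 'noise {"action": "open_url", "target": "https://mail.google.com"} trailing'
plan = extract_first_json(sample)
print(plan)
```

In the integration example, the same helper could be applied to `result.text` before dispatching the plan.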

Files

| File | Description |
| --- | --- |
| adapters.safetensors | LoRA weights (checkpoint 0000030) |
| adapter_config.json | LoRA config + base model reference |
| inference_config.json | Recommended runtime settings + benchmark scores |

Recommended inference settings

{
  "prompt_mode": "hybrid_slim",
  "json_early_stop": true,
  "use_prompt_kv_cache": true,
  "temperature": 0.02,
  "top_p": 0.85,
  "max_tokens": 256
}
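One way to apply these settings with plain mlx-lm is to separate the sampling parameters from the remaining flags. The sketch below rests on an assumption about how the keys map: `temperature` and `top_p` are treated as sampler arguments, `max_tokens` as a generation argument, and the other keys as flags consumed by GemmaFlow's own runner. The function names are hypothetical; `build_sampler` relies on `make_sampler` from `mlx_lm.sample_utils`, which recent mlx-lm releases provide.

```python
import json

# The recommended settings from above, parsed as JSON.
RECOMMENDED = json.loads("""
{
  "prompt_mode": "hybrid_slim",
  "json_early_stop": true,
  "use_prompt_kv_cache": true,
  "temperature": 0.02,
  "top_p": 0.85,
  "max_tokens": 256
}
""")

SAMPLER_KEYS = ("temperature", "top_p")


def split_settings(cfg: dict) -> tuple:
    """Hypothetical helper: split settings into (sampler args, max_tokens, other flags)."""
    sampler = {k: cfg[k] for k in SAMPLER_KEYS if k in cfg}
    max_tokens = cfg.get("max_tokens", 256)
    other = {k: v for k, v in cfg.items()
             if k not in SAMPLER_KEYS and k != "max_tokens"}
    return sampler, max_tokens, other


def build_sampler(sampler_cfg: dict):
    """Build an mlx-lm sampler; imported lazily so the rest runs without mlx-lm."""
    from mlx_lm.sample_utils import make_sampler
    return make_sampler(temp=sampler_cfg["temperature"], top_p=sampler_cfg["top_p"])


sampler_cfg, max_tokens, runner_flags = split_settings(RECOMMENDED)
print(sampler_cfg, max_tokens, runner_flags)
```

On Apple Silicon, the sampler could then be passed to the quick-start call as `generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens, sampler=build_sampler(sampler_cfg))`. The flags `prompt_mode`, `json_early_stop`, and `use_prompt_kv_cache` have no direct counterpart in that call and are left to the GemmaFlow runner.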

Training lineage

Fine-tuned with gemmaflow-tune:

  • Base: mlx-community/Gemma4-E2B-IT-Text-int4 (0.7B active, ~2.5 GB disk)
  • Method: LoRA (rank 16, 16 layers, attn projections)
  • Pipeline: v1 SFT → v2 gen-refine → v3 expanded surgical refine (30 iters)
  • Predecessor adapter: flowcast-sota-v1 → flowcast-v2-gen-refine
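For intuition about adapter size: a rank-r LoRA update to a weight matrix of shape d_out × d_in adds two factors, of shapes r × d_in and d_out × r, for r · (d_in + d_out) trainable parameters. The sketch below counts parameters for rank 16 across 16 layers. The projection shapes are placeholders chosen for illustration, not Gemma's actual dimensions, and the function name is hypothetical.

```python
def lora_param_count(rank: int, shapes: list, num_layers: int) -> int:
    """Trainable LoRA parameters: rank * (d_in + d_out) per adapted matrix, per layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers


# PLACEHOLDER (d_in, d_out) shapes for four attention projections (q, k, v, o).
# These are illustrative values only, not the base model's real dimensions.
PLACEHOLDER_SHAPES = [(1024, 1024), (1024, 256), (1024, 256), (1024, 1024)]

total = lora_param_count(rank=16, shapes=PLACEHOLDER_SHAPES, num_layers=16)
print(f"{total:,} trainable parameters under the placeholder shapes")
```

Substituting the real projection shapes from the base model's configuration would give the actual adapter size.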

Citation

@misc{gemma4e2bflowcastv32026,
  title  = {gemma-4-e2b-flowcast-v3: Voice Desktop Automation for GemmaFlow},
  author = {Salerni, Nicola},
  year   = {2026},
  url    = {https://huggingface.co/nsalerni/gemma-4-e2b-flowcast-v3}
}

License

Apache 2.0 for this adapter. The base model is subject to the Google Gemma license.
