gemma-4-e2b-flowcast · flowcast-sota-v1
Flowcast is a production LoRA fine-tune of Gemma 4 E2B for macOS voice-agent desktop automation. It powers spoken-command planning, intent routing, and dictation polish in GemmaFlow.
Say it. Plan it. Do it.
What it does
| Task | Example | Output |
|---|---|---|
| Desktop automation | "go to gmail", "launch claude and refactor this function" | JSON DesktopAutomationPlan |
| Web vs native routing | Gmail → browser; Slack → native app | Platform-aware step lists |
| Intent classification | Dictation vs automation vs editing | {intent, confidence, reason} |
| Dictation polish | "um can you send me the report…" | Clean prose |
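The intent-classification row lists the output fields as {intent, confidence, reason}. A minimal sketch of how a caller might validate such an output follows. The label strings, the 0 to 1 confidence range, and the `parse_intent` helper are assumptions made for illustration, not the schema GemmaFlow enforces.

```python
import json
from dataclasses import dataclass

# Assumed label set, derived from the table row "Dictation vs automation vs editing".
# The exact strings the model emits are not documented in this card.
ASSUMED_INTENTS = {"dictation", "automation", "editing"}


@dataclass
class IntentResult:
    intent: str
    confidence: float
    reason: str


def parse_intent(raw: str) -> IntentResult:
    """Parse and validate a JSON string with intent, confidence, and reason keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"intent", "confidence", "reason"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    intent = str(data["intent"]).lower()
    if intent not in ASSUMED_INTENTS:
        raise ValueError(f"unexpected intent: {intent!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence outside the assumed 0..1 range")
    return IntentResult(intent, confidence, str(data["reason"]))


# Hypothetical model output, written by hand for illustration.
example = '{"intent": "automation", "confidence": 0.97, "reason": "navigation command"}'
result = parse_intent(example)
print(result)
```

Rejecting malformed output early lets the caller fall back to a safe default (for example, treating the utterance as dictation) instead of acting on a half-parsed plan.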
Benchmarks (Phase 3 production gate)
| Metric | Score |
|---|---|
| Hard eval quality | 100% |
| Hard eval overall (incl. latency SLA) | 100% |
| Held-out OOD | 100% |
| Median latency (automation, KV cache) | ~1.05s |
Inference stack: hybrid_slim prompts · JSON early-stop · MLX prefix KV cache.
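The card does not show how JSON early-stop is implemented. One plausible reading is that decoding halts as soon as the first top-level JSON object closes, which saves the tokens the model would otherwise spend after the plan. The sketch below illustrates that idea over a generic stream of text chunks. The `first_json_object` helper and the sample keys are hypothetical, and in practice the chunks would presumably come from a streaming generator such as mlx-lm's `stream_generate`.

```python
import json
from typing import Iterable, Optional


def first_json_object(chunks: Iterable[str]) -> Optional[str]:
    """Return the first complete top-level JSON object seen in a stream of chunks.

    Consumption stops as soon as the object closes; returns None if it never does.
    """
    buf = []
    depth = 0
    in_string = False
    escaped = False
    started = False
    for chunk in chunks:
        for ch in chunk:
            if not started:
                if ch != "{":
                    continue  # skip any text before the object opens
                started = True
            buf.append(ch)
            if in_string:
                # Braces inside string literals must not affect the depth count.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return "".join(buf)
    return None


# Hypothetical streamed chunks; the keys are illustrative, not the real plan schema.
chunks = ['Sure: {"steps": [{"app": "Gm', 'ail"}], "note": "a } b"}', " trailing text"]
plan_text = first_json_object(chunks)
print(json.loads(plan_text))
```

Because the function returns from inside the loop, a lazy generator passed as `chunks` is not advanced past the chunk that closes the object, which is where the latency saving would come from.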
Quick start (MLX, Apple Silicon)
pip install mlx-lm huggingface_hub
from huggingface_hub import snapshot_download
from mlx_lm import load, generate
base = snapshot_download("mlx-community/Gemma4-E2B-IT-Text-int4")
adapter = snapshot_download("nsalerni/gemma-4-e2b-flowcast")
model, tokenizer = load(base, adapter_path=adapter)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Classify: go to gmail"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
GemmaFlow / flowcast integration
from huggingface_hub import snapshot_download
from gemmaflow_tune import create_production_runner
from gemmaflow_tune.prompts import build_automation_planning_for_case
# Point runner at the published adapter (override local manifest path)
adapter = snapshot_download("nsalerni/gemma-4-e2b-flowcast")
runner = create_production_runner(adapter_path=adapter)
case = {"transcript": "go to gmail", "tags": ["web_routing"]}
prompt = build_automation_planning_for_case(case, "hybrid_slim")
result = runner.generate("", user_content=prompt, max_tokens=256)
print(result.text)
Note: `create_production_runner(adapter_path=...)` requires `gemmaflow-tune` installed from the training repo. For standalone MLX usage, use `mlx_lm.load` as shown above.
Files
| File | Description |
|---|---|
| adapters.safetensors | LoRA weights (checkpoint 0000012) |
| adapter_config.json | LoRA config + base model reference |
| inference_config.json | Recommended runtime settings |
Recommended inference settings
{
"prompt_mode": "hybrid_slim",
"json_early_stop": true,
"use_prompt_kv_cache": true,
"temperature": 0.02,
"top_p": 0.85,
"max_tokens": 256
}
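These settings mix sampling parameters with runtime flags, so a standalone caller has to decide which keys go where. The sketch below is one way to split them, under two assumptions: that the sampler keys map onto `temp` and `top_p` (the names used by mlx-lm's `make_sampler`, as far as this sketch assumes), and that `prompt_mode`, `json_early_stop`, and `use_prompt_kv_cache` are consumed by the gemmaflow-tune runner and not by mlx-lm itself. The `split_settings` helper is hypothetical.

```python
import json

# The recommended settings from this card, embedded so the sketch is self-contained.
RECOMMENDED = json.loads("""
{
  "prompt_mode": "hybrid_slim",
  "json_early_stop": true,
  "use_prompt_kv_cache": true,
  "temperature": 0.02,
  "top_p": 0.85,
  "max_tokens": 256
}
""")


def split_settings(config: dict) -> tuple[dict, dict, dict]:
    """Split a settings dict into sampler, generation, and runtime-flag groups."""
    # Assumed mapping: "temperature" becomes "temp" for the sampler.
    sampler = {"temp": config["temperature"], "top_p": config["top_p"]}
    generation = {"max_tokens": config["max_tokens"]}
    handled = {"temperature", "top_p", "max_tokens"}
    # Everything else is treated as a runtime flag for the calling application.
    flags = {k: v for k, v in config.items() if k not in handled}
    return sampler, generation, flags


sampler_kwargs, generation_kwargs, runtime_flags = split_settings(RECOMMENDED)
print(sampler_kwargs)
print(generation_kwargs)
print(runtime_flags)
```

The near-zero temperature keeps the JSON output close to deterministic, which suits structured plans better than free-form text.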
Training
Fine-tuned with gemmaflow-tune:
- Base: mlx-community/Gemma4-E2B-IT-Text-int4
- Method: LoRA (rank 16, 16 layers, attention projections)
- Pipeline: SFT surgical null-plan refinement → Phase 2/3 prompt engineering
- Not included: fast-path retrain (hybrid_slim_fast, ~89%; rejected)
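To make the LoRA settings concrete, the sketch below shows how rank, layer count, and the set of adapted projections determine the number of trainable parameters. The projection shapes are hypothetical placeholders, not the real Gemma 4 E2B dimensions, and adapting all four of q, k, v, and o is an assumption; the card only says "attention projections".

```python
def lora_param_count(rank: int, num_layers: int, projections: dict[str, tuple[int, int]]) -> int:
    """Count trainable LoRA parameters.

    Each adapted projection with shape (d_in, d_out) gains two low-rank
    matrices, A (d_in x rank) and B (rank x d_out).
    """
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in projections.values())
    return num_layers * per_layer


# HYPOTHETICAL projection shapes, chosen only to make the arithmetic concrete.
# They are not the real Gemma 4 E2B dimensions.
hypothetical_projections = {
    "q_proj": (1024, 1024),
    "k_proj": (1024, 256),
    "v_proj": (1024, 256),
    "o_proj": (1024, 1024),
}

total = lora_param_count(rank=16, num_layers=16, projections=hypothetical_projections)
print(f"{total:,} trainable parameters under the assumed shapes")
```

The count grows linearly in both rank and layer count, which is why a rank-16 adapter over 16 layers stays small relative to the base model.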
Citation
@misc{gemma4e2bflowcast2026,
title = {gemma-4-e2b-flowcast: Voice Desktop Automation for GemmaFlow},
author = {Salerni, Nicola},
year = {2026},
url = {https://huggingface.co/nsalerni/gemma-4-e2b-flowcast}
}
License
Apache 2.0. The base model is subject to the Google Gemma license.
Model tree for nsalerni/gemma-4-e2b-flowcast
- Base model: google/gemma-4-E2B
- Finetuned: google/gemma-4-E2B-it
- Finetuned: mlx-community/Gemma4-E2B-IT-Text-int4