How to use Mediform/gemma4-e4b-v13-assistant-rollout-mlx-bf16 with MLX:

    # Download the model from the Hub
    pip install "huggingface_hub[hf_xet]"
    huggingface-cli download --local-dir gemma4-e4b-v13-assistant-rollout-mlx-bf16 Mediform/gemma4-e4b-v13-assistant-rollout-mlx-bf16
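The same download can also be scripted from Python. This is a minimal sketch using `snapshot_download` from `huggingface_hub`; the helper name `download_model` and its default directory are illustrative choices, not part of this repository.

```python
from pathlib import Path

REPO_ID = "Mediform/gemma4-e4b-v13-assistant-rollout-mlx-bf16"


def download_model(local_dir: str = "gemma4-e4b-v13-assistant-rollout-mlx-bf16") -> Path:
    """Fetch every file of the repo into local_dir and return that directory.

    `download_model` is a hypothetical helper name used for illustration.
    """
    # Imported lazily so the module can be imported without the package installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_model()` needs network access and returns the local directory that holds the downloaded files.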
gemma4-e4b-v13-assistant-rollout-mlx-bf16
A bf16 Gemma-4 MTP draft assistant for MLX-swift, rollout-distilled for Scribion's German
medical fact-extraction target (gemma4-e4b-v13-plainlora-r16). It drops into the Scribion
Gemma4MTPTokenIterator unchanged: the key set, shapes, and dtype are identical to
mlx-community/gemma-4-E4B-it-assistant-bf16; only the weights differ.
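The "identical key set / shapes / dtype" claim can be checked without loading any weights, because a `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header listing each tensor's dtype and shape. The sketch below assumes both checkpoints are stored as safetensors; the function names are illustrative and the file paths in the usage note are hypothetical.

```python
import json
import struct


def read_layout(path):
    """Return {tensor_name: (dtype, shape)} from a .safetensors header.

    Only the JSON header is read; tensor data is never loaded.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return {name: (info["dtype"], tuple(info["shape"])) for name, info in header.items()}


def layout_diff(stock, tuned):
    """List readable differences between two layouts (empty if identical)."""
    problems = []
    for name in sorted(stock.keys() | tuned.keys()):
        if name not in tuned:
            problems.append(f"missing in tuned: {name}")
        elif name not in stock:
            problems.append(f"extra in tuned: {name}")
        elif stock[name] != tuned[name]:
            problems.append(f"{name}: {stock[name]} != {tuned[name]}")
    return problems
```

For example, `layout_diff(read_layout("stock/model.safetensors"), read_layout("tuned/model.safetensors"))` should return an empty list for a true drop-in; both paths are placeholders for wherever the two checkpoints were downloaded.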
What changed vs the stock assistant
The assistant was trained with EAGLE-style multi-step rollout distillation against the finetuned
v13 target, on in-domain extraction data (biased toward long dialogue transcripts). The assistant's
own post_projection feature is rolled through k=6 draft steps (tokens teacher-forced) and trained
to match the target's next-token predictions. This lifts deep-draft acceptance, the regime where
the stock assistant falls off on long, less-predictable dialogues.
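The shape of such a rollout objective can be sketched in a few lines. This is not the training code used for this model: `draft_step` is a hypothetical stand-in for the assistant, and the soft cross-entropy against the target's distribution is an assumed loss form. The card only states that the feature is rolled through k=6 steps with teacher-forced tokens and matched to the target's next-token predictions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout_loss(draft_step, feature, forced_tokens, target_probs, k=6):
    """Mean soft cross-entropy over k rolled-out draft steps.

    draft_step(feature, token) -> (next_feature, logits) is a hypothetical
    stand-in for the assistant; target_probs[i] is the target's next-token
    distribution at step i. The loss form is an assumption.
    """
    total = 0.0
    for i in range(k):
        # The draft's own output feature is fed back in (the rollout), while
        # the input token comes from the reference sequence (teacher forcing).
        feature, logits = draft_step(feature, forced_tokens[i])
        q = softmax(logits)
        total += -sum(p * math.log(qj) for p, qj in zip(target_probs[i], q) if p > 0)
    return total / k
```

Feeding the draft its own feature at every step exposes it during training to the same accumulated drift it sees at inference, which is the usual motivation for rolling out more than one step.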
Speculative decoding is exact, so the output is identical to the target's greedy decode. This is a pure decode-speed change with no effect on quality.
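Why greedy speculative decoding is exact can be seen in a toy sketch: every emitted token is the target's own greedy choice for its prefix, and the draft only decides how many of those choices are confirmed per target step. The two model callables below are hypothetical stand-ins (a function from a token prefix to the next token), not the mlx-swift or transformers API.

```python
def greedy_decode(target_next, prompt, n_new):
    """Plain greedy decoding: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq[len(prompt):]


def speculative_decode(target_next, draft_next, prompt, n_new, k):
    """Greedy speculative decoding with a fixed draft length k.

    Returns (new_tokens, target_steps). In a real engine the target
    predictions of one step come from a single batched forward pass;
    here they are computed one by one for clarity.
    """
    seq = list(prompt)
    steps = 0
    while len(seq) - len(prompt) < n_new:
        # 1) The draft proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) The target verifies: drafted tokens are kept while they equal the
        #    target's greedy choice; at the first mismatch the target's token
        #    is emitted instead and the rest of the draft is discarded.
        steps += 1
        accepted = []
        for tok in draft:
            want = target_next(seq + accepted)
            accepted.append(want)
            if want != tok:
                break
        else:
            # All k drafts matched, so the target's next token comes for free.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n_new], steps
```

Dividing the number of generated tokens by `target_steps` gives a per-step figure in the spirit of the "accepted tokens per target step" metric reported below, though the reference engine's exact counting convention is not stated here.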
Acceptance (transformers reference engine, fixed draft length k, greedy)
froehlich-krause (long-dialogue clip), accepted tokens per target step:
| k | stock | this model |
|---|---|---|
| 5 | 3.37 | 3.56 |
| 7 | 3.76 | 4.27 (+13.6%) |
| 9 | 3.88 | 4.34 (+11.9%) |
The model trades a little shallow-draft (k=3) acceptance for the deep-draft gain. arztbericht (already near-optimal) is roughly flat. The acceptance rate transfers to mlx-swift; wall-clock speedup is device-dependent.
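The percentages in the table are relative changes over the stock assistant at the same k. A short check of that arithmetic, using only the values from the table above:

```python
# (k, stock, this model): accepted tokens per target step, copied from the table.
RESULTS = [(5, 3.37, 3.56), (7, 3.76, 4.27), (9, 3.88, 4.34)]


def relative_gain_pct(stock, tuned):
    """Percentage change of tuned over stock, rounded to one decimal."""
    return round((tuned / stock - 1.0) * 100.0, 1)


gains = {k: relative_gain_pct(stock, tuned) for k, stock, tuned in RESULTS}
print(gains)  # {5: 5.6, 7: 13.6, 9: 11.9}
```

The k=5 row, which carries no percentage in the table, works out to about +5.6%.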
Model tree for Mediform/gemma4-e4b-v13-assistant-rollout-mlx-bf16
Base model: google/gemma-4-E4B-it-assistant