gemma-4-31B-it EAGLE-3 Draft (Korean-optimized)
An EAGLE-3 draft model (speculator) that accelerates Korean text generation from
gemma-4-31B-it through speculative decoding. Publicly available drafts are trained
on English data and have low acceptance rates on Korean tokens, so this draft was
retrained on Korean prompts paired with on-policy responses regenerated by the
verifier itself.
- Method: EAGLE-3 (vllm-project/speculators)
- Verifier (target): BCCard/gemma-4-31B-it-FP8-Dynamic (FP8; used for serving and hidden-state extraction)
- Reused weights: BF16 embed/lm_head from google/gemma-4-31B-it (as is standard for EAGLE-3)
- Warm start: RedHatAI/gemma-4-31B-it-speculator.eagle3
- Training data: ~150k prompts sampled from sh2orc/bccard-maywell-jojo0217-markai-lcw99-kendamarron-microsoft (1.71M rows of Korean and English QA; instruction column only). The original answers are discarded and regenerated on-policy by the verifier.
- Sequence length: 8192
Serving (vLLM)
VLLM_USE_FLASHINFER_SAMPLER=0 vllm serve BCCard/gemma-4-31B-it-FP8-Dynamic -tp 1 \
--max-model-len 8192 \
--speculative-config '{
"model": "BCCard/MoAI-gemma-4-31B-it-speculator.eagle3",
"num_speculative_tokens": 4,
"method": "eagle3",
"draft_tensor_parallel_size": 1
}'
Tune num_speculative_tokens within the 4–8 range based on the acceptance rate and tokens per second (TPS) you measure.
The draft uses the verifier's tokenizer.
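The 4–8 tuning range can be explored with a small sweep. The sketch below only prints one `vllm serve` command per candidate value so each configuration can be launched and benchmarked in turn; the helper names `spec_config` and `print_sweep` are hypothetical (not part of vLLM), and the flags are copied from the serving command above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sweeping num_speculative_tokens; not part of vLLM.
DRAFT="BCCard/MoAI-gemma-4-31B-it-speculator.eagle3"
TARGET="BCCard/gemma-4-31B-it-FP8-Dynamic"

# Print the --speculative-config JSON for a given number of draft tokens.
spec_config() {
  local k="$1"
  printf '{"model": "%s", "num_speculative_tokens": %d, "method": "eagle3", "draft_tensor_parallel_size": 1}' \
    "$DRAFT" "$k"
}

# Print (do not run) one serve command per candidate k in the 4-8 range.
print_sweep() {
  local k
  for k in 4 5 6 7 8; do
    printf "VLLM_USE_FLASHINFER_SAMPLER=0 vllm serve %s -tp 1 --max-model-len 8192 --speculative-config '%s'\n" \
      "$TARGET" "$(spec_config "$k")"
  done
}

print_sweep
```

Start one server at a time, send the same Korean workload to each, and keep the value with the best measured TPS. A larger k only pays off while the later draft positions are still accepted often enough to cover the extra drafting cost.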
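Performance (per-position acceptance on the validation set, measured during training)
| draft position | full_acc (this and all earlier positions accepted) | cond_acc (accepted, given all earlier positions accepted) |
|---|---|---|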
| 0 | 0.638 | 0.638 |
| 1 | 0.380 | 0.595 |
| 2 | 0.235 | 0.618 |
Mean accepted length is about 2.3 tokens per step, which corresponds to a speedup of roughly 2.2x (measure on your own traffic). Training and validation metrics are closely matched, so there is no sign of overfitting.
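The two accuracy columns are related: `full_acc` at position k matches the running product of `cond_acc` up to k (0.638 × 0.595 ≈ 0.380, 0.380 × 0.618 ≈ 0.235). Under the usual speculative-decoding accounting, each verification step yields one token from the verifier plus every accepted draft token, so the expected tokens per step is 1 + Σ full_acc. The awk sketch below reproduces that arithmetic from the `cond_acc` column; the function names are hypothetical, and only the three listed positions contribute, giving about 2.25, consistent with the ≈ 2.3 reported here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; inputs are the per-position cond_acc values in order.

# Print "position full_acc" where full_acc is the cumulative product of cond_acc.
full_acc_from_cond() {
  printf '%s\n' "$@" | awk '
    BEGIN { full = 1 }
    { full *= $1; printf "%d %.3f\n", NR - 1, full }'
}

# Expected tokens per verification step: 1 (verifier token) + sum of full_acc.
expected_tokens_per_step() {
  printf '%s\n' "$@" | awk '
    BEGIN { full = 1; total = 1 }   # total starts at 1 for the verifier token
    { full *= $1; total += full }   # add P(this and all earlier drafts accepted)
    END { printf "%.3f\n", total }'
}

full_acc_from_cond 0.638 0.595 0.618        # prints 0 0.638 / 1 0.380 / 2 0.235
expected_tokens_per_step 0.638 0.595 0.618  # prints 2.252
```

Tokens per step is an upper bound on the speedup, since it ignores the cost of running the draft; that is why the measured speedup sits a little below it.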
Limitations
- Trained on general Korean QA. Domain-specific traffic (e.g. finance) may see higher acceptance after an additional training cycle on domain-matched data.
- Acceptance was measured against the verifier BCCard/gemma-4-31B-it-FP8-Dynamic. Pairing the draft with a different target model will change the results.
License
Apache 2.0. The base Gemma 4 model (Apache 2.0 since 2026-04, the first Gemma family
to adopt it), the verifier BCCard/gemma-4-31B-it-FP8-Dynamic, and the RedHatAI
EAGLE-3 warm-start checkpoint are all Apache 2.0, so this draft is also released
under Apache 2.0. Apache 2.0 requires including a copy of the license, retaining the
original copyright and attribution notices, and stating changes made to modified
files; it places no restrictions on commercial use, modification, or redistribution.
(This is informational, not legal advice.)