You need to agree to share your contact information to access this model
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
This is a research preview draft model. Access is granted manually. By requesting access you agree to use it for research/evaluation and understand it is an early, single-config checkpoint.
Log in or Sign Up to review the conditions and access this model content.
Kimi-K2.7-coder-DFLASH-preview
A DFlash (block-diffusion) speculative-decoding draft model for an ablated Kimi-K2.7-Code target. Research preview.
- Trained with SpecForge (online DFlash training).
- Trained on only ~55k samples (single corpus, a few epochs) โ deliberately small; this is a preview.
- On our ablated Kimi-K2.7-Code target it already beats a misaligned EAGLE3 drafter (a K2.6 EAGLE3 head used on K2.7-Code): measured mean accept length ~2.5 vs ~1.9, ~1.3x single-stream decode speedup, with peaks of 4.6-5.2 on long free-form code bodies.
- Architecture:
DFlashDraftModel(5 layers, hidden 7168), train block size 16 / infer 8, target layer ids capture, mask token id 163838.
Serving (sglang)
Serve against the matching target with DFLASH speculative decoding:
python -m sglang.launch_server --model-path <kimi-k2.7-code-target> --tp 4 \
--speculative-algorithm DFLASH \
--speculative-draft-model-path Kimi-K2.7-coder-DFLASH-preview \
--speculative-eagle-topk 1 --speculative-dflash-block-size 8 --speculative-num-draft-tokens 8
Production / structured outputs
Stock sglang DFLASH rejects grammar-constrained requests (JSON schema / regex / tool schemas). Support for that โ so this drafter keeps its speedup on coding-CLI JSON (thin envelope around a large code/diff string) โ is proposed upstream in sgl-project/sglang PR #28943.
Limitations
- Early preview (small data, single config); acceptance plateaus and is not yet tuned.
- Must be paired with the matching ablated Kimi-K2.7-Code target; not a standalone model.
- Downloads last month
- -