poolside/Laguna-XS-2.1-DFlash

DFlash speculator for poolside/Laguna-XS-2.1.

Model Specifications


Base Model	poolside/Laguna-XS-2.1
Draft Architecture	5-layer Llama-style draft model (`DFlashLagunaForCausalLM`)
Hidden Size	2048
Attention Heads	64 query / 8 key-value, head_dim 128
Sliding Window	512
Gating	Per-head
Proposals	Up to 15 tokens per step
Chat Template	poolside/Laguna-XS-2.1 (use `/chat/completions` endpoint)
Format	Safetensors (BF16)
License	OpenMDW-1.1

Deployment

DFlash speculative-decoding support for Laguna XS 2.1 has not yet landed upstream. Integrations are in progress:

vLLM — vllm-project/vllm#46853

TRT-LLM — NVIDIA/TensorRT-LLM#15666

See the "Speculative decoding (DFlash)" notes on the Laguna XS 2.1 model card for current status.

Once vLLM support lands, pair this speculator with the base model by adding a --speculative-config block to the standard Laguna XS 2.1 serve command:

# Support in progress — see vllm-project/vllm#46853
vllm serve poolside/Laguna-XS-2.1 \
    --tool-call-parser poolside_v1 \
    --reasoning-parser poolside_v1 \
    --enable-auto-tool-choice \
    --default-chat-template-kwargs '{"enable_thinking": true}' \
    --speculative-config '{"model": "poolside/Laguna-XS-2.1-DFlash", "num_speculative_tokens": 15, "method": "dflash"}'

SGLang support:

sglang serve \
    --model-path poolside/Laguna-XS-2.1 \
    --speculative-algorithm DFLASH \
    --speculative-draft-model-path poolside/Laguna-XS-2.1-DFlash

Evaluation

Speculative-decoding throughput and mean acceptance length for the (BF16) Laguna XS 2.1 target paired with this speculator (num_speculative_tokens = 15), versus the same model without speculative decoding:

Dataset	Baseline (tok/s/seq)	DFlash (tok/s/seq)	Speedup	Acceptance length
GSM8K v2	80.22	133.78	×1.67	3.55
HumanEval	95.57	252.59	×2.64	4.57
EvalPlus	97.68	192.69	×1.97	4.10
Math	73.63	139.84	×1.90	4.40

License

This model is licensed under the OpenMDW-1.1 License.

Intended and Responsible Use

Laguna-XS-2.1-DFlash is designed for software engineering and agentic coding use cases, and you are responsible for confirming that it is appropriate for your intended application. Laguna-XS-2.1-DFlash is subject to the OpenMDW-1.1 License, and should be used consistently with Poolside's Acceptable Use Policy.

Please report security vulnerabilities or safety concerns to security@poolside.ai.