poolside/Laguna-XS-2.1-DFlash

DFlash speculator for poolside/Laguna-XS-2.1.

Model Specifications

Base Model poolside/Laguna-XS-2.1
Draft Architecture 5-layer Llama-style draft model (DFlashLagunaForCausalLM)
Hidden Size 2048
Attention Heads 64 query / 8 key-value, head_dim 128
Sliding Window 512
Gating Per-head
Proposals Up to 15 tokens per step
Chat Template poolside/Laguna-XS-2.1 (use /chat/completions endpoint)
Format Safetensors (BF16)
License OpenMDW-1.1

Deployment

DFlash speculative-decoding support for Laguna XS 2.1 has not yet landed upstream. Integrations are in progress:

See the "Speculative decoding (DFlash)" notes on the Laguna XS 2.1 model card for current status.

Once vLLM support lands, pair this speculator with the base model by adding a --speculative-config block to the standard Laguna XS 2.1 serve command:

# Support in progress — see vllm-project/vllm#46853
vllm serve poolside/Laguna-XS-2.1 \
    --tool-call-parser poolside_v1 \
    --reasoning-parser poolside_v1 \
    --enable-auto-tool-choice \
    --default-chat-template-kwargs '{"enable_thinking": true}' \
    --speculative-config '{"model": "poolside/Laguna-XS-2.1-DFlash", "num_speculative_tokens": 15, "method": "dflash"}'

SGLang support:

sglang serve \
    --model-path poolside/Laguna-XS-2.1 \
    --speculative-algorithm DFLASH \
    --speculative-draft-model-path poolside/Laguna-XS-2.1-DFlash

Evaluation

Speculative-decoding throughput and mean acceptance length for the (BF16) Laguna XS 2.1 target paired with this speculator (num_speculative_tokens = 15), versus the same model without speculative decoding:

Dataset Baseline (tok/s/seq) DFlash (tok/s/seq) Speedup Acceptance length
GSM8K v2 80.22 133.78 ×1.67 3.55
HumanEval 95.57 252.59 ×2.64 4.57
EvalPlus 97.68 192.69 ×1.97 4.10
Math 73.63 139.84 ×1.90 4.40

License

This model is licensed under the OpenMDW-1.1 License.

Intended and Responsible Use

Laguna-XS-2.1-DFlash is designed for software engineering and agentic coding use cases, and you are responsible for confirming that it is appropriate for your intended application. Laguna-XS-2.1-DFlash is subject to the OpenMDW-1.1 License, and should be used consistently with Poolside's Acceptable Use Policy.

Please report security vulnerabilities or safety concerns to security@poolside.ai.

References

Downloads last month
54
Safetensors
Model size
0.5B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for poolside/Laguna-XS-2.1-DFlash

Finetuned
(1)
this model

Collection including poolside/Laguna-XS-2.1-DFlash

Paper for poolside/Laguna-XS-2.1-DFlash