Add vLLM reproduction guide and Eagle3 speculative-decoding results

#3
by larryli2 - opened

Add two sections to the model card so others can reproduce the numbers:

  • 'Reproduction': a vLLM EAGLE3 speculative-decoding recipe for AMD Instinct MI355X (Docker images, ROCm/AITER environment variables, vllm serve with --speculative-config, and a vllm bench serve throughput sweep). Target model: amd/Kimi-K2.5-MXFP4; FP8 draft model: amd/kimi-k2.5-eagle3-fp8; TP=4; ISL/OSL = 1K/1K.
  • 'Results': per-GPU throughput (tok/s/GPU) without speculative decoding, with the BF16 Eagle3 draft, and with the FP8 Eagle3 draft, at concurrency 4/8/16/32/64.

This PR only adds content; existing sections are unchanged.
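The serve step in the Reproduction bullet can be sketched as below. This is a minimal sketch, assuming vLLM's JSON form of --speculative-config with the keys method, model, and num_speculative_tokens. The helper build_serve_command and the num_speculative_tokens=3 default are hypothetical placeholders, not values taken from the recipe, and the Docker images and ROCm/AITER environment variables are deliberately left out.

```python
import json
import shlex

# Model IDs as named in this PR.
TARGET_MODEL = "amd/Kimi-K2.5-MXFP4"
DRAFT_MODEL = "amd/kimi-k2.5-eagle3-fp8"


def build_serve_command(num_speculative_tokens: int = 3, tp: int = 4) -> list[str]:
    """Assemble a `vllm serve` argv that attaches an EAGLE3 draft model.

    Hypothetical helper: num_speculative_tokens=3 is an assumed placeholder,
    not the value used for the published results.
    """
    spec_config = {
        "method": "eagle3",               # speculative-decoding method
        "model": DRAFT_MODEL,             # FP8 EAGLE3 draft checkpoint
        "num_speculative_tokens": num_speculative_tokens,
    }
    return [
        "vllm", "serve", TARGET_MODEL,
        "--tensor-parallel-size", str(tp),               # TP=4 per the PR
        "--speculative-config", json.dumps(spec_config),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_serve_command()))
```

Passing the config through json.dumps keeps the draft settings in one dict, so swapping the FP8 draft for a BF16 one for the comparison only changes the model field.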
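The tok/s/GPU figures in the Results bullet normalize throughput by GPU count. A minimal sketch, assuming the metric is the benchmark's reported token throughput divided by the TP=4 GPUs serving one instance; the helpers tok_per_s_per_gpu and speedup are hypothetical and not part of vLLM.

```python
def tok_per_s_per_gpu(tok_per_s: float, num_gpus: int = 4) -> float:
    """Normalize aggregate throughput by the number of GPUs (assumed TP=4)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return tok_per_s / num_gpus


def speedup(spec_tok_per_s_per_gpu: float, baseline_tok_per_s_per_gpu: float) -> float:
    """Ratio of a speculative-decoding run to the no-spec baseline at the same concurrency."""
    if baseline_tok_per_s_per_gpu <= 0:
        raise ValueError("baseline must be positive")
    return spec_tok_per_s_per_gpu / baseline_tok_per_s_per_gpu
```

Comparing runs only at matching concurrency levels keeps the ratio meaningful, since speculative-decoding gains typically shrink as batch size grows.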
chaoli-amd changed pull request status to merged
