Transformers
Safetensors
llama
speculative-decoding
eagle3
draft-model
kimi-k2.5
fp8
amd-quark
quantized
no-lm-head-quantization
text-generation-inference
quark
Restructure model card to AMD MXFP4 template layout
#4
by larryli2 - opened
Reorganize the model card to follow the AMD Quark MXFP4 model-card template layout (as used by amd/Qwen3.5-*-MXFP4), keeping H2 (##) headings:
- Model Overview: architecture, input/output, Model Optimizer (AMD-Quark v0.12+5bd6865d5ca), weight/activation quantization.
- Model Quantization: quantization details (incl. Quark version), quantization scripts, plus Quantization Environment / vLLM Loading Note / Quantized Layers / Layers Not Quantized / Tensor Dtype Overview.
- Evaluation: Throughput results table and the vLLM reproduction recipe (Docker images, serving, benchmarking).
- Intended Use, Citation, License: sections retained.
All existing content is preserved (no deletions); this is a layout/structure change with the Quark version now stated explicitly.
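A reviewer could sanity-check the new layout with a small script like the one below. It is only a sketch: the heading strings are assumed from the bullet list above (the actual template may word them differently), and the helper names `h2_headings` and `missing_or_misordered` are hypothetical, not part of any AMD or Hugging Face tooling.

```python
import re

# Assumed H2 order, taken from the bullets in this PR description;
# the real MXFP4 template may use different wording.
EXPECTED_H2 = [
    "Model Overview",
    "Model Quantization",
    "Evaluation",
    "Intended Use",
    "Citation",
    "License",
]

FENCE = "`" * 3  # Markdown code-fence marker


def h2_headings(markdown_text):
    """Return H2 (##) heading titles in document order, skipping fenced code."""
    headings = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            headings.append(match.group(1))
    return headings


def missing_or_misordered(markdown_text, expected=EXPECTED_H2):
    """Return (missing headings, whether the found headings follow the expected order)."""
    found = [h for h in h2_headings(markdown_text) if h in expected]
    missing = [h for h in expected if h not in found]
    in_order = found == [h for h in expected if h in found]
    return missing, in_order
```

Running `missing_or_misordered(open("README.md").read())` against the model card would then report any expected section that is absent and whether the remaining ones appear in the template order; the `README.md` path is likewise an assumption about where the card lives.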
chaoli-amd changed pull request status to merged