Capitalize model name, simplify Quark version, and add shared auxiliary files
#6
by larryli2 - opened
Model card edits:
- Capitalize the model ID kimi-k2.5-eagle3-fp8 -> Kimi-K2.5-Eagle3-FP8 (7 occurrences; the base model ID lightseekorg/kimi-k2.5-eagle3 is left as-is).
- Shorten the Quark version 0.12+5bd6865d5ca -> v0.12 in the Model Optimizer line, the Quark version bullet, and the environment table.
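For reviewers who want to reproduce the two model card edits above, here is a minimal sketch using plain string substitution. The PR does not say how the edits were made, so the helper name `apply_card_edits` and the `REPLACEMENTS` table are hypothetical. It also assumes the card text contains the lowercase model ID and the long Quark version exactly as quoted above, with no leading `v` already present.

```python
# Hypothetical helper: the PR does not state how the card was edited.
# Each pair is (old text, new text), taken from the two bullets above.
REPLACEMENTS = [
    ("kimi-k2.5-eagle3-fp8", "Kimi-K2.5-Eagle3-FP8"),
    ("0.12+5bd6865d5ca", "v0.12"),
]


def apply_card_edits(text: str) -> str:
    """Apply the model ID and Quark version substitutions to card text.

    The base model ID "lightseekorg/kimi-k2.5-eagle3" has no "-fp8"
    suffix, so the first substitution never matches it.
    """
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text
```

Because the match is on the full `-fp8` suffixed ID, no special case is needed to leave the base model ID untouched.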
Auxiliary files copied from amd/Kimi-K2.5-MXFP4 (files not already present in this repo; model weight shards excluded):
chat_template.jinja, configuration_deepseek.py, configuration_kimi_k25.py, docs/deploy_guidance.md, figures/demo_video.mp4, figures/kimi-logo.png, generation_config.json, kimi_k25_processor.py, kimi_k25_vision_processing.py, media_utils.py, modeling_deepseek.py, modeling_kimi_k25.py, preprocessor_config.json, tiktoken.model, tokenization_kimi.py, tokenizer_config.json, tool_declaration_ts.py.
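The selection rule for the auxiliary files (present in the source repo, absent from this repo, not a weight shard) can be sketched as below. This is an illustration, not the script used for this PR: the function name `select_auxiliary_files` is hypothetical, and the patterns in `WEIGHT_PATTERNS` are an assumption about what counts as a model weight shard.

```python
from fnmatch import fnmatch

# Assumption: weight shards and their index are identified by these patterns.
WEIGHT_PATTERNS = ("*.safetensors", "*.safetensors.index.json")


def select_auxiliary_files(source_files, target_files):
    """Return source files that are missing from the target and are not weights."""
    existing = set(target_files)
    return sorted(
        name
        for name in source_files
        if name not in existing
        and not any(fnmatch(name, pattern) for pattern in WEIGHT_PATTERNS)
    )
```

The two input lists could come from `huggingface_hub.list_repo_files` called on amd/Kimi-K2.5-MXFP4 and on this repo, though the PR does not say which tooling was used.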
chaoli-amd changed pull request status to merged