amd/Kimi-K2.5-Eagle3-FP8


Uppercase model name, set Quark version to v0.12, add tokenizer files

#5

by larryli2 - opened 1 day ago

base: refs/heads/main ← from: refs/pr/5


Three combined changes:

1. Model card: capitalize the model name to Kimi-K2.5-Eagle3-FP8 (all occurrences).
2. Model card: shorten the AMD Quark version to v0.12 everywhere it appears (Model Optimizer line, quantization details, environment table).
3. Add the tokenizer bundle so that the documented AutoTokenizer.from_pretrained(..., trust_remote_code=True) call works and matches the moonshotai/Kimi-K2.5 target tokenizer used for Eagle3 speculative decoding. Files added: tokenizer_config.json, tiktoken.model, tokenization_kimi.py, tool_declaration_ts.py (imported by tokenization_kimi.py), and chat_template.jinja.

Notes on the tokenizer bundle:

The special tokens bos=[BOS] 163584 and eos=[EOS] 163585 match this model's config.json. Verified that the tokenizer loads as TikTokenTokenizer, encodes text, and applies the chat template correctly.

Multimodal/MoE/vision modeling files from the target were intentionally not copied, because this draft is a text-only LlamaForCausalLMEagle3. The target's generation_config.json was also skipped because its eos_token_id (163586) conflicts with this model's config (163585).
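For reviewers who want to see why the target's generation_config.json was skipped, here is a minimal sketch of the kind of consistency check involved. The function name `token_id_conflicts` and the inline dicts are hypothetical stand-ins for the parsed config.json and generation_config.json. Only the ID values (163584, 163585, 163586) come from this PR, and the sketch assumes each ID is a single integer rather than a list.

```python
def token_id_conflicts(model_cfg, other_cfg, keys=("bos_token_id", "eos_token_id")):
    """Return {key: (model_value, other_value)} for keys set in both configs that disagree."""
    conflicts = {}
    for key in keys:
        # Only compare keys that both configs actually define.
        if key in model_cfg and key in other_cfg and model_cfg[key] != other_cfg[key]:
            conflicts[key] = (model_cfg[key], other_cfg[key])
    return conflicts


# Stand-in for this draft model's config.json (values quoted in the PR description).
draft_config = {"bos_token_id": 163584, "eos_token_id": 163585}

# Stand-in for the target's generation_config.json (only the eos ID is quoted in the PR).
target_generation_config = {"eos_token_id": 163586}

conflicts = token_id_conflicts(draft_config, target_generation_config)
print(conflicts)
```

A non-empty result means that copying the second file would introduce a special-token ID that disagrees with the draft model's own config, which is the situation described above for eos_token_id.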

Uppercase model name, set Quark version to v0.12, add tokenizer files (00feb66e)

larryli2 changed pull request status to closed 1 day ago
