amd/Kimi-K2.5-Eagle3-FP8

Tags: speculative-decoding, no-lm-head-quantization, text-generation-inference

Restructure model card to AMD MXFP4 template layout

#4

by larryli2 - opened 2 days ago

base: refs/heads/main ← from: refs/pr/4

Reorganize the model card to follow the AMD Quark MXFP4 model-card template layout (as used by amd/Qwen3.5-*-MXFP4), keeping H2 (##) headings:

- Model Overview: architecture, input/output, Model Optimizer (AMD-Quark v0.12+5bd6865d5ca), weight/activation quantization.
- Model Quantization: quantization details (incl. Quark version), quantization scripts, plus Quantization Environment / vLLM Loading Note / Quantized Layers / Layers Not Quantized / Tensor Dtype Overview.
- Evaluation: throughput results table and the vLLM reproduction recipe (Docker images, serving, benchmarking).
- Intended Use, Citation, and License are retained.

All existing content is preserved (no deletions); this is a layout/structure change only, with the Quark version now stated explicitly.
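
For reviewers who want to sanity-check the new layout, a minimal sketch of an ordering check is shown here. It is not part of this PR: the helper names `h2_headings` and `follows_layout` are hypothetical, and the heading order is assumed from the list in this description rather than taken from the template itself. The check only confirms that the named H2 headings appear in that order; any extra H2 headings in between are tolerated.

```python
import re

# H2 section order as listed in this PR description (assumed, not read from the template).
EXPECTED_H2 = [
    "Model Overview",
    "Model Quantization",
    "Evaluation",
    "Intended Use",
    "Citation",
    "License",
]


def h2_headings(markdown: str) -> list[str]:
    """Return the titles of '## ' headings in order, skipping fenced code blocks."""
    titles = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every fence marker
            continue
        if in_fence:
            continue
        # Exactly two '#' followed by whitespace: H3 and deeper do not match.
        match = re.match(r"^##\s+(.+?)\s*#*\s*$", line)
        if match:
            titles.append(match.group(1))
    return titles


def follows_layout(markdown: str, expected=EXPECTED_H2) -> bool:
    """True if every expected heading appears as an H2, in the expected order."""
    found = iter(h2_headings(markdown))
    # 'title in found' consumes the iterator, so this is a subsequence check.
    return all(title in found for title in expected)
```

A typical use would be to read the card's `README.md` into a string and call `follows_layout(text)` before and after the restructure.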

Restructure model card to AMD MXFP4 template layout (commit 0170ce5b)

chaoli-amd changed pull request status to merged 2 days ago
