Official FP8 quant request

#3
by OnesimusTheLesser - opened

First of all, thanks to the team for the effort and cost involved, and for providing this to the community.

Given the size of the F16 model (which is actually smaller than I would have expected for a Kimi-K-derivative), would it be possible to provide an official FP8 quant? The Qwen and Stepfun-AI teams do this, for example, as do some others. It's really helpful to people who will never have the resources to run the model at F16 size. Without an official quant, downloading the F16 weights just to run them in FP8 mode is a waste of bandwidth and storage space.

I hope you will consider this request. And thank you again in any case.

It is already in int4, not fp16. Upscaling to fp8 makes no sense.
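
To put rough numbers on it, here is a minimal sketch of the storage arithmetic. The parameter count is a hypothetical round figure, not this model's actual size, and the helper `weight_bytes` is just for illustration. It also ignores quantization scales and any layers kept at higher precision.

```python
# Bytes needed to store the weights at a given bit width.
# Simplification: ignores scales/zero-points and non-quantized layers,
# so a real checkpoint would be somewhat larger.
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    return num_params * bits_per_weight // 8

# Hypothetical parameter count, for illustration only.
NUM_PARAMS = 1_000_000_000

sizes_gb = {
    name: weight_bytes(NUM_PARAMS, bits) / 1e9
    for name, bits in {"int4": 4, "fp8": 8, "fp16": 16}.items()
}
print(sizes_gb)  # {'int4': 0.5, 'fp8': 1.0, 'fp16': 2.0}
```

Going from int4 to fp8 would double the download, and it cannot restore precision that the int4 quantization has already discarded.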

Oops, I missed that!

OnesimusTheLesser changed discussion status to closed
