Request to create one Q8_0 version with --leave-output-tensor

#1
by mechanicmuthu - opened

Hi, I have a request: could you upload a Q8_0 version quantized with the LOT option (--leave-output-tensor), i.e. the highest-quality quantized version? That way we could run
./quantize --allow-requantize model-q8_0-LOT.gguf Q4_0
(or any other type) ourselves without further quality loss (I guess).
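For reference, a sketch of what that workflow might look like with llama.cpp's quantize tool (the filenames here are placeholders, not files from this repo):

```sh
# One-time, by the uploader: produce a Q8_0 file that keeps the output
# tensor unquantized (--leave-output-tensor) for maximum quality.
./quantize --leave-output-tensor model-f16.gguf model-q8_0-LOT.gguf Q8_0

# Then anyone can requantize that single file to smaller types locally;
# --allow-requantize permits quantizing an already-quantized model.
./quantize --allow-requantize model-q8_0-LOT.gguf model-q4_0.gguf Q4_0
./quantize --allow-requantize model-q8_0-LOT.gguf model-q4_k_m.gguf Q4_K_M
```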

For me, on the one hand, downloading the F16/F32 .pth version and converting it to GGUF is too big a download; on the other, I want to try out multiple quantized versions to compare their speed and quality WITHOUT downloading multiple large files.

You could also provide the quantize command in the README. Just suggesting. Thanks.
