will there be an FP8 version later?

#6
by ZealotTt - opened

This is a very good project, thank you for your efforts!
Will there be a distilled FP8 version of the model at a later stage?

The model can already be loaded in FP8 or int8 using quanto, as described in the inference section of README.md. There is also a Q5-quantized community model. I personally won't be doing GGUF versions, but if someone else wants to convert them, I would welcome it.
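
For reference, here is a minimal sketch of what FP8 loading with quanto might look like. It assumes a diffusers-style pipeline; `"<model-id>"` and the `transformer` attribute are placeholders that depend on the actual repository, so check the README's inference section for the exact names:

```python
# Minimal sketch: quantize a diffusers pipeline's transformer to FP8
# with optimum-quanto. Not the project's official snippet; model ID
# and component names are placeholders.
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = DiffusionPipeline.from_pretrained(
    "<model-id>", torch_dtype=torch.bfloat16
)

# Quantize the transformer's weights to FP8 (pass qint8 instead of
# qfloat8 for int8), then freeze so the quantized weights replace
# the originals.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)

pipe.to("cuda")
```

This quantizes at load time rather than shipping pre-quantized weights, which is why a separate distilled FP8 checkpoint isn't strictly needed.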