How to run fast inference with FP8
#2 opened by CCRss
I wonder if there is an easy (or even not-so-easy) way to run inference faster using FP8.
vLLM has native support for these FP8 checkpoints! https://docs.vllm.ai/en/latest/quantization/fp8.html
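A minimal sketch of what that looks like, assuming a recent vLLM build with FP8 support and a GPU that supports it (e.g. Hopper or Ada); the model name below is just a placeholder, not a specific checkpoint:

```python
from vllm import LLM, SamplingParams

# Load a model with FP8 quantization. For a checkpoint that is already
# stored in FP8, vLLM picks up the quantization config automatically;
# quantization="fp8" also lets vLLM quantize a regular checkpoint on the fly.
llm = LLM(model="your-org/your-model-FP8", quantization="fp8")

# Standard generation call; FP8 only changes how weights/activations
# are stored and computed, not the API.
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain FP8 inference in one sentence."], params)
print(outputs[0].outputs[0].text)
```

See the linked docs for the exact supported hardware and vLLM versions.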