Speed up inference #15
Is there any way I can speed up inference even further?
ExLlama is currently the fastest GPTQ inference backend available - if you're not already using it, give it a try. It's now supported in text-generation-webui.
I'm currently using AutoGPTQ.
When using ExLlama, do I just need to download the model files and use the ExLlama API to load the model?
There are some docs here: https://github.com/oobabooga/text-generation-webui/blob/main/docs/ExLlama.md
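In text-generation-webui you mostly just select the ExLlama loader as described in that doc. If you want to load the model with ExLlama directly from Python instead, something like the sketch below should work. It's based on the example scripts in the standalone exllama repo, so treat the import paths, generator settings, and sampling values as assumptions rather than a definitive recipe, and the model directory is just a placeholder:

```python
import glob
import os

# These modules live in the standalone exllama repo (run from its root).
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Placeholder: directory containing config.json, tokenizer.model and the
# quantized *.safetensors file you downloaded.
model_directory = "/path/to/your-gptq-model/"

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)  # read model config from config.json
config.model_path = model_path             # point it at the quantized weights

model = ExLlama(config)                    # load the model onto the GPU
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)                # KV cache used during generation
generator = ExLlamaGenerator(model, tokenizer, cache)

# Sampling settings (example values, adjust to taste).
generator.settings.temperature = 0.7
generator.settings.top_p = 0.9

output = generator.generate_simple("Explain GPTQ in one sentence:", max_new_tokens=128)
print(output)
```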