Speed up inference
#15
opened by mr96
Is there any way I can speed up inference even further?
ExLlama is currently the fastest GPTQ inference backend available - if you're not already using it, try it. It's now supported in text-generation-webui.
I'm currently using AutoGPTQ.
When using ExLlama, do I just need to download the model files and use the ExLlama API to load the model?
There are some docs here: https://github.com/oobabooga/text-generation-webui/blob/main/docs/ExLlama.md
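If you'd rather skip the webui and call ExLlama directly from Python, a minimal sketch along the lines of the basic example in the upstream repo (https://github.com/turboderp/exllama) looks roughly like this. The model directory is a placeholder, and the exact module and class names may differ depending on the version you have checked out, so treat it as a starting point rather than a definitive recipe:

```python
# Rough sketch of loading a GPTQ model with ExLlama, based on the upstream
# repo's basic example. Paths are placeholders; module names may vary by version.
import os, glob

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_directory = "/path/to/your-gptq-model"  # hypothetical local download path

# Locate the files ExLlama needs inside the downloaded model folder
tokenizer_path = os.path.join(model_directory, "tokenizer.model")
config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

# Build the model, tokenizer, cache and generator
config = ExLlamaConfig(config_path)   # read the HF config
config.model_path = model_path        # point at the GPTQ safetensors weights
model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Generate text from a prompt
output = generator.generate_simple("Tell me about GPTQ.", max_new_tokens=200)
print(output)
```

The general flow is: point the config at the GPTQ safetensors file, build the model and cache, then wrap them in a generator for sampling.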