Speed up inference #15


Is there any way I can speed up inference even further?

ExLlama is currently the fastest GPTQ inference backend available; if you're not already using it, give it a try. It's now supported in text-generation-webui.

I'm currently using AutoGPTQ.

When using ExLlama, do I just need to download the model files and then use the ExLlama API to load the model?
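For reference, loading a GPTQ model with the standalone ExLlama repository typically looks something like the sketch below. This is an untested outline assuming the turboderp/exllama project layout (its `model.py`, `tokenizer.py`, and `generator.py` modules); the model directory path and file names are placeholders for your own downloaded 4-bit GPTQ model.

```python
# Sketch based on exllama's example scripts; requires a CUDA GPU and the
# exllama repo on the Python path. All paths below are placeholders.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "/path/to/gptq-model"  # contains config.json, tokenizer.model, *.safetensors

config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/model.safetensors"

model = ExLlama(config)                                  # load quantized weights onto the GPU
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)                              # allocate the attention cache
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Hello, my name is", max_new_tokens=32))
```

In text-generation-webui you shouldn't need any of this: select the ExLlama loader when loading the model and the webui handles the rest.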
