Speed up inference
#15
opened by mr96
Is there any way I can speed up inference even further?
ExLlama is currently the fastest GPTQ inference backend available - if you're not already using it, try it. It's now supported in text-generation-webui.
I'm currently using AutoGPTQ.
When using ExLlama, do I just need to download the model files and use the ExLlama API to load the model?
There are some docs here: https://github.com/oobabooga/text-generation-webui/blob/main/docs/ExLlama.md
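If you'd rather skip the webui and call ExLlama directly from Python, a minimal sketch along the lines of the basic example in the upstream repo (https://github.com/turboderp/exllama) looks roughly like this. The model directory is a placeholder, and the exact module and class names may differ depending on the version you have checked out, so treat it as a starting point rather than a definitive recipe:

```python
# Rough sketch of loading a GPTQ model with ExLlama, based on the upstream
# repo's basic example. Paths are placeholders; module names may vary by version.
import os, glob

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_directory = "/path/to/your-gptq-model"  # hypothetical local download path

# Locate the files ExLlama needs inside the downloaded model folder
tokenizer_path = os.path.join(model_directory, "tokenizer.model")
config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

# Build the model, tokenizer, cache and generator
config = ExLlamaConfig(config_path)   # read the HF config
config.model_path = model_path        # point at the GPTQ safetensors weights
model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Generate text from a prompt
output = generator.generate_simple("Tell me about GPTQ.", max_new_tokens=200)
print(output)
```

The general flow is: point the config at the GPTQ safetensors file, build the model and cache, then wrap them in a generator for sampling.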