Inference Speed Compared to ExLlama?

#6 · by larsskaug · opened

In my testing, inference with this AWQ model is roughly five times slower than with ExLlama (TheBloke_Mistral-7B-Instruct-v0.1-GPTQ_gptq-4bit-32g-actorder_True).

Is that to be expected?
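
For anyone who wants to compare apples to apples, here is a minimal tokens/sec timing sketch. It assumes the AutoAWQ loader (`awq.AutoAWQForCausalLM.from_quantized`); the model ID, prompt, and generation settings are placeholders to adjust for your own setup:

```python
# Rough tokens/sec benchmark for an AWQ checkpoint (a sketch, not a rigorous
# benchmark). Assumes AutoAWQ and transformers are installed and a CUDA GPU
# is available; MODEL_ID is a placeholder for the repo you are testing.
import time

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.1-AWQ"  # placeholder repo ID

# fuse_layers=True enables AutoAWQ's fused kernels, which strongly affect speed.
model = AutoAWQForCausalLM.from_quantized(MODEL_ID, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

prompt = "[INST] Explain quantization in one paragraph. [/INST]"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# Warm-up pass so one-time kernel/setup costs don't skew the measurement.
model.generate(input_ids, max_new_tokens=16)

start = time.time()
output = model.generate(input_ids, max_new_tokens=256)
elapsed = time.time() - start

new_tokens = output.shape[1] - input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```

One thing worth checking in a comparison like this: whether `fuse_layers=True` was set when loading the AWQ model, since generation without the fused modules is known to be substantially slower.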
