Inference speed is slow
#11
by kiran2405 - opened
I am trying to load this model for inference in a Databricks notebook using the code provided in the model card, but inference is very slow: it takes around 40-50 seconds to produce an answer even for simple prompts. How can I speed up inference?
My Databricks CUDA version is 11.4.
Installing AutoGPTQ directly from GitHub instead of via `pip install auto-gptq` solved the inference speed problem.
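For reference, a minimal sketch of that fix. This assumes the AutoGPTQ repository at `github.com/PanQiWei/AutoGPTQ`; installing from source lets its CUDA extension be compiled against the environment's own CUDA toolkit (11.4 here) rather than falling back to a slow path from a mismatched prebuilt wheel.

```shell
# Remove the prebuilt PyPI wheel, if present (its bundled CUDA kernels
# may not match the local CUDA 11.4 toolkit).
pip uninstall -y auto-gptq

# Install from source so the CUDA extension is built locally.
# Repository URL is an assumption based on the AutoGPTQ project.
pip install git+https://github.com/PanQiWei/AutoGPTQ.git
```

After reinstalling, restart the notebook kernel so the newly built extension is picked up.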
kiran2405 changed discussion status to closed