model taking too long for inference using num_beams=4

#30
by ashwin26 - opened

I am running the model as in the colab file from the github repo. I have a long database schema (around 12 tables, 15 columns each on average). It is taking a lot of time to generate a query with the .generate() function. How can I optimize this?
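For context, the call is essentially the standard transformers pattern below (a simplified sketch; the checkpoint name and prompt are placeholders, and the real schema prompt is much longer):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; use whichever model the colab actually loads.
model_name = "defog/sqlcoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision noticeably reduces latency on GPU
    device_map="auto",
)

prompt = "..."  # question + full 12-table schema, so the input is already long

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    num_beams=4,        # beam search multiplies decoding cost roughly by the beam width
    max_new_tokens=300,
    do_sample=False,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```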

Defog.ai org

Hi @ashwin26 , that's expected for .generate. We would recommend using an inference-optimized framework like TGI (https://huggingface.co/docs/text-generation-inference/en/index) or vLLM (https://vllm.readthedocs.io/) to speed it up.
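For example, a minimal vLLM sketch looks roughly like this (checkpoint name and decoding parameters are illustrative; beam-search support in vLLM has changed across versions, so check the docs of your installed release for the exact option):

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint; point this at the same model the colab uses.
llm = LLM(model="defog/sqlcoder", dtype="float16")

# Greedy decoding shown here. Older vLLM releases accepted beam-search options in
# SamplingParams, newer ones expose a separate beam-search API; consult the docs
# for your version if you specifically need num_beams=4.
params = SamplingParams(temperature=0.0, max_tokens=300)

prompts = ["..."]  # question + full schema prompt
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```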

wongjingping changed discussion status to closed
ashwin26 changed discussion title from model taking too long of inference using num_beams=4 to model taking too long for inference using num_beams=4

Hi @wongjingping , on average how long does it actually take when using TGI or vLLM with beam width 4?
