model taking too long for inference using num_beams=4
#30
by ashwin26 - opened
I am running the model as in the Colab file from the GitHub repo. I have a long database schema (around 12 tables, 15 columns each on average). Generating a query with the .generate() function takes a lot of time. How can I optimize this?
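For reference, a minimal sketch of the kind of call described above (the model id, prompt, and generation settings are placeholders, not taken from the Colab):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "defog/sqlcoder"  # assumption: replace with the actual model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "..."  # the schema + question prompt built as in the Colab
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# num_beams=4 keeps four candidate sequences alive at every decoding step,
# roughly quadrupling the work versus greedy search, which is why long
# schema prompts make .generate() slow.
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=400)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```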
Hi @ashwin26, that's expected with .generate(). We would recommend using an inference-optimized framework like TGI (https://huggingface.co/docs/text-generation-inference/en/index) or vLLM (https://vllm.readthedocs.io/) to speed it up.
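A minimal sketch of running the same prompt through vLLM's offline API (model id, prompt, and generation settings are assumptions; beam-search support and its parameter names vary across vLLM versions, so check the docs for your release):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="defog/sqlcoder")  # assumption: swap in the actual model id
prompt = "..."  # the schema + question prompt built as in the Colab

# vLLM's paged attention and continuous batching typically make decoding much
# faster than plain .generate(). This sketch uses greedy decoding
# (temperature=0); enable beam search per your vLLM version's documentation.
params = SamplingParams(temperature=0, max_tokens=400)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```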
wongjingping changed discussion status to closed
ashwin26 changed discussion title from "model taking too long of inference using num_beams=4" to "model taking too long for inference using num_beams=4"
Hi @wongjingping, on average how long does inference actually take with TGI or vLLM at beam width 4?