What would be the average inference time for this model using a beam width of 4?
#31 opened by ashwin26
@ashwin26 Thank you for testing out our model. Inference time depends heavily on your hardware, sequence length, and batch size, so we can't quote a single number. For optimized serving, you may try vLLM (https://blog.vllm.ai/2023/06/20/vllm.html) or Text Generation Inference (https://huggingface.co/docs/text-generation-inference/en/index).
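To measure it yourself, a minimal timing sketch with the `transformers` `generate` API is below. It assumes a causal LM; `gpt2` is only a stand-in for the actual model repo ID, and the prompt, run count, and `max_new_tokens` are arbitrary choices you should adjust to match your workload.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a placeholder; substitute the actual model repo ID.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt")

with torch.no_grad():
    # Warm-up run so one-time setup costs don't skew the measurement.
    model.generate(**inputs, num_beams=4, max_new_tokens=32)

    n_runs = 5
    start = time.perf_counter()
    for _ in range(n_runs):
        model.generate(**inputs, num_beams=4, max_new_tokens=32)
    avg_latency = (time.perf_counter() - start) / n_runs

print(f"average latency over {n_runs} runs: {avg_latency:.3f} s")
```

Averaging over several runs after a warm-up gives a more stable estimate than a single call; on GPU you would also want `model.to("cuda")` and to synchronize before reading the timer.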