The inference speed is too slow on 4×A10G GPUs
As the title says, the inference speed is about 1 word per second. There is enough GPU RAM, and all the weights are loaded onto the GPUs without offloading, but GPU utilization only reaches about 40%.
I want to know whether this inference speed is normal for my hardware.
If not, what is the cause and how can I improve the speed?
If it is, do you have any recommendations for hardware upgrades?
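In case it helps, here is a minimal sketch of a common setup, assuming you are loading the model with Hugging Face transformers (the model name below is a placeholder, not the actual checkpoint from this thread). Note that `device_map="auto"` shards the layers across the GPUs pipeline-style, so only one GPU does work at any moment, which can keep per-GPU utilization low; loading in fp16 at least halves memory traffic compared to fp32.

```python
# A minimal sketch, assuming a Hugging Face transformers causal LM.
# "MODEL_NAME" is a placeholder for whichever checkpoint you are running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-model-here"  # placeholder, not the model from this thread

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # fp16 halves memory traffic vs. fp32
    device_map="auto",          # shard layers across all 4 A10Gs (pipeline-style)
)

# With device_map="auto", the embedding layer typically lands on cuda:0,
# so inputs go there; accelerate hooks move activations between GPUs.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

This is only a baseline; dedicated inference servers that use tensor parallelism and continuous batching generally make much better use of multiple GPUs than naive layer sharding.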
Hi. I'm facing the same issue. Do you have any tips that can speed up the model inference? Thanks!
Following the tutorial in the link should answer your question.
Thank you! Let me give it a try.