Batch inference slower than looping over single inferences

#6
by gauranshsoni12 - opened

Hi, I ran the model over a sequence of prompts using the batch_size param set to 32. GPU specs: 4x NVIDIA A10G (24 GB each). Even though nvidia-smi shows heavy GPU utilization, the batched run is not producing results; a plain loop over the sequence generates results faster. A rough sketch of what I'm doing is below.
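For context, here is roughly what I mean, as a minimal sketch using a transformers text-generation pipeline (the model name, prompts, and generation settings are placeholders, not my actual setup):

```python
from transformers import pipeline

# Hypothetical stand-in model; my real model differs.
generator = pipeline(
    "text-generation",
    model="gpt2",
    device=0,  # single GPU for simplicity; I actually have 4x A10G
)

prompts = ["prompt 1", "prompt 2"]  # my real sequence has many more prompts

# Batched call: nvidia-smi shows heavy GPU utilization, but it is slow.
batched = generator(prompts, batch_size=32, max_new_tokens=64)

# Plain Python loop: this finishes faster for me.
looped = [generator(p, max_new_tokens=64) for p in prompts]
```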
