Batch inference slower than looping over single inferences

#6
by gauranshsoni12 - opened

Hi, I ran the model over a sequence of prompts using the batch_size param set to 32. GPU specs: 4x NVIDIA A10G (24 GB each). Even though nvidia-smi shows heavy GPU utilization, the batched run is not producing results; a plain loop over the sequence generates results faster. A rough sketch of what I'm doing is below.
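For context, here is roughly what I mean, as a minimal sketch using a transformers text-generation pipeline (the model name, prompts, and generation settings are placeholders, not my actual setup):

```python
from transformers import pipeline

# Hypothetical stand-in model; my real model differs.
generator = pipeline(
    "text-generation",
    model="gpt2",
    device=0,  # single GPU for simplicity; I actually have 4x A10G
)

prompts = ["prompt 1", "prompt 2"]  # my real sequence has many more prompts

# Batched call: nvidia-smi shows heavy GPU utilization, but it is slow.
batched = generator(prompts, batch_size=32, max_new_tokens=64)

# Plain Python loop: this finishes faster for me.
looped = [generator(p, max_new_tokens=64) for p in prompts]
```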
