Allow dynamic batch size

Hi @l-i ,

Thanks for opening the issue. You can always enable `dynamic_batch_size` when compiling the checkpoint yourself.

In our previous experience, we got better latency with a static batch size. But maybe we could add a checkpoint with a dynamic batch size. @philschmid, WDYT?

Yes, the latest information we have is that it's optimized for BS=1 (batch size 1), but @l-i, you should be able to compile it yourself using: and set the batch size to whatever you want.
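To make the static vs. dynamic trade-off concrete, here is a minimal pure-Python sketch of one common way to serve arbitrary batch sizes with a model compiled for a fixed batch size: split the inputs into chunks of the compiled size, pad the last chunk, and drop the padded outputs. `make_static_model` and `run_dynamic` are hypothetical names, and the "model" is a placeholder that doubles its inputs. This is an illustration of the idea, not the Neuron or Optimum implementation.

```python
from typing import Callable, List, Tuple


def make_static_model(compiled_batch: int) -> Callable[[List[int]], List[int]]:
    """Hypothetical stand-in for a model compiled with a fixed batch size."""
    def model(batch: List[int]) -> List[int]:
        # A statically compiled graph only accepts exactly `compiled_batch` rows.
        if len(batch) != compiled_batch:
            raise ValueError(f"expected batch of {compiled_batch}, got {len(batch)}")
        return [x * 2 for x in batch]  # placeholder computation
    return model


def run_dynamic(
    model: Callable[[List[int]], List[int]],
    inputs: List[int],
    compiled_batch: int,
    pad_value: int = 0,
) -> Tuple[List[int], int]:
    """Run any number of inputs through a fixed-batch model.

    Returns the real outputs and the number of model calls made.
    """
    outputs: List[int] = []
    calls = 0
    for start in range(0, len(inputs), compiled_batch):
        chunk = inputs[start:start + compiled_batch]
        n_real = len(chunk)
        # Pad the final chunk up to the compiled batch size.
        padded = chunk + [pad_value] * (compiled_batch - n_real)
        # Keep only the outputs that correspond to real inputs.
        outputs.extend(model(padded)[:n_real])
        calls += 1
    return outputs, calls


model = make_static_model(4)
print(run_dynamic(model, list(range(10)), 4))  # 3 calls, last one padded
print(run_dynamic(model, [1, 2, 3, 4], 4))     # 1 call, no padding
```

In this sketch, an input batch that exactly matches the compiled size takes a single call with no padding, while other sizes pay for extra calls and wasted padded rows. How much overhead a real dynamic-batch build adds on Neuron is something to measure rather than infer from this toy.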

Thank you for sharing the details!

I have two follow-up questions:

  • If I enable dynamic batch size during compilation but still run inference at the original static batch size, would that still affect latency?
  • If I compile with a larger static batch size, how much larger does my machine need to be? (I tried a batch size of 6 on a 24xl, and it seemed to fail.)

Just to point out: if you are compiling for a large batch size and even a 24xlarge runs out of memory (OOM), you could use a CPU-only instance just for the compilation.
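For a rough feel for the second question, here is a back-of-the-envelope sketch. It assumes the checkpoint is a decoder-only LLM, which the thread does not state. In that case, the key/value cache at a fixed sequence length grows linearly with batch size, while the weights stay the same size. The dimensions below are hypothetical, chosen only for illustration. The estimate covers accelerator memory for the KV cache only. It says nothing about the host memory the compiler needs, which is what the CPU-only suggestion above addresses.

```python
def kv_cache_bytes(
    batch_size: int,
    seq_len: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_elem: int = 2,  # assumes fp16/bf16 storage
) -> int:
    """Estimate KV cache size for a decoder-only transformer (assumption).

    Each layer stores two tensors (K and V) of shape
    [batch_size, num_kv_heads, seq_len, head_dim].
    """
    return 2 * num_layers * batch_size * num_kv_heads * seq_len * head_dim * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical, illustrative dimensions (not taken from the thread).
for bs in (1, 6):
    size = kv_cache_bytes(bs, seq_len=4096, num_layers=32, num_kv_heads=32, head_dim=128)
    print(f"batch_size={bs}: {size / GIB:.1f} GiB of KV cache")
```

With these made-up dimensions, going from batch size 1 to 6 multiplies the KV cache by 6. Actual memory needs also depend on the weights, activations, and how the compiler shards the model across cores.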

