SageMaker deployment config for sub-second real-time inference

#9 opened by vibranium

I am trying to deploy this model on an ml.p4d (A100) SageMaker endpoint instance. What TGI configuration should I use to reach sub-second real-time inference? Thanks!
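For reference, this is the kind of deployment I have in mind: a minimal sketch using the SageMaker Python SDK with the Hugging Face TGI container. The model ID placeholder, GPU count, and the token/length limits below are my own assumptions, not confirmed settings for this model.

```python
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI (text-generation-inference) deep learning container image;
# omitting `version` picks the SDK's default TGI image.
image_uri = get_huggingface_llm_image_uri("huggingface")

# Assumed TGI environment config -- values here are placeholders to tune,
# not recommendations from the model card.
env = {
    "HF_MODEL_ID": "<model-id-from-this-repo>",   # placeholder, replace with the actual Hub ID
    "SM_NUM_GPUS": json.dumps(8),                 # ml.p4d.24xlarge has 8 x A100 40GB
    "MAX_INPUT_LENGTH": json.dumps(1024),
    "MAX_TOTAL_TOKENS": json.dumps(2048),
    "MAX_BATCH_PREFILL_TOKENS": json.dumps(4096),
}

model = HuggingFaceModel(role=role, image_uri=image_uri, env=env)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",
    container_startup_health_check_timeout=600,   # allow time for model download/shard load
)

# Quick smoke test of the endpoint.
response = predictor.predict({
    "inputs": "Hello",
    "parameters": {"max_new_tokens": 64},
})
print(response)
```

My assumption is that latency mostly comes down to sharding across the 8 GPUs (`SM_NUM_GPUS`) and keeping `MAX_INPUT_LENGTH` / `MAX_TOTAL_TOKENS` / `MAX_BATCH_PREFILL_TOKENS` tight, but I would appreciate guidance on which values actually get this model under one second.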
