
Correct machine to deploy the model on AWS Sagemaker

#11 opened by LorenzoCevolaniAXA

I am trying to deploy this model to an endpoint on AWS SageMaker.
I have tried several instances, from "ml.g5.4xlarge" up to the larger "ml.g5.48xlarge" with 8 GPUs, which should be more than enough for a 13B model like this one, but I always get an OOM error on one of the GPUs. Is there anything I can try to make it work?
Do you have a configuration that is working on your side?
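
For context, this is roughly the setup I have been trying, sketched with the SageMaker Python SDK and the Hugging Face LLM (text-generation-inference) container. The model ID is a placeholder for this repository, and the container version, token limits, and quantization flag are assumptions to adjust:

```python
# Minimal sketch of the deployment, via the SageMaker Python SDK and the
# Hugging Face LLM (text-generation-inference) container.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Resolve the TGI container image URI for the current region.
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.1.0")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "<org>/<this-model>",  # placeholder for this 13B model
        "SM_NUM_GPUS": "4",        # shard across all 4 A10Gs of a g5.12xlarge
        "MAX_INPUT_LENGTH": "1024",
        "MAX_TOTAL_TOKENS": "2048",
        # "HF_MODEL_QUANTIZE": "bitsandbytes",  # fallback if fp16 still OOMs
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    # Note: ml.g5.4xlarge has only ONE A10G (24 GB), so ~26 GB of fp16
    # weights for a 13B model cannot fit there; multi-GPU sharding is needed.
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=900,  # give the weights time to load
)

print(predictor.predict({"inputs": "Translate to English: Bonjour le monde"}))
```

If SM_NUM_GPUS is not set, I believe the container loads the whole model onto a single GPU, which would explain an OOM on one GPU even on an 8-GPU machine.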
