What server configuration is required to run this model?

#2
by hongdouzi

I set max_seq_length=5000, but I still run out of memory when inputting text of that length.

I have about 32 GB of RAM and roughly 24 GB of GPU memory.

Do you run the model in eval mode and wrap inference in torch.no_grad() (or torch.inference_mode())? Without that, autograd keeps activations around for backpropagation and memory usage balloons with long inputs.
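For reference, here is a minimal sketch of that setup, assuming a Hugging Face transformers causal LM; the checkpoint name, dtype, and generation parameters are illustrative placeholders, not details from this thread:

```python
# Minimal sketch: inference without gradient tracking to cut memory use.
# "your-model-here" is a hypothetical checkpoint name, not the model in question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-model-here"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision roughly halves weight memory
    device_map="auto",          # place layers on available devices automatically
)
model.eval()  # disable dropout and other training-only behavior

text = "some long input..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=5000)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():  # no autograd graph -> far less activation memory
    outputs = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in float16 (or using torch.inference_mode() instead of torch.no_grad()) can also help, but with ~24 GB of GPU memory a 5000-token input may still be tight depending on the model size.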

You could also set it up with Inference Endpoints, available on the Hugging Face Hub, but note that they charge for the whole time the endpoint is running!

You might be able to get away with running it locally, but honestly, unless you're running very fast GPUs, you're better off using Inference Endpoints, Google Colab, or RunPod!
