Hosting an Inference endpoint

#6 opened by lukehansen

Hi! Has anybody hosted this model as an inference endpoint? What platform did you use?

Yes, it works great, but it's better to use the small model.

Hi! Can I ask what platform you used?

We use an Azure V100 GPU server.

Did you load the model into Azure directly from Hugging Face? (Also, thanks for your help!)

Yes, but it's quite complicated to get it running on an Azure server. You can also try Genesis Cloud.
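
For anyone following along, here's a minimal sketch of loading the model straight from the Hub, assuming the `transformers` ASR pipeline; the audio path is a placeholder, and swap in whichever Whisper checkpoint you're actually hosting:

```python
# Minimal sketch: pull the model from the Hugging Face Hub and transcribe a file.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",   # swap in the checkpoint you are hosting
    device=0 if torch.cuda.is_available() else -1,  # use the first GPU if present
)

# chunk_length_s lets the pipeline handle clips longer than 30 seconds
result = asr("sample.wav", chunk_length_s=30)
print(result["text"])
```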

AWS EC2 g4dn.xlarge works great and is about 50 cents an hour. Make sure you give it decent swap space when you set it up; AWS EC2 does not configure swap by default.
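
If it helps, here's a rough sketch of adding a swap file on a fresh Linux instance; run it as root, and note the 8 GB size and `/swapfile` path are just example values, not recommendations:

```python
# Rough sketch: create and enable a swap file on a fresh Linux instance (run as root).
import subprocess

def ensure_swap(size_gb: int = 8, path: str = "/swapfile") -> None:
    subprocess.run(["fallocate", "-l", f"{size_gb}G", path], check=True)  # allocate the file
    subprocess.run(["chmod", "600", path], check=True)  # swap files must not be world-readable
    subprocess.run(["mkswap", path], check=True)        # format the file as swap
    subprocess.run(["swapon", path], check=True)        # enable it for the running system
    # To persist across reboots, append "<path> none swap sw 0 0" to /etc/fstab.

if __name__ == "__main__":
    ensure_swap()
```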

An RTX 3090 on vast.ai costs about $0.20/hour. Inference speed on large-v2 is 4x realtime.
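
If you want to sanity-check that realtime figure on your own hardware, here's a rough sketch; the model ID and audio path are placeholders, and it assumes `soundfile` and `transformers` are installed:

```python
# Sketch: measure the realtime factor of transcription on your own hardware.
import time

import soundfile as sf
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    device=0 if torch.cuda.is_available() else -1,
)

audio, sr = sf.read("sample.wav")
duration = len(audio) / sr                    # clip length in seconds

start = time.perf_counter()
asr("sample.wav", chunk_length_s=30)          # chunking handles clips over 30 s
elapsed = time.perf_counter() - start

print(f"{duration / elapsed:.1f}x realtime")
```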

Try loading it on NVIDIA Triton Inference Server and then deploying on Kubernetes on any cloud. It's fast and scalable.
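
A minimal client-side sketch of what querying such a deployment might look like; note the model name (`whisper`) and the tensor names (`AUDIO`, `TEXT`) are hypothetical and must match whatever is in your model's `config.pbtxt`:

```python
# Sketch: query a Triton deployment over HTTP with a dummy audio tensor.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

audio = np.zeros((1, 16000), dtype=np.float32)          # 1 s of silence at 16 kHz
inp = httpclient.InferInput("AUDIO", list(audio.shape), "FP32")
inp.set_data_from_numpy(audio)

result = client.infer(model_name="whisper", inputs=[inp])
print(result.as_numpy("TEXT"))
```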
