Hosting an Inference endpoint

#6 opened by lukehansen

Hi! Has anybody hosted this model as an inference endpoint? What platform did you use?

Yes, it works great, but it's better to use the small model.

Hi! Can I ask what platform you used?

We use an Azure V100 GPU server.

Did you load the model into Azure directly from Hugging Face? (Also, thanks for your help!)

Yes, but it's quite complicated to get it running on an Azure server. You can also try Genesis Cloud.
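
For anyone following along, here's a minimal sketch of loading the model straight from the Hub, assuming the `transformers` ASR pipeline; the audio path is a placeholder, and swap in whichever Whisper checkpoint you're actually hosting:

```python
# Minimal sketch: pull the model from the Hugging Face Hub and transcribe a file.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",   # swap in the checkpoint you are hosting
    device=0 if torch.cuda.is_available() else -1,  # use the first GPU if present
)

# chunk_length_s lets the pipeline handle clips longer than 30 seconds
result = asr("sample.wav", chunk_length_s=30)
print(result["text"])
```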

AWS EC2 g4dn.xlarge works great and is about 50 cents an hour. Make sure you give it decent swap space when you set it up; AWS EC2 does not configure swap by default.
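
If it helps, here's a rough sketch of adding a swap file on a fresh Linux instance; run it as root, and note the 8 GB size and `/swapfile` path are just example values, not recommendations:

```python
# Rough sketch: create and enable a swap file on a fresh Linux instance (run as root).
import subprocess

def ensure_swap(size_gb: int = 8, path: str = "/swapfile") -> None:
    subprocess.run(["fallocate", "-l", f"{size_gb}G", path], check=True)  # allocate the file
    subprocess.run(["chmod", "600", path], check=True)  # swap files must not be world-readable
    subprocess.run(["mkswap", path], check=True)        # format the file as swap
    subprocess.run(["swapon", path], check=True)        # enable it for the running system
    # To persist across reboots, append "<path> none swap sw 0 0" to /etc/fstab.

if __name__ == "__main__":
    ensure_swap()
```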

An RTX 3090 on vast.ai costs about $0.20/hour. Inference speed on large-v2 is 4x realtime.
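
If you want to sanity-check that realtime figure on your own hardware, here's a rough sketch; the model ID and audio path are placeholders, and it assumes `soundfile` and `transformers` are installed:

```python
# Sketch: measure the realtime factor of transcription on your own hardware.
import time

import soundfile as sf
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    device=0 if torch.cuda.is_available() else -1,
)

audio, sr = sf.read("sample.wav")
duration = len(audio) / sr                    # clip length in seconds

start = time.perf_counter()
asr("sample.wav", chunk_length_s=30)          # chunking handles clips over 30 s
elapsed = time.perf_counter() - start

print(f"{duration / elapsed:.1f}x realtime")
```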

Try loading it on NVIDIA Triton Inference Server and then deploying on Kubernetes on any cloud. It's fast and scalable.
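
A minimal client-side sketch of what querying such a deployment might look like; note the model name (`whisper`) and the tensor names (`AUDIO`, `TEXT`) are hypothetical and must match whatever is in your model's `config.pbtxt`:

```python
# Sketch: query a Triton deployment over HTTP with a dummy audio tensor.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

audio = np.zeros((1, 16000), dtype=np.float32)          # 1 s of silence at 16 kHz
inp = httpclient.InferInput("AUDIO", list(audio.shape), "FP32")
inp.set_data_from_numpy(audio)

result = client.infer(model_name="whisper", inputs=[inp])
print(result.as_numpy("TEXT"))
```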
