How much RAM is needed in AWS Dedicated Inference Endpoint Deployment?

#217

by ramda1234786 - opened Jun 12, 2024

Jun 12, 2024

I want to deploy this model on a dedicated Inference Endpoint in AWS. How do I decide how much RAM is needed just for the endpoint?
Since it's just an endpoint, will 4 GB of RAM do the job?

My usage is at most about 5,000 requests per day, from maybe just 3-4 users.

pandora-s

Mistral AI_ org Jun 12, 2024

If I'm understanding this question properly, then you might need to look into how inference works in a more general way. For inference of a model like Mixtral at full precision you would need about 100 GB of VRAM (GPU memory). I'm not an expert on AWS deployments, but maybe this will help you a bit!
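As a rough illustration of where a figure like that comes from, here is a back-of-the-envelope sketch. It assumes a parameter count of about 46.7 billion (the commonly cited total for Mixtral 8x7B; the thread itself does not state it) and counts only the weights, so the KV cache and runtime overhead come on top. The function name and the list of precisions are made up for this example.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: roughly 46.7 billion parameters in total (not stated in this thread).
ASSUMED_PARAMS = 46.7e9

# Bytes per parameter for a few common weight formats.
for label, nbytes in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gb = weight_memory_gb(ASSUMED_PARAMS, nbytes)
    print(f"{label} weights: about {gb:.1f} GB before KV cache and overhead")
```

Under these assumptions the 16-bit weights alone land in the same range as the figure quoted above, which is why a 4 GB machine cannot host the model itself.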

ramda1234786

Jun 13, 2024

Hi @pandora-s, my question is about this service:
https://huggingface.co/pricing#endpoints

There is an explainer video from HF here: https://www.youtube.com/watch?v=ZQPm2-uR9zA

pandora-s

Mistral AI_ org Jun 14, 2024

Hi, and yeah, you would still need the hardware required for inference, i.e. to actually run the model.

pandora-s

Mistral AI_ org Jun 14, 2024

To just "run" the model itself you would need the amount of GPU memory I mentioned for full precision. But for that number of users and requests, can't you use Mistral's API service? I believe it would be considerably cheaper than getting the hardware you need.
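To put the stated load in perspective, here is a small arithmetic sketch. It assumes the 5,000 requests per day from the original post are spread evenly over 24 hours, which real traffic will not be, so treat it as an average only. The function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(requests_per_day: float) -> float:
    """Average request rate if traffic were spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def average_gap_seconds(requests_per_day: float) -> float:
    """Average time between two consecutive requests, in seconds."""
    return SECONDS_PER_DAY / requests_per_day


daily = 5000  # figure taken from the original post
print(f"{average_requests_per_second(daily):.3f} requests per second on average")
print(f"one request every {average_gap_seconds(daily):.2f} seconds on average")
```

At that rate a dedicated GPU machine would sit idle most of the time, which is the reasoning behind suggesting a pay-per-use API instead.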

ramda1234786

Jun 19, 2024

Hi @pandora-s,

Model (hosted by HF or Mistral) (step 1) ----> API Inference Endpoint hosted by me in Amazon or Google Cloud (step 2) ----> API output used by my application (step 3)
I am looking for step 2, paying only for step 1, controlling the API usage according to my needs, and shutting down the step 2 machine when it is not in use. The step 2 usage and hardware would be under my control.
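For illustration, the step 3 side of this flow (an application calling an HTTP endpoint) could look roughly like the sketch below. The URL and token are hypothetical placeholders, and the `{"inputs": ...}` payload shape is an assumption based on the usual Hugging Face text-generation request format, so check the endpoint's own documentation. The sketch only builds the request; actually sending it needs a running endpoint.

```python
import json
import urllib.request

# Hypothetical placeholders, not real values.
ENDPOINT_URL = "https://example.invalid/my-endpoint"
API_TOKEN = "hf_xxx"


def build_request(prompt: str, max_new_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a text-generation prompt."""
    payload = {
        "inputs": prompt,  # assumed payload shape
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("Hello, how are you?")
print(req.get_method(), req.full_url)
# To send it against a live endpoint:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```

Whichever machine sits behind that URL still has to load and run the model, so the GPU memory requirement discussed earlier applies to it.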
