How much RAM is needed in AWS Dedicated Inference Endpoint Deployment?
I want to deploy this model with a dedicated Inference Endpoint on AWS. How do I decide how much RAM the endpoint needs?
Since it's just an endpoint, will 4 GB of RAM do the job?
My usage is at most about 5,000 requests per day, from maybe 3-4 users.
If I'm understanding this question properly, you might need to look into how inference works more generally. Running inference on a model like Mixtral at full precision requires about 100 GB of VRAM (GPU memory). I'm not an expert on AWS deployments, but maybe this will help you a bit!
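To make the memory figure above easier to check, here is a rough back-of-the-envelope sketch: weight memory is approximately the parameter count times the bytes per parameter. The parameter count of about 46.7 billion for Mixtral 8x7B is an assumption not stated in this thread, and the function name `weight_memory_gb` is made up for illustration. The estimate covers the weights only; the KV cache and activations add more on top.

```python
# Rough estimate of the GPU memory needed just to hold a model's weights.
# Assumption (not from this thread): Mixtral 8x7B has ~46.7e9 total parameters.
MIXTRAL_8X7B_PARAMS = 46.7e9

# Bytes used per parameter at common precisions.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Return the memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(MIXTRAL_8X7B_PARAMS, size)
        print(f"{name}: ~{gb:.0f} GB for weights only")
```

Under these assumptions, 16-bit weights alone come to roughly 93 GB, which is in line with the "about 100 GB" figure and far beyond a 4 GB machine.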
Hi @pandora-s, my question is about this service:
https://huggingface.co/pricing#endpoints
There is an explainer video from HF here: https://www.youtube.com/watch?v=ZQPm2-uR9zA
Hi, and yes, you would still need the hardware required for inference, i.e. to run the model.
Just to run the model itself you would need the amount of GPU memory I mentioned for full precision. But for that number of users and requests, can't you use Mistral's API service? I believe it would be considerably cheaper than getting the hardware you need.
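As a rough illustration of why the request volume matters here, the sketch below converts 5,000 requests per day into an average rate and into the fraction of the day a dedicated machine would actually be busy. The 2 seconds per request is a purely hypothetical figure chosen for illustration, not a measurement, and both function names are made up.

```python
# Rough load estimate for the usage described in this thread.
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(requests_per_day: float) -> float:
    """Average requests per second, assuming requests are spread evenly."""
    return requests_per_day / SECONDS_PER_DAY


def busy_fraction(requests_per_day: float, seconds_per_request: float) -> float:
    """Fraction of the day spent serving requests, one request at a time."""
    return requests_per_day * seconds_per_request / SECONDS_PER_DAY


if __name__ == "__main__":
    requests_per_day = 5000          # from the question
    seconds_per_request = 2.0        # hypothetical, for illustration only
    print(f"average rate: {average_rate(requests_per_day):.3f} requests/s")
    print(f"busy fraction: {busy_fraction(requests_per_day, seconds_per_request):.1%}")
```

Under these assumptions the hardware would sit idle for most of the day, which is the reasoning behind suggesting a pay-per-use API for this workload.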
Hi @pandora-s,
Model (hosted by HF or Mistral) (step 1) ----> API Inference Endpoint hosted by me on Amazon or Google Cloud (step 2) ----> API output used by my application (step 3)
I am asking about step 2: I want to pay only for step 1, control the API usage according to my needs, and shut down the step 2 machine when it is not in use. The step 2 usage and hardware will be under my control.