Inference Endpoints (dedicated) documentation

Pricing

Easily deploy machine learning models on dedicated infrastructure with 🤗 Inference Endpoints. When you create an Endpoint, you can select the instance type to deploy and scale your model according to an hourly rate. 🤗 Inference Endpoints is accessible to Hugging Face accounts with an active subscription and credit card on file. At the end of the subscription period, the user or organization account will be charged for the compute resources used while Endpoints are initializing and in a running state.

You can find the hourly pricing for all available 🤗 Inference Endpoints instances, along with examples of how costs are calculated, below. While the prices are shown by the hour, the actual cost is calculated by the minute.
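
As a rough illustration of per-minute billing, the sketch below prorates an hourly rate over a number of minutes. It assumes simple linear proration; the function name `prorated_cost` is hypothetical, and any rounding applied on an actual invoice is not described on this page.

```python
def prorated_cost(hourly_rate: float, minutes: float) -> float:
    """Return the cost in USD of one replica running for `minutes` minutes.

    Assumption: the hourly rate is prorated linearly per minute.
    The exact rounding rules are not stated in this page.
    """
    return hourly_rate * minutes / 60


# Example: 45 minutes on an instance billed at $0.064/hr
cost = prorated_cost(0.064, 45)
print(f"${cost:.3f}")
```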

CPU Instances

The table below shows currently available CPU instances and their hourly pricing. If the instance type cannot be selected in the application, you need to request a quota to use it.

Provider  Instance Size  Hourly rate  vCPUs  Memory  Architecture    ID
aws       small          $0.032       1      2GB     Intel Ice Lake  aws-c6i-small
aws       medium         $0.064       2      4GB     Intel Ice Lake  aws-c6i-medium
aws       large          $0.128       4      8GB     Intel Ice Lake  aws-c6i-large
aws       xlarge         $0.256       8      16GB    Intel Ice Lake  aws-c6i-xlarge
azure     small          $0.060       1      2GB     Intel Xeon      azure-fsv2-small
azure     medium         $0.120       2      4GB     Intel Xeon      azure-fsv2-medium
azure     large          $0.240       4      8GB     Intel Xeon      azure-fsv2-large
azure     xlarge         $0.480       8      16GB    Intel Xeon      azure-fsv2-xlarge
gcp       x1             $0.070       1      2GB     Intel Xeon      gcp-intel-spr-x1
gcp       x2             $0.140       2      4GB     Intel Xeon      gcp-intel-spr-x2
gcp       x4             $0.280       4      8GB     Intel Xeon      gcp-intel-spr-x4
gcp       x8             $0.560       8      16GB    Intel Xeon      gcp-intel-spr-x8

GPU Instances

The table below shows currently available GPU instances and their hourly pricing. If the instance type cannot be selected in the application, you need to request a quota to use it.

Provider  Instance Size  Hourly rate  GPUs  Memory  Architecture  ID
aws       small          $0.5         1     14GB    NVIDIA T4     aws-g4dn-xlarge-small
aws       medium         $1           1     24GB    NVIDIA A10G   aws-g5-2xlarge-medium
aws       large          $3           4     56GB    NVIDIA T4     aws-g4dn-12xlarge-large
aws       xlarge         $4           1     80GB    NVIDIA A100   aws-p4de-xlarge
aws       xxlarge        $5           4     96GB    NVIDIA A10G   aws-g5-12xlarge-xxlarge
aws       2xlarge        $8           2     160GB   NVIDIA A100   aws-p4de-2xlarge
aws       4xlarge        $16          4     320GB   NVIDIA A100   aws-p4de-4xlarge
aws       8xlarge        $32          8     640GB   NVIDIA A100   not available
gcp       x1             $0.5         1     16GB    NVIDIA T4     gcp-nvidia-t4-x1
gcp       x1             $1           1     24GB    NVIDIA L4     gcp-nvidia-l4-x1
gcp       x4             $5           4     96GB    NVIDIA L4     gcp-nvidia-l4-x4
gcp       x1             $6           1     80GB    NVIDIA A100   gcp-nvidia-a100-x1
gcp       x2             $12          2     160GB   NVIDIA A100   gcp-nvidia-a100-x2
gcp       x4             $24          4     320GB   NVIDIA A100   gcp-nvidia-a100-x4
gcp       x8             $48          8     640GB   NVIDIA A100   gcp-nvidia-a100-x8

Pricing examples

The following example pricing scenarios demonstrate how costs are calculated. You can find the hourly rate for all instance types and sizes in the tables above. Use the following formula to calculate the costs:

instance hourly rate * ((hours * # min replica) + (scale-up hrs * # additional replicas))
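
The formula can be expressed as a small helper, shown here as a sketch. The function and parameter names are illustrative and not part of any Hugging Face API; the inputs are taken from the two scenarios in this section.

```python
def endpoint_cost(
    hourly_rate: float,
    hours: float,
    min_replicas: int,
    scale_up_hours: float = 0.0,
    additional_replicas: int = 0,
) -> float:
    """Apply the pricing formula:

    hourly rate * ((hours * min replicas) + (scale-up hours * additional replicas))
    """
    base_replica_hours = hours * min_replicas
    extra_replica_hours = scale_up_hours * additional_replicas
    return hourly_rate * (base_replica_hours + extra_replica_hours)


# Basic scenario: AWS CPU medium, 1 replica, 730 hours in a month
basic_monthly = endpoint_cost(0.064, hours=730, min_replicas=1)

# Advanced scenario: AWS GPU small, 1 replica plus 2 additional replicas
# that run for 182.5 scale-up hours in a month
advanced_monthly = endpoint_cost(
    0.5, hours=730, min_replicas=1, scale_up_hours=182.5, additional_replicas=2
)

print(round(basic_monthly, 2), round(advanced_monthly, 2))
```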

Basic Endpoint

  • AWS CPU medium (2 vCPUs, 4GB memory)
  • Autoscaling (minimum 1 replica, maximum 1 replica)

hourly cost

instance hourly rate * (hours * # min replica) = hourly cost
$0.064/hr * (1hr * 1 replica) = $0.064/hr

monthly cost

instance hourly rate * (hours * # min replica) = monthly cost
$0.064/hr * (730hr * 1 replica) = $46.72/month

Advanced Endpoint

  • AWS GPU small (1 x 14GB GPU)
  • Autoscaling (minimum 1 replica, maximum 3 replicas); every hour, a spike in traffic scales the Endpoint from 1 to 3 replicas for 15 minutes

hourly cost

instance hourly rate * ((hours * # min replica) + (scale-up hrs * # additional replicas)) = hourly cost
$0.5/hr * ((1hr * 1 replica) + (0.25hr * 2 replicas)) = $0.75/hr

monthly cost

instance hourly rate * ((hours * # min replica) + (scale-up hrs * # additional replicas)) = monthly cost
$0.5/hr * ((730hr * 1 replica) + (182.5hr * 2 replicas)) = $547.5/month
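
The 182.5 scale-up hours in the monthly figure follow from the traffic pattern: 15 minutes of every hour over a 730-hour month. A minimal sketch of that arithmetic, using hypothetical names:

```python
HOURS_PER_MONTH = 730  # month length used in the examples on this page


def monthly_scale_up_hours(
    spike_minutes_per_hour: float, hours: float = HOURS_PER_MONTH
) -> float:
    """Total hours per month during which the additional replicas are running."""
    return hours * spike_minutes_per_hour / 60


scale_up_hours = monthly_scale_up_hours(15)  # 730 * 15 / 60

# AWS GPU small at $0.5/hr: 1 always-on replica plus 2 additional replicas
base_cost = 0.5 * HOURS_PER_MONTH * 1
extra_cost = 0.5 * scale_up_hours * 2
print(scale_up_hours, base_cost + extra_cost)
```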
