
Transformers, deployed.

Over 10,000 state-of-the-art models, deployed for inference via simple API calls, with up to 100x speedup and built-in scalability.


Starting at $9 per month


Frequently Asked Questions

Which tasks can I run?
All Transformers pipelines are available: automatic speech recognition (ASR), feature extraction, text classification, named entity recognition (NER), question answering, translation, summarization, text generation, zero-shot classification, conversational AI, and table question answering.
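Each task expects a slightly different request body. The following is a minimal sketch, assuming the `{"inputs": ..., "parameters": ...}` convention used by the Inference API. `build_payload` is a hypothetical helper, and the exact schema for each task should be checked against the API documentation:

```python
from typing import Any, Dict


def build_payload(task: str, text: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: build a request body for a few common tasks."""
    if task == "question-answering":
        # Question answering takes a question plus a context passage to search.
        return {"inputs": {"question": text, "context": kwargs["context"]}}
    if task == "zero-shot-classification":
        # Zero-shot classification needs the candidate labels as a parameter.
        return {
            "inputs": text,
            "parameters": {"candidate_labels": list(kwargs["candidate_labels"])},
        }
    # Most other text tasks (classification, NER, summarization, ...) take plain text.
    return {"inputs": text}
```

For instance, `build_payload("zero-shot-classification", "I loved this film", candidate_labels=["positive", "negative"])` produces a body that can be passed straight to an HTTP client as JSON.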
Do you speak NLP?
With over 10,000 models trained in over 160 languages, Hugging Face offers the largest and most diverse library of state-of-the-art models, and the Inference API makes them all available to you via simple API calls.
What’s the latency?
We accelerate our models on CPU and GPU so your apps work faster. Read up on how we achieved a 100x speedup on Transformers.
Can it scale?
We built our infrastructure to support real-time consumer use cases and to scale automatically as usage grows, up to 1,000 requests per second.
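Even with automatic scaling on the server side, a client can see a request rejected while a model is still loading or when it hits a rate limit. The following is a minimal retry sketch, assuming 429 and 503 are the status codes worth retrying. `call_with_backoff` and its `send` callable are hypothetical names, and `send` is expected to wrap the HTTP call and return `(status_code, json_body)`:

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]
RETRYABLE = {429, 503}  # assumption: rate-limited or model still loading


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying retryable statuses with exponential backoff."""
    status, body = send()
    for attempt in range(max_retries):
        if status not in RETRYABLE:
            break
        # Wait 0.5s, 1s, 2s, ... before the next attempt.
        sleep(base_delay * 2 ** attempt)
        status, body = send()
    # Either a non-retryable result or the last attempt's result.
    return status, body
```

With `requests`, `send` could be a small function that calls `requests.post(...)` and returns `(response.status_code, response.json())`. Passing `sleep` in as a parameter keeps the helper testable without real delays.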
How secure is my data?
All data transfers are encrypted in transit with SSL. Hugging Face protects your inference data: no third party has access to it. Enterprise plans offer additional layers of security for log-less requests.
What’s your pricing?
Try it free with an account, then pick the plan that works for you, starting at $9/mo. We bill usage by inference input characters and offer volume-based tiered pricing for high volumes.
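Because usage is billed by input characters, you can roughly estimate it before sending anything. This is a sketch built on two assumptions. First, `input_characters` assumes every string inside `inputs` counts toward the total. Second, `tiered_cost` assumes graduated tiers whose bounds and per-character rates you supply yourself. Both helpers are hypothetical, and no real prices are encoded here:

```python
from typing import Any, List, Optional, Tuple


def input_characters(inputs: Any) -> int:
    """Count characters across all strings in an `inputs` value (assumed billing basis)."""
    if isinstance(inputs, str):
        return len(inputs)
    if isinstance(inputs, dict):
        return sum(input_characters(v) for v in inputs.values())
    if isinstance(inputs, (list, tuple)):
        return sum(input_characters(v) for v in inputs)
    return 0  # numbers, None, etc. are not counted


def tiered_cost(chars: int, tiers: List[Tuple[Optional[int], float]]) -> float:
    """Graduated pricing: tiers are (upper bound in chars, or None for no limit; price per char)."""
    cost, lower = 0.0, 0
    for upper, rate in tiers:
        top = chars if upper is None else min(chars, upper)
        if top > lower:
            cost += (top - lower) * rate  # bill only the slice inside this tier
        if upper is None or chars <= upper:
            break
        lower = upper
    return cost
```

The dashboard remains the authoritative source for actual usage. A local count like this is only useful for budgeting ahead of time.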

Request and we shall serve

State-of-the-art models, as easy as HTTP requests

import requests

def query(payload, model_id, api_token):
    # Authenticate with your API token and POST the JSON payload to the model's endpoint.
    headers = {"Authorization": f"Bearer {api_token}"}
    API_URL = f"https://api-inference.huggingface.co/models/{model_id}"
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

model_id = "distilbert-base-uncased"
api_token = "api_XXXXXXXX"  # get yours at hf.co/settings/token
# distilbert-base-uncased is a fill-mask model: it predicts the [MASK] token.
data = query({"inputs": "The goal of life is [MASK]."}, model_id, api_token)
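For fill-mask models, a successful call returns a list of candidates, each with fields such as `sequence`, `score`, `token`, and `token_str`. A failed call returns a dict with an `error` key instead (for example, while the model is still loading). The following is a minimal sketch for picking the best candidate. `top_prediction` is a hypothetical helper:

```python
from typing import Any


def top_prediction(data: Any) -> str:
    """Return the highest-scoring fill-in from a fill-mask response."""
    if isinstance(data, dict) and "error" in data:
        # The API reported a problem instead of returning candidates.
        raise RuntimeError(data["error"])
    best = max(data, key=lambda candidate: candidate["score"])
    return best["token_str"]
```

After the `query` call, `print(top_prediction(data))` prints the single most likely word for the `[MASK]` slot.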

Monitor usage and costs

In your API dashboard

The Accelerated Inference API Dashboard