🤗 Accelerated Inference API
Integrate over 20,000 pre-trained, state-of-the-art models, or your own private models, into your apps via simple HTTP requests, with 2x to 10x faster inference than out-of-the-box deployment and built-in scalability.
Hugging Face is trusted in production by over 5,000 companies.
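A call to the API is a single HTTP POST. Here is a minimal sketch in Python, assuming a valid API token and using the public sentiment model distilbert-base-uncased-finetuned-sst-2-english as an example:

```python
import requests

# Hosted inference endpoint: one URL per model id on the Hub
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # replace with your own token

def query(payload):
    # POST the inputs and return the model's JSON prediction
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

print(query({"inputs": "I love this new inference API!"}))
# e.g. [[{"label": "POSITIVE", "score": 0.999...}, {"label": "NEGATIVE", "score": 0.000...}]]
```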


Main features:
- Leverage 20,000+ Transformer models (T5, Blenderbot, Bart, GPT-2, Pegasus...)
- Upload, manage and serve your own models privately
- Run Classification, NER, Conversational, Summarization, Translation, Question-Answering, and Embedding Extraction tasks (see the question-answering sketch after this list)
- Get up to 10x inference speedup to reduce user latency
- Accelerated inference on CPU and GPU (GPU requires a Startup or Enterprise plan)
- Run large models that are challenging to deploy in production
- Scale to 1,000 requests per second with automatic scaling built-in
- Ship new NLP features faster as new models become available
- Build your business on a platform powered by the reference open source project in NLP
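Each task expects its own input schema; for question answering, the payload carries a question and a context. A sketch under the same assumptions as above, using deepset/roberta-base-squad2 as an illustrative model id:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # replace with your own token

payload = {
    "inputs": {
        "question": "How many models does the API serve?",
        "context": "The Accelerated Inference API serves over 20,000 pre-trained models.",
    }
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
# e.g. {"score": ..., "start": ..., "end": ..., "answer": "over 20,000"}
```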
If you are looking for custom support from the Hugging Face team, get in touch.

Third-party library models:
The Hub now supports many new libraries:
- spaCy
- AllenNLP
- SpeechBrain
- timm, and many others…
These models are enabled on the API thanks to a Docker integration, api-inference-community.
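These library models are queried through the same endpoint pattern as Transformers models; only the model id changes. A sketch, assuming the spaCy pipeline spacy/en_core_web_sm is available on the Hub:

```python
import requests

# Same endpoint pattern as above; the model id below is illustrative
API_URL = "https://api-inference.huggingface.co/models/spacy/en_core_web_sm"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # replace with your own token

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Hugging Face is based in New York City."},
)
print(response.json())  # entity predictions from the spaCy pipeline
```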
Please note, however, that these models will not allow you (see the tracking issue):
- To get full optimization
- To run private models
- To get access to GPU inference
Community models:
Because community models use external libraries, they are not currently supported by the Inference API. Please take a look at https://github.com/huggingface/api-inference-community/ for details on defining and running inference for community models.