API Inference documentation

🤗 Accelerated Inference API


Integrate over 20,000 pre-trained state-of-the-art models, or your own private models, into your apps via simple HTTP requests, with 2x to 10x faster inference than out-of-the-box deployment and built-in scalability.
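To make "simple HTTP requests" concrete, here is a minimal sketch of what a call might look like, using only the Python standard library. The base URL, the Bearer-token header, the `{"inputs": ...}` payload shape, and the helper names `build_inference_request` and `query` are assumptions made for illustration and are not taken from this page; check them against the detailed API reference before relying on them.

```python
import json
import urllib.request

# Assumed base URL of the hosted Inference API (not stated on this page).
API_BASE = "https://api-inference.huggingface.co/models"


def build_inference_request(model_id, api_token, inputs):
    """Build (but do not send) a POST request for one model.

    The JSON body {"inputs": ...} and the Bearer-token header are
    assumptions about the API's conventions.
    """
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(model_id, api_token, inputs, timeout=30.0):
    """Send the request and decode the JSON response (needs network access)."""
    request = build_inference_request(model_id, api_token, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage, with a placeholder token and an example model id:
# result = query("gpt2", "hf_your_token_here", "The answer to the universe is")
```

Separating request construction from sending keeps the URL, headers, and payload easy to inspect or log before any network traffic happens.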

Check out the API Usage Dashboard (beta), where you can monitor your usage and requests per model.

Hugging Face is trusted in production by over 5,000 companies

Main features:

  • Leverage 20,000+ Transformer models (T5, Blenderbot, Bart, GPT-2, Pegasus...)
  • Upload, manage and serve your own models privately
  • Run Classification, Image Segmentation, NER, Conversational, Summarization, Translation, Question-Answering, Embeddings Extraction tasks
  • Get up to 10x inference speedup to reduce user latency
  • Accelerated inference on CPU and GPU (GPU requires a Startup or Enterprise plan)
  • Run large models that are challenging to deploy in production
  • Scale to 1,000 requests per second with automatic scaling built-in
  • Ship new NLP, CV, Audio, or RL features faster as new models become available
  • Build your business on a platform powered by the reference open source project in ML

If you are looking for custom support from the Hugging Face team, see the Hugging Face Expert Acceleration Program.

Third-party library models:

Please note, however, that models from third-party libraries do not let you (tracking issue):

  • Get full optimization
  • Run private models
  • Access GPU inference

Community models:

Because community models use external libraries, they are not currently supported by the Inference API. See https://github.com/huggingface/api-inference-community/ for how community models are defined and served for inference.