πŸ€— Hosted Inference API

Test and evaluate over 80,000 publicly accessible machine learning models, or your own private models, for free via simple HTTP requests, with fast inference hosted on Hugging Face's shared infrastructure.
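For example, here is a minimal sketch of such a request using Python's requests library. The model ID gpt2 and the YOUR_API_TOKEN placeholder are illustrative; any public model ID from the Hub works, and tokens are available in your account settings.

```python
import requests

# Any public model ID from the Hub can go at the end of this URL;
# "gpt2" is just an example.
API_URL = "https://api-inference.huggingface.co/models/gpt2"

# Replace with your own API token (placeholder shown here).
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

# Send the input as JSON; the API returns the model's prediction as JSON.
response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "The answer to the universe is"},
)
print(response.json())
```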

The Inference API is free to use and rate-limited. If you need an inference solution for production, check out our Inference Endpoints service. With Inference Endpoints, you can easily deploy any machine learning model on dedicated, fully managed infrastructure. Select the cloud, region, compute instance, autoscaling range, and security level to match your model, latency, throughput, and compliance needs.

Main features:

  • Get predictions from 80,000+ Transformers models (T5, Blenderbot, BART, GPT-2, Pegasus...)
  • Switch from one model to the next by just switching the model ID (see the sketch after this list)
  • Use built-in integrations with over 20 open-source libraries (spaCy, SpeechBrain, etc.)
  • Upload, manage, and serve your own models privately
  • Run classification, image segmentation, automatic speech recognition, NER, conversational, summarization, translation, question answering, and embedding extraction tasks
  • Out-of-the-box accelerated inference on CPU, powered by Intel Xeon Ice Lake
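As a sketch of the model-switching point above: the endpoint pattern is identical for every model, so changing tasks is just a matter of changing the model ID. The model IDs below are examples of public Hub models, and YOUR_API_TOKEN is a placeholder.

```python
import requests

headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # placeholder token

def query(model_id: str, payload: dict) -> dict:
    # The same endpoint pattern serves every model; only the ID changes.
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    response = requests.post(url, headers=headers, json=payload)
    return response.json()

# Summarization with one model ID...
print(query("facebook/bart-large-cnn",
            {"inputs": "The Inference API serves thousands of models over HTTP."}))

# ...then translation, just by switching the model ID.
print(query("t5-small",
            {"inputs": "translate English to French: Hello, world!"}))
```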

Third-party library models:

Please note, however, that these models will not allow you (tracking issue):

  • To get full optimization
  • To run private models
  • To get access to GPU inference

If you are looking for custom support from the Hugging Face team, check out the HuggingFace Expert Acceleration Program. Hugging Face is trusted in production by over 10,000 companies.