
Serverless Inference API

Instant access to thousands of ML models for fast prototyping

Explore the most popular models for text, image, speech, and more — all with a simple API request. Build, test, and experiment without worrying about infrastructure or setup.
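A request is a single HTTP POST to a model's endpoint. Here is a minimal sketch in Python using the `requests` library; the model ID is just an example and the token (`hf_xxx`) is a placeholder for your own Hugging Face access token:

```python
import requests

# Any hosted text-generation model ID works here; gpt2 is used as an example.
API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer hf_xxx"}  # replace with your access token

def query(payload):
    # Send the payload as JSON; the API responds with JSON as well.
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

print(query({"inputs": "Can you please let us know more details about your"}))
```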


Why use the Inference API?

The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you’re prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:

  • Text Generation: Generate and experiment with high-quality responses from large language models, including tool-calling prompts (see the sketch after this list).
  • Image Generation: Easily create customized images, including LoRAs for your own styles.
  • Document Embeddings: Build search and retrieval systems with state-of-the-art embeddings.
  • Classical AI Tasks: Ready-to-use models for text classification, image classification, speech recognition, and more.
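The `huggingface_hub` Python client wraps these task endpoints. A minimal sketch covering two of the tasks above, assuming a placeholder token (`hf_xxx`) and illustrative model IDs:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_xxx")  # replace with your access token

# Text generation with a large language model
reply = client.text_generation(
    "Explain serverless inference in one sentence:",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    max_new_tokens=60,
)
print(reply)

# Document embeddings for search and retrieval
embedding = client.feature_extraction(
    "Serverless inference lets you call hosted models over HTTP.",
    model="sentence-transformers/all-MiniLM-L6-v2",
)
print(embedding.shape)  # embedding dimensionality
```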

Fast and free to get started: the Inference API is free to use, and PRO users get higher rate limits. For production needs, explore Inference Endpoints for dedicated resources, autoscaling, advanced security features, and more.


Key Benefits

  • 🚀 Instant Prototyping: Access powerful models without setup.
  • 🎯 Diverse Use Cases: One API for text, image, and beyond.
  • 🔧 Developer-Friendly: Simple requests, fast responses.

Main Features

  • Leverage more than 800,000 models from different open-source libraries (transformers, sentence-transformers, adapter-transformers, diffusers, timm, etc.).
  • Use models for a variety of tasks, including text generation, image generation, document embeddings, NER, summarization, image classification, and more (see the sketch after this list).
  • Accelerate your prototyping by using GPU-powered models.
  • Run very large models that are challenging to deploy in production.
  • Production-grade platform without the hassle: built-in automatic scaling, load balancing and caching.
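As a sketch of that task variety, the same client can call a summarization model and a vision classifier. The model IDs and the sample image URL below are illustrative placeholders, not prescribed by this documentation:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_xxx")  # replace with your access token

# Summarization with a transformers model
summary = client.summarization(
    "The Serverless Inference API gives instant access to thousands of hosted "
    "models for prototyping, without any infrastructure to manage.",
    model="facebook/bart-large-cnn",
)
print(summary)

# Image classification from an image URL with a vision model
labels = client.image_classification(
    "https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg",
    model="google/vit-base-patch16-224",
)
print(labels[0])  # top predicted label and its score
```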

Contents

The documentation is organized into two sections:

  • Getting Started: Learn the basics of how to use the Inference API.
  • API Reference: Dive into task-specific settings and parameters.

Looking for custom support from the Hugging Face team?

Hugging Face Expert Acceleration Program