# Serverless Inference API

Instant access to thousands of ML models for fast prototyping.

Explore the most popular models for text, image, speech, and more — all with a simple API request. Build, test, and experiment without worrying about infrastructure or setup.
## Why use the Inference API?
The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you’re prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
- Text Generation: Generate and experiment with high-quality responses from large language models, including tool calling.
- Image Generation: Easily create customized images, including with LoRAs for your own styles.
- Document Embeddings: Build search and retrieval systems with SOTA embeddings.
- Classical AI Tasks: Ready-to-use models for text classification, image classification, speech recognition, and more.
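All of the tasks above are served through the same HTTP interface: a POST request to `https://api-inference.huggingface.co/models/<model_id>` with a Bearer token. A minimal sketch in Python — the helper names and the `gpt2` model ID are illustrative, not prescribed by the API:

```python
import os
import requests

# Base URL of the Serverless Inference API (public convention).
API_BASE = "https://api-inference.huggingface.co/models"

def build_request(model_id, payload, token=None):
    """Return the (url, headers, json) triple for one inference call."""
    url = f"{API_BASE}/{model_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return url, headers, payload

def query(model_id, payload, token=None):
    """Send a single inference request and return the decoded JSON reply."""
    url, headers, body = build_request(model_id, payload, token)
    resp = requests.post(url, headers=headers, json=body, timeout=60)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Use a User Access Token from your Hugging Face account settings.
    token = os.environ.get("HF_TOKEN")
    # Text generation with an illustrative model ID:
    print(query("gpt2", {"inputs": "The answer to the universe is"}, token))
```

The same `query` helper works for other tasks by changing the model ID and payload shape (e.g. an embeddings model with `{"inputs": ["sentence one", "sentence two"]}`).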
⚡ Fast and Free to Get Started: The Inference API is free with higher rate limits for PRO users. For production needs, explore Inference Endpoints for dedicated resources, autoscaling, advanced security features, and more.
## Key Benefits
- 🚀 Instant Prototyping: Access powerful models without setup.
- 🎯 Diverse Use Cases: One API for text, image, and beyond.
- 🔧 Developer-Friendly: Simple requests, fast responses.
## Main Features
- Leverage over 800,000 models from different open-source libraries (transformers, sentence-transformers, adapter-transformers, diffusers, timm, etc.).
- Use models for a variety of tasks, including text generation, image generation, document embeddings, NER, summarization, image classification, and more.
- Accelerate your prototyping by using GPU-powered models.
- Run very large models that are challenging to deploy in production.
- Production-grade platform without the hassle: built-in automatic scaling, load balancing and caching.
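One practical consequence of on-demand loading: the first request to a cold model can return HTTP 503 with an `estimated_time` field while the model spins up. The `x-wait-for-model` and `x-use-cache` headers follow the public API docs, but the retry policy and helper names below are an illustrative sketch, not the canonical client:

```python
import time
import requests

def retry_delay(status_code, body, default=2.0):
    """Seconds to wait before retrying, or None if no retry is needed.

    A 503 reply means the model is still loading; the API includes an
    "estimated_time" hint in the JSON body when it can.
    """
    if status_code != 503:
        return None
    return float(body.get("estimated_time", default))

def query_with_retry(url, payload, headers, max_tries=3):
    """POST with a simple wait-and-retry loop for cold-start 503 replies."""
    headers = {**headers, "x-use-cache": "true"}  # reuse cached results
    for _ in range(max_tries):
        resp = requests.post(url, headers=headers, json=payload, timeout=120)
        body = resp.json() if resp.content else {}
        delay = retry_delay(resp.status_code, body)
        if delay is None:
            resp.raise_for_status()
            return body
        time.sleep(delay)  # model is still loading; wait, then retry
    raise TimeoutError("model did not become ready in time")
```

Alternatively, sending the header `x-wait-for-model: true` asks the server to hold the request open until the model is ready, trading latency for simplicity.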
## Contents
The documentation is organized into two sections:
- Getting Started: Learn the basics of how to use the Inference API.
- API Reference: Dive into task-specific settings and parameters.