Integrate over 20,000 pre-trained, state-of-the-art models, or your own private models, into your apps via simple HTTP requests, with 2x to 10x faster inference than out-of-the-box deployment and built-in scalability.
Check out the API Usage Dashboard (beta), where you can monitor your usage and requests per model.
- Leverage 20,000+ Transformer models (T5, Blenderbot, Bart, GPT-2, Pegasus...)
- Upload, manage and serve your own models privately
- Run Classification, Image Segmentation, NER, Conversational, Summarization, Translation, Question-Answering, and Embeddings Extraction tasks
- Get up to 10x inference speedup to reduce user latency
- Accelerated inference on CPU and GPU (GPU requires a Startup or Enterprise plan)
- Run large models that are challenging to deploy in production
- Scale to 1,000 requests per second with automatic scaling built-in
- Ship new NLP, CV, Audio, or RL features faster as new models become available
- Build your business on a platform powered by the reference open source project in ML
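
As a rough illustration of the "simple HTTP requests" mentioned above, here is a minimal Python sketch using only the standard library. It assumes the commonly documented endpoint pattern `https://api-inference.huggingface.co/models/<model_id>`, a bearer token in the `Authorization` header, and a JSON body with an `inputs` field; the helper names `build_request` and `query` are hypothetical, and the exact payload and response shape depend on the model's task, so check the current API reference before relying on it.

```python
import json
import urllib.request

# Assumed endpoint pattern; verify against the current API reference.
API_BASE = "https://api-inference.huggingface.co/models"


def build_request(model_id: str, inputs, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the given model.

    `build_request` is a hypothetical helper, not part of any official client.
    """
    payload = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{model_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(model_id: str, inputs, token: str, timeout: float = 30.0):
    """Send the request and decode the JSON response.

    The structure of the returned object varies by task, so it is
    returned as-is rather than parsed into a fixed schema.
    """
    request = build_request(model_id, inputs, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `query("gpt2", "Hello, my name is", token)` with a valid API token would send a single text-generation request; the same pattern should apply to a private model by substituting its model ID, assuming the token has access to it.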
The Hub now supports many new libraries.
Models from those libraries are enabled on the API through a Docker-based integration, api-inference-community.
Please note, however, that these models will not allow you (tracking issue):
- To get full optimization
- To run private models
- To get access to GPU inference
Because community models rely on external libraries, the Inference API does not currently support these features for them. See https://github.com/huggingface/api-inference-community/ for more on how community models are defined and run for inference.