Please refer to the Inference API Documentation for detailed information.
For 🤗 Transformers models, Pipelines power the API (a sketch of calling the API directly follows the list below). On top of Pipelines, and depending on the model type, there are several production optimizations, such as:
- compiling models to optimized intermediate representations (e.g. ONNX),
- maintaining a Least Recently Used (LRU) cache so that the most popular models are always loaded,
- scaling the underlying compute infrastructure on the fly depending on the load.
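To make the above concrete, here is a minimal sketch of calling the API directly over HTTP with `requests`. The model ID and the `hf_xxx` token are placeholders, not requirements of the API:

```python
import requests

# Placeholder model ID: any Hub model whose task the Inference API supports.
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer hf_xxx"}  # replace with your User Access Token

response = requests.post(API_URL, headers=headers, json={"inputs": "This library is great!"})
print(response.json())  # e.g. a list of labels with confidence scores
```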
To disable the Inference API (and the widget) for a specific model, set `inference: false` in your model card's metadata.
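As a sketch, this metadata lives in the YAML front matter at the top of your model repository's `README.md`; the `license` field here is only illustrative context:

```yaml
---
license: mit      # illustrative neighboring metadata
inference: false  # disables the Inference API (and widget) for this model
---
```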
Some tasks are not supported by the Inference API; for those, no widget is displayed.
For all libraries (except 🤗 Transformers), the `library-to-tasks.ts` file lists the tasks supported in the API. When a model repository declares a task that its library does not support, the repository defaults to `inference: false`.
If you are interested in accelerated inference, higher volumes of requests, or an SLA, please contact us at api-enterprise@huggingface.co.
The `huggingface_hub` library provides a client wrapper, documented here.
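For illustration, a minimal sketch using the `InferenceClient` class from recent versions of `huggingface_hub`; the model ID and token are placeholders:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_xxx")  # token is optional for public models

result = client.text_classification(
    "This library is great!",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder model ID
)
print(result)  # e.g. labels with confidence scores
```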