API Inference documentation

Detailed usage and pinned models


API Usage dashboard

The API Usage Dashboard (beta) shows the historical number of requests and input characters per model for an API Token.

Please note that each user account, and each organization, has its own API Token. Community Pro and Organization Lab subscriptions are billed according to the organization API Token usage. In most cases, you should not need to do anything. However, if you have any doubt about what's being shown to you, or you have a complex setup (a user subscription, multiple organizations, and so on), please contact api-enterprise@huggingface.co.

Pinned models

A pinned model is a model which is preloaded for inference and instantly available for requests authenticated with an API Token.

Community Pro and Organization Lab subscriptions can have a number of models pinned to their organization API Token - see pricing for details.

You can set pinned models to your API Token in the API Usage dashboard.


Model pinning is also accessible directly from the API. Here is how to see what your current pinned models are:

Python
import requests

API_TOKEN = "hf_xxx"  # replace with your API Token
api_url = "https://api-inference.huggingface.co/usage/pinned_models"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
response = requests.get(api_url, headers=headers)
# response.json() -> {"pinned_models": [...], "allowed_pinned_models": 5}
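The response also tells you how many pin slots your subscription allows. As a small sketch (the helper below is not part of the API; it only reads the two fields shown in the example response), you can compute how many slots remain:

```python
def remaining_pin_slots(usage):
    """Return how many additional models can still be pinned."""
    used = len(usage["pinned_models"])
    allowed = usage["allowed_pinned_models"]
    return max(allowed - used, 0)

# Example with the response shape shown above:
usage = {
    "pinned_models": [{"model_id": "gpt2", "compute_type": "cpu"}],
    "allowed_pinned_models": 5,
}
print(remaining_pin_slots(usage))  # 4
```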

Pinning models is done as follows.

Be careful: you need to specify ALL the pinned models each time!

Python
import json
import requests

API_TOKEN = "hf_xxx"  # replace with your API Token
api_url = "https://api-inference.huggingface.co/usage/pinned_models"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
# NOTE: Include ALL the models you want pinned in a single request;
# this overrides the previous values.
data = json.dumps({"pinned_models": [{"model_id": "gpt2", "compute_type": "cpu"}]})
response = requests.post(api_url, headers=headers, data=data)
# {"ok":"Pinned 1 models, please wait while we load them.","pinned_models":[{"model_id": "gpt2", "compute_type": "cpu"}]}
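Because a POST replaces the entire list, adding one model means re-sending your existing pins plus the new one. A minimal sketch of that merge step (the `merge_pins` helper and the second model ID are illustrative, not part of the API):

```python
def merge_pins(current_pins, new_pin):
    """Return current pins plus new_pin, deduplicated by model_id."""
    if any(p["model_id"] == new_pin["model_id"] for p in current_pins):
        return list(current_pins)
    return list(current_pins) + [new_pin]

# Pins fetched with the GET request shown earlier:
current = [{"model_id": "gpt2", "compute_type": "cpu"}]
updated = merge_pins(current, {"model_id": "distilbert-base-uncased", "compute_type": "cpu"})
# POST {"pinned_models": updated} to the endpoint above to apply it.
print([p["model_id"] for p in updated])  # ['gpt2', 'distilbert-base-uncased']
```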