Detailed usage and pinned models
API Usage dashboard
The API Usage dashboard (beta) shows the historical number of requests and input characters per model for an API Token.
Note that each user account and each organization has its own API Token. Community Pro and Organization Lab subscriptions are billed according to the usage of the organization API Token. By default, no action is required on your part. However, if you have any doubt about what is being shown, or if you have a complex setup (a user subscription, multiple organizations, and so on), please contact api-enterprise@huggingface.co.

Pinned models
A pinned model is a model that is preloaded for inference and instantly available for requests authenticated with an API Token.
Community Pro and Organization Lab subscriptions can have a number of models pinned to their organization API Token; see pricing for details.
You can set the pinned models for your API Token in the API Usage dashboard.
Model pinning is also accessible directly from the API. Here is how to list your currently pinned models:
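import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder: replace with your own API Token
api_url = "https://api-inference.huggingface.co/usage/pinned_models"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
# GET returns the models currently pinned and how many you may pin.
response = requests.get(api_url, headers=headers)
print(response.json())
# {"pinned_models": [...], "allowed_pinned_models": 5}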
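As a minimal sketch of how that response might be used, the helper below computes how many more models can still be pinned. It assumes the decoded JSON has the two keys shown in the comment above (`pinned_models` and `allowed_pinned_models`); the function name `remaining_pin_slots` and the example values are hypothetical and not part of the API.

```python
def remaining_pin_slots(usage: dict) -> int:
    """Return how many more models can be pinned, given the decoded GET response."""
    pinned = usage.get("pinned_models", [])
    allowed = usage.get("allowed_pinned_models", 0)
    # Clamp at zero in case more models are pinned than the allowance permits.
    return max(allowed - len(pinned), 0)


# Illustrative payload in the shape shown above (values are made up).
example = {
    "pinned_models": [{"model_id": "gpt2", "compute_type": "cpu"}],
    "allowed_pinned_models": 5,
}
print(remaining_pin_slots(example))  # 4
```

In practice you would pass `response.json()` from the GET request instead of the hard-coded `example` dictionary.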
To pin models, send a POST request to the same endpoint.
Be careful: you need to specify ALL the pinned models each time, because each request overrides the previous list!
import json
import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder: replace with your own API Token
api_url = "https://api-inference.huggingface.co/usage/pinned_models"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
# Note: put ALL the models you want to pin in one request; this will override
# the previous values.
data = json.dumps({"pinned_models": [{"model_id": "gpt2", "compute_type": "cpu"}]})
response = requests.post(api_url, headers=headers, data=data)
print(response.json())
# {"ok": "Pinned 1 models, please wait while we load them.", "pinned_models": [{"model_id": "gpt2", "compute_type": "cpu"}]}
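Because each POST overrides the previous list, adding one model safely means resending the existing pins along with the new one. The sketch below shows one possible way to build such a request body. The helper name `build_pin_payload` is hypothetical, the model `distilbert-base-uncased` is only an illustration, and the sketch assumes that the entries returned by the GET request have the same `model_id` / `compute_type` shape as those sent in the POST.

```python
import json


def build_pin_payload(current, model_id, compute_type="cpu"):
    """Return a JSON body that pins `model_id` while keeping the `current` pins."""
    # Drop any existing entry for the same model so it is not listed twice.
    merged = [m for m in current if m.get("model_id") != model_id]
    merged.append({"model_id": model_id, "compute_type": compute_type})
    return json.dumps({"pinned_models": merged})


# `current` would normally come from the "pinned_models" field of the GET response.
current = [{"model_id": "gpt2", "compute_type": "cpu"}]
body = build_pin_payload(current, "distilbert-base-uncased")
print(body)
```

The resulting `body` string can then be sent as the `data` argument of the POST request shown above. The helper makes no network calls itself, so the merged list can be inspected before it replaces the current pins.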