text-embeddings-inference documentation

Serving private and gated models


If the model you wish to serve is gated or resides in a private model repository on the Hugging Face Hub, you must have access to the model before you can serve it.

Once you have confirmed that you have access to the model:
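One way to confirm access programmatically is to query the repository's metadata with the `huggingface_hub` client library; if the call succeeds, your token can read the repo. This is a minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`). A public model id is used here only so the snippet runs as-is; substitute your private repo id and pass your read token (or set the `HUGGING_FACE_HUB_TOKEN` environment variable) for gated or private repositories.

```python
from huggingface_hub import HfApi

# For private/gated repos, pass token="hf_..." or rely on the
# HUGGING_FACE_HUB_TOKEN environment variable.
api = HfApi()

# Raises an error if the repo does not exist or your token lacks access.
info = api.model_info("bert-base-uncased")
print(info.id)
```

If the call raises a `401`/`403`-style error, the token does not have read access to the repository and serving it will fail the same way.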

If you’re using the CLI, set the HUGGING_FACE_HUB_TOKEN environment variable. For example:

export HUGGING_FACE_HUB_TOKEN=<YOUR READ TOKEN>

Alternatively, you can provide the token when deploying the model with Docker:

model=<your private model>
volume=$PWD/data
token=<your Hugging Face Hub read token>

docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token \
    -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.2 \
    --model-id $model
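Once the container is up, you can sanity-check the deployment by requesting an embedding from the server's `/embed` route. This is a sketch assuming the container was started with the port mapping above (`-p 8080:80`) and has finished downloading the model:

```shell
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```

A successful response is a JSON array containing the embedding vector; an authentication or download failure will instead surface in the container logs.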