text-embeddings-inference documentation

Serving private and gated models


If the model you wish to serve is gated or resides in a private model repository on the Hugging Face Hub, you must have access to the model before you can serve it.

Once you have confirmed that you have access to the model:
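One way to confirm access programmatically is to query the repository's metadata with the `huggingface_hub` client library; if the call succeeds, your token can read the repo. This is a minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`). A public model id is used here only so the snippet runs as-is; substitute your private repo id and pass your read token (or set the `HUGGING_FACE_HUB_TOKEN` environment variable) for gated or private repositories.

```python
from huggingface_hub import HfApi

# For private/gated repos, pass token="hf_..." or rely on the
# HUGGING_FACE_HUB_TOKEN environment variable.
api = HfApi()

# Raises an error if the repo does not exist or your token lacks access.
info = api.model_info("bert-base-uncased")
print(info.id)
```

If the call raises a `401`/`403`-style error, the token does not have read access to the repository and serving it will fail the same way.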

If you’re using the CLI, set the HUGGING_FACE_HUB_TOKEN environment variable. For example:

export HUGGING_FACE_HUB_TOKEN=<YOUR READ TOKEN>

Alternatively, you can provide the token when deploying the model with Docker:

model=<your private model>
volume=$PWD/data
token=<your Hugging Face Hub read token>

docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token \
    -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.2 \
    --model-id $model
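Once the container is up, you can sanity-check the deployment by requesting an embedding from the server's `/embed` route. This is a sketch assuming the container was started with the port mapping above (`-p 8080:80`) and has finished downloading the model:

```shell
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```

A successful response is a JSON array containing the embedding vector; an authentication or download failure will instead surface in the container logs.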