Inference Endpoints (dedicated) documentation

Inference Endpoints Version

Hugging Face Inference Endpoints ships with a default serving container that handles all supported Transformers and Sentence-Transformers tasks, runs custom inference handlers, and implements batching. Below you will find information about the installed packages and their versions.
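A custom inference handler is a handler.py in the model repository that exposes an EndpointHandler class with an `__init__(path)` constructor and a `__call__(data)` method. A minimal sketch is shown below; the echo logic and the `model_path` output field are illustrative only, and a real handler would load model weights from `path` in `__init__`:

```python
from typing import Any, Dict, List


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the model repository contents inside the container.
        # A real handler would load the model/tokenizer from here.
        self.path = path

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # "inputs" is the request payload field passed by the serving container.
        # This sketch just echoes it back instead of running inference.
        text = data.get("inputs", "")
        return [{"echo": text, "model_path": self.path}]
```

The container instantiates the handler once at startup and calls it for every request, so expensive setup belongs in `__init__`, not `__call__`.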

You can always upgrade installed packages or add custom packages by adding a requirements.txt file to your model repository. Read more in Add custom Dependencies.
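For example, a requirements.txt at the root of the model repository might pin an upgraded version of an installed package and add one that is not preinstalled (the package choices below are illustrative):

```
transformers==4.38.2
einops
```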

Installed packages & versions

The Hugging Face Inference Runtime provides separate PyTorch and TensorFlow versions for both CPU and GPU; the one used depends on the framework selected when creating an Inference Endpoint. The TensorFlow and PyTorch flavors are grouped together in the list below.

General

  • Python: 3.11
  • huggingface_hub: 0.20.3
  • pytorch: 2.2.0
  • transformers[sklearn,sentencepiece,audio,vision]: 4.38.2
  • diffusers: 0.26.3
  • accelerate: 0.27.2
  • sentence_transformers: 2.4.0
  • pandas: latest
  • peft: 0.9.0
  • tensorflow: latest
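To confirm which versions are actually present inside a running container (for example, from a custom handler or a debugging script), a small stdlib-only check can be used. This helper is illustrative and not part of the runtime:

```python
from importlib import metadata


def installed_versions(packages):
    """Return a mapping of distribution name -> installed version (None if absent)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example: check a few of the packages listed above.
print(installed_versions(["transformers", "diffusers", "accelerate"]))
```

Packages listed as "latest" above resolve to whatever version was current when the container image was built, so checking at runtime is the reliable way to know the exact version.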

GPU

  • CUDA: 12.3

Optimized Container

  • text-generation-inference: 2.1.0
  • text-embeddings-inference: 1.2.0