text-embeddings-inference documentation

Using TEI locally with GPU

You can install text-embeddings-inference locally to run it on your own machine with a GPU. To make sure that your hardware is supported, check out the Supported models and hardware page.

Step 1: CUDA and NVIDIA drivers

Make sure you have CUDA and the NVIDIA drivers installed - NVIDIA drivers on your device need to be compatible with CUDA version 12.2 or higher.
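
You can verify both with nvidia-smi, which prints the installed driver version and the highest CUDA version that driver supports:

nvidia-smi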

Add the NVIDIA binaries to your path:

export PATH=$PATH:/usr/local/cuda/bin
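
This only affects the current shell session. To make it permanent, you can append the line to your shell profile (for example ~/.bashrc, assuming you use bash):

echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc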

Step 2: Install Rust

Install Rust on your machine by running the following in your terminal, then follow the on-screen instructions:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
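
Once the installer finishes, confirm that cargo is available (you may need to open a new shell or source the cargo environment file first):

source "$HOME/.cargo/env"
cargo --version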

Step 3: Install necessary packages

This step can take a while, as it needs to compile a large number of CUDA kernels.
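
If you are unsure which generation your card belongs to, you can print its model name and match it against the options below:

nvidia-smi --query-gpu=name --format=csv,noheader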

For Turing GPUs (T4, RTX 2000 series, …)

cargo install --path router -F candle-cuda-turing -F http --no-default-features

For Ampere and Hopper

cargo install --path router -F candle-cuda -F http --no-default-features
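
Once compilation finishes, cargo places the binary in ~/.cargo/bin, which rustup adds to your PATH. You can check that it is available with:

text-embeddings-router --help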

Step 4: Launch Text Embeddings Inference

You can now launch Text Embeddings Inference on GPU with:

model=BAAI/bge-large-en-v1.5
revision=refs/pr/5

text-embeddings-router --model-id $model --revision $revision --port 8080
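
Once the server is up, you can send a test request. TEI exposes an /embed route that accepts a JSON payload with an inputs field:

curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'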