Installation from source

Installing TGI from source is not the recommended way to use it. We strongly recommend using TGI through Docker; see the Quick Tour, Installation for Nvidia GPUs, and Installation for AMD GPUs to learn how to use TGI with Docker.

Install CLI

You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters.

To install the CLI, first clone the TGI repository and then run make install:

git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
make install

If you would like to serve models with custom kernels, run:

BUILD_EXTENSIONS=True make install
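As a quick sanity check after either make install command, a small helper like the one below reports whether the TGI executables are on your PATH. The function name tgi_cli_status is hypothetical and not part of TGI; the sketch assumes make install places text-generation-launcher (built with cargo, typically into ~/.cargo/bin) and text-generation-server (installed with pip) on your PATH. If the launcher is found, its --help output lists the available serving parameters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TGI): report whether the TGI executables
# are on PATH. Assumes `make install` put text-generation-launcher (cargo)
# and text-generation-server (pip) somewhere on PATH, e.g. ~/.cargo/bin.
tgi_cli_status() {
  local bin
  for bin in text-generation-launcher text-generation-server; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "$bin: installed"
    else
      echo "$bin: missing"
    fi
  done
}

tgi_cli_status

# If the launcher is installed, print its options (the serving parameters).
if command -v text-generation-launcher >/dev/null 2>&1; then
  text-generation-launcher --help
fi
```

If a binary shows as missing, check that ~/.cargo/bin and your Python environment's bin directory are on PATH.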

Local Installation from Source

Before you start, you will need to set up your environment and install Text Generation Inference. Text Generation Inference is tested on Python 3.9+.

Text Generation Inference is available on PyPI, conda, and GitHub.

To install and launch locally, first install Rust and create a Python virtual environment with at least Python 3.9, e.g. using conda:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
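Before building, it can help to confirm that the environment actually meets these requirements. The sketch below checks that the active python is at least 3.9 and that cargo (installed by rustup) is on PATH. The function names version_at_least and check_prereqs are hypothetical, and the version comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check for a from-source TGI build.
# version_at_least A B succeeds when version A >= version B (uses `sort -V`).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_prereqs() {
  local py_version
  # Version of the python currently on PATH (e.g. inside the conda env).
  py_version="$(python -c 'import platform; print(platform.python_version())' 2>/dev/null)"
  if [ -n "$py_version" ] && version_at_least "$py_version" 3.9; then
    echo "python: $py_version (ok)"
  else
    echo "python: ${py_version:-not found} (need 3.9+)"
  fi
  # rustup installs cargo, which builds the Rust components of TGI.
  if command -v cargo >/dev/null 2>&1; then
    echo "cargo: found"
  else
    echo "cargo: missing (run the rustup installer, then open a new shell or source ~/.cargo/env)"
  fi
}

check_prereqs
```

Comparing with sort -V rather than as plain strings matters here: a string comparison would wrongly rank 3.10 below 3.9.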

You may also need to install Protoc, the Protocol Buffers compiler.

On Linux:

PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
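The commands above hard-code the x86_64 archive. On other Linux CPUs, a variant like the following picks the archive from uname -m. It assumes the protobuf v21.12 release asset names protoc-21.12-linux-x86_64.zip and protoc-21.12-linux-aarch_64.zip; for any other architecture, check the release page. The function names protoc_zip_name and install_protoc are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: choose the protoc v21.12 Linux archive for the current CPU.
# Assumes the release asset names protoc-<ver>-linux-x86_64.zip and
# protoc-<ver>-linux-aarch_64.zip; check the release page for other CPUs.
PROTOC_VERSION=21.12

protoc_zip_name() {
  case "$1" in
    x86_64|amd64) echo "protoc-${PROTOC_VERSION}-linux-x86_64.zip" ;;
    aarch64|arm64) echo "protoc-${PROTOC_VERSION}-linux-aarch_64.zip" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Same steps as the commands above, with the archive name chosen per CPU.
install_protoc() {
  local zip
  zip="$(protoc_zip_name "$(uname -m)")" || return 1
  curl -OL "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/${zip}"
  sudo unzip -o "$zip" -d /usr/local bin/protoc
  sudo unzip -o "$zip" -d /usr/local 'include/*'
  rm -f "$zip"
  protoc --version  # confirm the compiler is now on PATH
}

# Call install_protoc to download and install; here we only print the name.
protoc_zip_name "$(uname -m)"
```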

On macOS, using Homebrew:

brew install protobuf

Then run the following to install Text Generation Inference:

git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
BUILD_EXTENSIONS=True make install

On some machines, you may also need the OpenSSL libraries and gcc. On Debian- or Ubuntu-based Linux machines, run:

sudo apt-get install libssl-dev gcc -y
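To find out whether you need this step at all, you can check for gcc and the OpenSSL development headers first. The sketch below (check_build_deps is a hypothetical name) preprocesses a one-line C file that includes openssl/ssl.h; this only succeeds when the headers, such as those from libssl-dev, are installed.

```shell
#!/usr/bin/env bash
# Sketch: check whether gcc and the OpenSSL development headers are present
# before falling back to `sudo apt-get install libssl-dev gcc -y`.
check_build_deps() {
  if command -v gcc >/dev/null 2>&1; then
    echo "gcc: found"
    # Run only the preprocessor on a file that includes an OpenSSL header;
    # it fails if the header cannot be found on the include path.
    if echo '#include <openssl/ssl.h>' | gcc -E -x c - >/dev/null 2>&1; then
      echo "openssl headers: found"
    else
      echo "openssl headers: missing"
    fi
  else
    echo "gcc: missing"
    echo "openssl headers: unknown (gcc needed to check)"
  fi
}

check_build_deps
```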

Once installation is done, simply run:

make run-falcon-7b-instruct

This will serve the Falcon 7B Instruct model on port 8080, which you can then query.
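A minimal way to query the running server is a POST to TGI's /generate endpoint with an inputs string and a parameters object such as max_new_tokens. In the sketch below, the helper names tgi_payload and tgi_generate and the TGI_URL variable are hypothetical; TGI_URL defaults to the local port used above. The payload builder escapes backslashes and double quotes but does not handle newlines or other control characters.

```shell
#!/usr/bin/env bash
# Sketch: query a local TGI server through its /generate endpoint.
# TGI_URL is a hypothetical variable; it defaults to the port used above.
TGI_URL="${TGI_URL:-http://127.0.0.1:8080}"

# Build the JSON request body. Backslashes and double quotes in the prompt
# are escaped; other control characters (e.g. newlines) are not handled.
tgi_payload() {
  local prompt
  prompt="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$prompt" "$2"
}

# POST the request and print the server's JSON response.
tgi_generate() {
  curl -s "${TGI_URL}/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$(tgi_payload "$1" "${2:-20}")"
}

# Example (requires the server started by make run-falcon-7b-instruct):
# tgi_generate "What is Deep Learning?" 20
```

The response is a JSON object whose generated_text field holds the model's completion. For anything beyond quick manual tests, a proper JSON tool or client library is a safer way to build requests than shell string formatting.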
