### Containerized Installation for Inference on Linux GPU Servers

1. Ensure Docker is installed and ready (requires sudo). You can skip this step if the system can already run NVIDIA containers. The example here is for Ubuntu; see [NVIDIA Containers](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) for other distributions.

   ```bash
   distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
       && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
       && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
           sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
           sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
   sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
   sudo apt install nvidia-container-runtime
   sudo nvidia-ctk runtime configure --runtime=docker
   sudo systemctl restart docker
   ```

2. Build the container image:

   ```bash
   docker build -t h2ogpt .
   ```

3. Run the container (you can also use `finetune.py` and all of its parameters as shown above for training):

   For the fine-tuned h2oGPT with 20 billion parameters:

   ```bash
   docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
       -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
       --base_model=h2oai/h2ogpt-oasst1-512-20b
   ```

   If you have a private Hugging Face token, you can instead run the following, replacing the `<HUGGINGFACE_API_TOKEN>` placeholder with your token:

   ```bash
   docker run --runtime=nvidia --shm-size=64g --entrypoint=bash -p 7860:7860 \
       -e HUGGINGFACE_API_TOKEN=<HUGGINGFACE_API_TOKEN> \
       -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it \
       -c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
   ```

   For your own fine-tuned model, starting from the gpt-neox-20b foundation model for example:

   ```bash
   docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
       -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
       --base_model=EleutherAI/gpt-neox-20b \
       --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot
   ```

4. Open `http://localhost:7860` in the browser.

### Docker Compose Setup & Inference

1. (Optional) Change the desired model and weights under `environment` in `docker-compose.yml`.

2. Build and run the container:

   ```bash
   docker-compose up -d --build
   ```

3. Open `http://localhost:7860` in the browser.

4. See logs:

   ```bash
   docker-compose logs -f
   ```

5. Clean everything up:

   ```bash
   docker-compose down --volumes --rmi all
   ```
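
Loading a 20-billion-parameter model can take a while, so the web UI may not answer right after the container starts. The helper below is a minimal sketch of one way to wait for it. It is not part of h2oGPT: the function name `wait_for_ui` and its default timeout and polling interval are hypothetical. It assumes `curl` is installed on the host and that the UI is served over plain HTTP on port 7860; adjust the URL if your deployment differs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of h2oGPT): poll a URL until it answers
# or a timeout elapses.
# Usage: wait_for_ui [url] [timeout_seconds] [interval_seconds]
wait_for_ui() {
    local url="${1:-http://localhost:7860}"   # assumed default UI address
    local timeout="${2:-600}"                 # assumed default: 10 minutes
    local interval="${3:-5}"                  # seconds between attempts
    local deadline=$(( SECONDS + timeout ))

    while (( SECONDS < deadline )); do
        # --fail makes curl exit non-zero on HTTP error responses (4xx/5xx).
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            echo "UI is reachable at $url"
            return 0
        fi
        sleep "$interval"
    done

    echo "Timed out after ${timeout}s waiting for $url" >&2
    return 1
}
```

For example, `docker-compose up -d --build && wait_for_ui` returns once the page responds, or exits non-zero after the timeout so that a calling script can stop and inspect `docker-compose logs`.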