Docker
HUGS support deployment on Docker. You can run HUGS with default settings from a command line, or customize your configuration by creating your own docker-compose.yml file.
Run HUGS with Docker
To run HUGS with Docker using default settings, run this command from from your shell:
export HUGS_CACHE=~/.cache/hugs
mkdir -p "$HUGS_CACHE"
docker run -it --rm \
--gpus all \
--shm-size=16GB \
-v "$HUGS_CACHE:/tmp" \
-p 8080:80 \
'hfhugs/nvidia-google-gemma-2-9b-it:0.2.0'
The container URI might differ depending on the distribution and the model you are using.
The command sets the following default environment variables in the container:
HUGS_CACHE
defaults to~/.cache/hugs
. This is the cache for the models, for faster loading next time.
Run HUGS Container with AWS Inferentia2/Trainium
To run HUGS on AWS instances with Inferentia2 or Trainium accelerators, you’ll need to use a compatible Docker image and configure access to the Neuron devices. There are two ways to provide container access to Neuron devices:
- Using
--privileged
flag (grants access to all Neuron devices) - Using
--device
flags (grants access to specific Neuron devices)
You can run the container with the --privileged
flag to grant access to all Neuron devices:
export HUGS_CACHE=~/.cache/hugs
mkdir -p "$HUGS_CACHE"
sudo docker run -it --rm \
--shm-size=16GB \
--privileged \
-v "$HUGS_CACHE:/tmp" \
-p 8080:80 \
'hfhugs/neuron-meta-llama-meta-llama-3.1-8b-instruct:0.2.0'
Alternatively, you can specify individual Neuron devices using --device
flags for more fine-grained control:
export HUGS_CACHE=~/.cache/hugs
mkdir -p "$HUGS_CACHE"
sudo docker run -it --rm \
--shm-size=16GB \
--device=/dev/neuron0 \
--device=/dev/neuron1 \
--device=/dev/neuron2 \
--device=/dev/neuron3 \
-v "$HUGS_CACHE:/tmp" \
-p 8080:80 \
'hfhugs/neuron-meta-llama-meta-llama-3.1-8b-instruct:0.2.0'
Sample Docker Compose file
You can also use a docker-compose.yml
file to customize your configuration.
version: '3.8'
services:
hugs:
image: hfhugs/nvidia-google-gemma-2-9b-it
ports:
- 8080:80
volumes:
- ${HUGS_CACHE:-~/.cache/hugs}:/tmp
environment:
- HUGS_CACHE=/tmp
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
shm_size: 16GB
restart: on-failure:0
volumes:
hugs_cache:
Edit the docker-compose.yml
file to suit your needs. You can add or remove environment variables, change the port mappings. To start your HUGS instance, run this command from your shell:
docker compose up