Hugging Face Generative AI Services (HUGS) documentation

Docker

HUGS supports deployment on Docker. You can run HUGS with the default settings from the command line, or customize your configuration by creating your own docker-compose.yml file.

Run HUGS with Docker

To run HUGS with Docker using the default settings, run this command from your shell:

export HUGS_CACHE=~/.cache/hugs
mkdir -p "$HUGS_CACHE"
docker run -it --rm \
    --gpus all \
    --shm-size=16GB \
    -v "$HUGS_CACHE:/tmp" \
    -p 8080:80 \
    'hfhugs/nvidia-google-gemma-2-9b-it:0.2.0'

The container URI might differ depending on the distribution and the model you are using.

The command relies on the following environment variable:

  • HUGS_CACHE defaults to ~/.cache/hugs. This host directory is mounted into the container (at /tmp) and stores downloaded model weights, so they load faster on subsequent runs.
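
HUGS exposes an OpenAI-compatible API through its TGI-based engine, so once the container reports that the model is ready you can send a test chat completion to the mapped port. The following is a minimal smoke test; the prompt and max_tokens values are illustrative:

# Send a test chat completion request to the running container
curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {"role": "user", "content": "What is deep learning?"}
        ],
        "max_tokens": 128,
        "stream": false
    }'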

Run HUGS Container with AWS Inferentia2/Trainium

To run HUGS on AWS instances with Inferentia2 or Trainium accelerators, you’ll need to use a compatible Docker image and configure access to the Neuron devices. There are two ways to provide container access to Neuron devices:

  1. Using --privileged flag (grants access to all Neuron devices)
  2. Using --device flags (grants access to specific Neuron devices)

You can run the container with the --privileged flag to grant access to all Neuron devices:

export HUGS_CACHE=~/.cache/hugs
mkdir -p "$HUGS_CACHE"
sudo docker run -it --rm \
    --shm-size=16GB \
    --privileged \
    -v "$HUGS_CACHE:/tmp" \
    -p 8080:80 \
    'hfhugs/neuron-meta-llama-meta-llama-3.1-8b-instruct:0.2.0'

Alternatively, you can specify individual Neuron devices using --device flags for more fine-grained control:

export HUGS_CACHE=~/.cache/hugs
mkdir -p "$HUGS_CACHE"
sudo docker run -it --rm \
    --shm-size=16GB \
    --device=/dev/neuron0 \
    --device=/dev/neuron1 \
    --device=/dev/neuron2 \
    --device=/dev/neuron3 \
    -v "$HUGS_CACHE:/tmp" \
    -p 8080:80 \
    'hfhugs/neuron-meta-llama-meta-llama-3.1-8b-instruct:0.2.0'

Using individual `--device` flags lets you control exactly which Neuron devices are accessible to each container, which is useful when running multiple containers on the same instance.
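
If you are unsure which devices are present, you can list the Neuron device nodes on the host first (a quick check, assuming the standard /dev/neuronX naming used by the Neuron driver):

# List the Neuron device nodes exposed by the driver on the host
ls /dev/neuron*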

Sample Docker Compose file

You can also use a docker-compose.yml file to customize your configuration.

version: '3.8'

services:
  hugs:
    image: hfhugs/nvidia-google-gemma-2-9b-it:0.2.0
    ports:
      - 8080:80
    volumes:
      # Host-side model cache mounted into the container
      - ${HUGS_CACHE:-~/.cache/hugs}:/tmp
    environment:
      - HUGS_CACHE=/tmp
    deploy:
      resources:
        reservations:
          devices:
            # Reserve all available NVIDIA GPUs for the service
            - driver: nvidia
              count: all
              capabilities: [gpu]
    shm_size: 16GB
    restart: on-failure:0

Edit the docker-compose.yml file to suit your needs: you can add or remove environment variables, or change the port mappings. To start your HUGS instance, run this command from your shell:

docker compose up
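
Since the compose file interpolates HUGS_CACHE, you can also override the cache location and run the stack in the background. A sketch, where /data/hugs is a hypothetical path:

# Start detached, pointing the host-side cache at a custom directory
HUGS_CACHE=/data/hugs docker compose up -d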