BentoML Integration Guide

BentoML is an open-source framework for building, shipping, and scaling AI applications. It lets you package and serve diffusion models for production, ensuring reliable and efficient deployments. It ships with operational tooling such as monitoring and tracing out of the box, and simplifies deployment to various cloud platforms. BentoML’s distributed architecture, which separates API server logic from model inference logic, enables deployments to scale efficiently even under budget constraints. This makes the integration with Diffusers a valuable tool for real-world deployments.

This tutorial demonstrates how to integrate BentoML with Diffusers.

Prerequisites
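
This guide assumes you have BentoML and Diffusers installed, along with the libraries Stable Diffusion depends on. A minimal setup looks like this:

pip install bentoml diffusers transformers torch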

Import a diffusion model

First, you need to prepare the model. BentoML has its own Model Store for model management. Create a download_model.py file as below to import a diffusion model into BentoML’s Model Store:

import bentoml

bentoml.diffusers.import_model(
    "sd2.1",  # Model tag in the BentoML Model Store
    "stabilityai/stable-diffusion-2-1",  # Hugging Face model identifier
)

This code snippet downloads the Stable Diffusion 2.1 model from the Hugging Face Hub using its repo ID, stabilityai/stable-diffusion-2-1 (or uses the cached files if the model has already been downloaded), and imports it into the BentoML Model Store under the name sd2.1.
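
Run the script to perform the import:

python download_model.py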

For models already fine-tuned and stored on disk, you can provide the local path instead of the repo ID:

import bentoml

bentoml.diffusers.import_model(
    "sd2.1-local",
    "./local_stable_diffusion_2.1/",
)

You can view the model in the Model Store:

bentoml models list

Tag                                                                 Module                              Size       Creation Time       
sd2.1:ysrlmubascajwnry                                              bentoml.diffusers                   33.85 GiB  2023-07-12 16:47:44 

Turn a diffusion model into a RESTful service with BentoML

Once the diffusion model is in BentoML’s Model Store, you can implement a text-to-image service with it. The Stable Diffusion model accepts various arguments in addition to the required prompt to guide the image generation process. To validate these input arguments, use BentoML’s pydantic integration. Create an sdargs.py file with an example pydantic model:

import typing as t

from pydantic import BaseModel


class SDArgs(BaseModel):
    prompt: str
    negative_prompt: t.Optional[str] = None
    height: t.Optional[int] = 512
    width: t.Optional[int] = 512

    class Config:
        extra = "allow"

This pydantic model requires a string field prompt and three optional fields, negative_prompt, height, and width, each with a corresponding type. The extra = "allow" setting lets callers pass additional fields that are not declared in the SDArgs class. In a real-world scenario, you would typically declare all the desired fields and disallow extra ones.
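
As a quick sanity check, you can instantiate SDArgs directly. The guidance_scale field below is not declared in the class, but it passes validation because of extra = "allow":

from sdargs import SDArgs

# Declared fields are validated; the undeclared guidance_scale is kept as an extra field.
args = SDArgs(prompt="a black cat", height=768, guidance_scale=7.5)
print(args.dict())
# {'prompt': 'a black cat', 'negative_prompt': None, 'height': 768, 'width': 512, 'guidance_scale': 7.5}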

Next, create a BentoML Service file that defines a Stable Diffusion service:

import bentoml
from bentoml.io import Image, JSON

from sdargs import SDArgs

bento_model = bentoml.diffusers.get("sd2.1:latest")
sd21_runner = bento_model.to_runner(name="sd21-runner")

svc = bentoml.Service("stable-diffusion-21", runners=[sd21_runner])


@svc.api(input=JSON(pydantic_model=SDArgs), output=Image())
async def txt2img(input_data: SDArgs):
    # Forward the validated arguments to the Stable Diffusion runner
    kwargs = input_data.dict()
    res = await sd21_runner.async_run(**kwargs)
    images = res[0]  # the first element of the result holds the generated images
    return images[0]

Save the file as service.py, and spin up a BentoML Service endpoint using:

bentoml serve service:svc

An HTTP server with a /txt2img endpoint that accepts a JSON dictionary should now be running on port 3000. Go to http://127.0.0.1:3000 in your web browser to access the Swagger UI.

You can also test text-to-image generation with curl, writing the returned image to output.jpg:

curl -X POST http://127.0.0.1:3000/txt2img \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "a black cat", "height": 768, "width": 768}' \
     --output output.jpg
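
Equivalently, here is a minimal Python client sketch using the requests library (assuming the server is running on the default port 3000):

import requests

response = requests.post(
    "http://127.0.0.1:3000/txt2img",
    json={"prompt": "a black cat", "height": 768, "width": 768},
)
response.raise_for_status()

# The endpoint returns the raw image bytes.
with open("output.jpg", "wb") as f:
    f.write(response.content)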

Package a BentoML Service for cloud deployment

To deploy a BentoML Service, you need to package it into a Bento, BentoML’s file archive containing all the source code, models, data files, and dependencies. This is done by providing a bentofile.yaml file as follows:

service: "service.py:svc"
include:
  - "service.py"
python:
  packages:
    - torch
    - transformers
    - accelerate
    - diffusers
    - triton
    - xformers
    - pydantic
docker:
  distro: debian
  cuda_version: "11.6"

The bentofile.yaml file contains Bento build options, such as package dependencies and Docker options.
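
For reproducible builds, you can pin package versions with standard pip requirement specifiers. The versions below are placeholders for illustration, not tested recommendations:

python:
  packages:
    - torch==2.0.1
    - diffusers==0.19.3
    - transformers==4.31.0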

Then you build a Bento using:

bentoml build

The output looks like:

Successfully built Bento(tag="stable-diffusion-21:crkuh7a7rw5bcasc").

Possible next steps:

 * Containerize your Bento with `bentoml containerize`:
    $ bentoml containerize stable-diffusion-21:crkuh7a7rw5bcasc

 * Push to BentoCloud with `bentoml push`:
    $ bentoml push stable-diffusion-21:crkuh7a7rw5bcasc

You can create a Docker image based on the Bento by running the following command, and then deploy the image to a cloud provider:

bentoml containerize stable-diffusion-21:crkuh7a7rw5bcasc
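
To verify the image locally before deploying it, you can run it with Docker; this sketch assumes a GPU host with the NVIDIA Container Toolkit installed:

docker run --rm -it --gpus all -p 3000:3000 stable-diffusion-21:crkuh7a7rw5bcasc serve --production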

If you want an end-to-end solution for deploying and managing models, you can push the Bento to Yatai or BentoCloud for distributed deployment.

For more information about BentoML’s integration with Diffusers, see the BentoML Diffusers Guide.