Real-Time-SD-Turbo / README.md
radames's picture
disable_embedding
311d898
|
raw
history blame
4.13 kB
metadata
title: Real-Time Latent Consistency Model Image-to-Image ControlNet
emoji: 🖼️🖼️
colorFrom: gray
colorTo: indigo
sdk: docker
pinned: false
suggested_hardware: a10g-small
disable_embedding: true

Real-Time Latent Consistency Model

This demo showcases Latent Consistency Model (LCM) using Diffusers with a MJPEG stream server. You can read more about LCM + LoRAs with diffusers here.

You need a webcam to run this demo. 🤗

See a collecting with live demos here

Running Locally

You need CUDA and Python 3.10, Node > 19, Mac with an M1/M2/M3 chip or Intel Arc GPU

Install

python -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
cd frontend && npm install && npm run build && cd ..
# fastest pipeline
python run.py --reload --pipeline img2imgSD21Turbo 

Pipelines

You can build your own pipeline following examples here here, don't forget to fuild the frontend first

cd frontend && npm install && npm run build && cd ..

LCM

Image to Image

python run.py --reload --pipeline img2img 

LCM

Text to Image

python run.py --reload --pipeline txt2img 

Image to Image ControlNet Canny

python run.py --reload --pipeline controlnet 

LCM + LoRa

Using LCM-LoRA, giving it the super power of doing inference in as little as 4 steps. Learn more here or technical report

Image to Image ControlNet Canny LoRa

python run.py --reload --pipeline controlnetLoraSD15

or SDXL, note that SDXL is slower than SD15 since the inference runs on 1024x1024 images

python run.py --reload --pipeline controlnetLoraSDXL

Text to Image

python run.py --reload --pipeline txt2imgLora

or

python run.py --reload --pipeline txt2imgLoraSDXL

Setting environment variables

TIMEOUT: limit user session timeout
SAFETY_CHECKER: disabled if you want NSFW filter off
MAX_QUEUE_SIZE: limit number of users on current app instance
TORCH_COMPILE: enable if you want to use torch compile for faster inference works well on A100 GPUs USE_TAESD: enable if you want to use Autoencoder Tiny

If you run using bash build-run.sh you can set PIPELINE variables to choose the pipeline you want to run

PIPELINE=txt2imgLoraSDXL bash build-run.sh

and setting environment variables

TIMEOUT=120 SAFETY_CHECKER=True MAX_QUEUE_SIZE=4 python run.py --reload --pipeline txt2imgLoraSDXL

If you're running locally and want to test it on Mobile Safari, the webserver needs to be served over HTTPS, or follow this instruction on my comment

openssl req -newkey rsa:4096 -nodes -keyout key.pem -x509 -days 365 -out certificate.pem
python run.py --reload --ssl-certfile=certificate.pem --ssl-keyfile=key.pem

Docker

You need NVIDIA Container Toolkit for Docker, defaults to `controlnet``

docker build -t lcm-live .
docker run -ti -p 7860:7860 --gpus all lcm-live

reuse models data from host to avoid downloading them again, you can change ~/.cache/huggingface to any other directory, but if you use hugingface-cli locally, you can share the same cache

docker run -ti -p 7860:7860 -e HF_HOME=/data -v ~/.cache/huggingface:/data  --gpus all lcm-live

or with environment variables

docker run -ti -e PIPELINE=txt2imgLoraSDXL -p 7860:7860 --gpus all lcm-live

Development Mode

python run.py --reload  

Demo on Hugging Face

https://huggingface.co/spaces/radames/Real-Time-Latent-Consistency-Model

https://github.com/radames/Real-Time-Latent-Consistency-Model/assets/102277/c4003ac5-e7ff-44c0-97d3-464bb659de70