metadata

title: Visual Ai
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
license: mit
short_description: What you wish to see in the output image.

Stable Diffusion Image Generator

Overview

This project provides a Stable Diffusion image generator powered by the stabilityai/stable-diffusion-2-1 model. It runs on a CUDA GPU when one is available and falls back to the CPU otherwise. The application uses the diffusers library for the pipeline and a Gradio UI for interactive image generation.

Features

Runs on GPU (CUDA) with FP16 precision and memory optimizations or CPU with FP32 precision.
Customizable parameters: prompt, resolution, seed, inference steps, and guidance scale.
Toggle between GPU and CPU execution via the UI.
Built-in performance optimizations for GPU (e.g., memory-efficient attention, tiling).

Prerequisites

Python 3.8+
A CUDA-compatible GPU (optional but recommended for performance).
A Hugging Face account and API token for model access.

Required Dependencies

torch (with CUDA support for GPU usage)
diffusers (for the Stable Diffusion pipeline)
gradio (for the UI)
huggingface_hub (for authentication)
xformers (optional, for GPU memory optimization)
transformers (provides the text encoder loaded by the Stable Diffusion pipeline)

Install Dependencies

For GPU support (adjust PyTorch CUDA version as needed):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install "diffusers[torch]" gradio huggingface_hub transformers  # quotes keep shells such as zsh from expanding the brackets
pip install xformers  # Optional, for GPU memory optimization

For CPU-only:

pip install torch torchvision torchaudio
pip install "diffusers[torch]" gradio huggingface_hub transformers  # quotes keep shells such as zsh from expanding the brackets

Environment Setup

Set your Hugging Face API token as an environment variable:

export HUGGINGFACE_TOKEN=your_huggingface_api_token
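
As a minimal sketch of how the token could be read defensively, the helper below treats a blank or whitespace-only value the same as a missing one. The function name `read_hf_token` and its `env` parameter are illustrative assumptions, not part of app.py.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token from HUGGINGFACE_TOKEN, or raise if it is unset or blank.

    `env` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    if not token:
        raise ValueError("HUGGINGFACE_TOKEN is not set or is empty")
    return token
```

In the app, the returned value would be passed to `login(token=...)` in place of the raw `os.getenv` result.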

Run the Application

python app.py

This launches a Gradio UI where you can input parameters and generate images.

Code Implementation

The pipeline dynamically selects the device (cuda or cpu) based on availability and user preference. Here’s a summary of the implementation:

import torch
from diffusers import StableDiffusionPipeline
import gradio as gr
import os
import time
import logging
from huggingface_hub import login

# Logging setup
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

# Load and authenticate with Hugging Face token
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if not hf_token:
    raise ValueError("❌ Error: Hugging Face token not found!")
login(token=hf_token)

# Model setup
model_id = "stabilityai/stable-diffusion-2-1"
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    variant="fp16" if device == "cuda" else None,  # replaces the deprecated revision="fp16"
    token=hf_token  # replaces the deprecated use_auth_token argument
)

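# GPU optimizations (if applicable)
if device == "cuda":
    pipe.to("cuda")
    try:
        # xformers is optional; continue without it if it is missing or incompatible
        pipe.enable_xformers_memory_efficient_attention()
    except Exception as exc:
        logging.warning(f"xformers attention not enabled: {exc}")
    pipe.vae.enable_tiling()
    pipe.enable_attention_slicing()
    torch.backends.cuda.matmul.allow_tf32 = True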

logging.info(f"🚀 Running on: {device.upper()} with {torch_dtype}")

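# Image generation function
def generate_image(prompt, seed, resolution, steps, guidance, use_gpu):
    device = "cuda" if use_gpu and torch.cuda.is_available() else "cpu"
    # Match precision to the device: FP16 on CPU is slow or unsupported for some ops
    pipe.to(device, torch.float16 if device == "cuda" else torch.float32)
    # Resolution arrives as text such as "512x512"
    width, height = map(int, resolution.lower().split("x"))
    # The seed arrives as text; "-1" means "use a random seed"
    generator = torch.Generator(device).manual_seed(int(seed)) if str(seed).strip() != "-1" else None

    # Mixed precision on GPU; gradient tracking disabled on CPU
    with torch.autocast("cuda") if device == "cuda" else torch.no_grad():
        image = pipe(prompt, num_inference_steps=int(steps), guidance_scale=float(guidance),
                     generator=generator, width=width, height=height).images[0]
    return image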

# Gradio UI setup
with gr.Blocks() as demo:
    gr.Markdown("# 🖌️ **Stable Diffusion Image Generator**")
    with gr.Row():
        with gr.Column():
            prompt_input = gr.Textbox(label="🎨 Prompt")
            resolution_input = gr.Textbox(label="📏 Resolution", value="512x512")
            seed_input = gr.Textbox(label="🔢 Seed (-1 for random)", value="-1")
            steps_input = gr.Slider(10, 50, value=30, label="🛠️ Inference Steps")
            guidance_input = gr.Slider(1.0, 15.0, value=7.5, label="🎛️ Guidance Scale")
            gpu_toggle = gr.Checkbox(label="⚡ Use GPU (if available)", value=True)
            generate_button = gr.Button("🚀 Generate Image")
        with gr.Column():
            image_output = gr.Image(label="🖼️ Generated Image")
    generate_button.click(fn=generate_image, inputs=[prompt_input, seed_input, resolution_input,
                                                    steps_input, guidance_input, gpu_toggle],
                          outputs=image_output)

demo.launch()
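
The `generate_image` function above parses the resolution and seed text boxes inline, so malformed input surfaces as a raw exception. The sketch below shows one way those two inputs could be validated first. The helper names `parse_resolution` and `parse_seed` are hypothetical, and the multiple-of-8 check reflects the Stable Diffusion pipeline's requirement that width and height be divisible by 8; treat it as an assumption to confirm against your diffusers version.

```python
from typing import Optional, Tuple


def parse_resolution(text: str, multiple: int = 8) -> Tuple[int, int]:
    """Parse 'WIDTHxHEIGHT' into (width, height) and validate both values."""
    parts = text.lower().replace(" ", "").split("x")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"Expected WIDTHxHEIGHT, got {text!r}")
    width, height = int(parts[0]), int(parts[1])
    if width <= 0 or height <= 0 or width % multiple or height % multiple:
        raise ValueError(f"Width and height must be positive multiples of {multiple}")
    return width, height


def parse_seed(text: str) -> Optional[int]:
    """Return None for '-1' (random seed), otherwise the seed as an int."""
    value = int(str(text).strip())
    return None if value == -1 else value


print(parse_resolution("768x512"))  # (768, 512)
print(parse_seed("-1"))             # None
```

Inside a Gradio callback, a `ValueError` from either helper could be re-raised as `gr.Error` so the message appears in the UI instead of the server log.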

Key Notes

Device Flexibility: The script uses the GPU when one is available and falls back to CPU if the "Use GPU" checkbox is cleared or no GPU is detected.
Optimizations: GPU mode uses FP16, memory-efficient attention (via xformers), tiling, and attention slicing.
Mixed Precision: Uses torch.autocast on GPU; torch.no_grad on CPU.
Optional xformers: Enables memory-efficient attention on GPU; install it when running on CUDA.
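
The device and precision rules in these notes can be summarized as a small pure function. This is a sketch under stated assumptions: the name `select_device_and_precision` is hypothetical, and the dtype is returned as a string so the example runs without PyTorch installed.

```python
from typing import Tuple


def select_device_and_precision(use_gpu: bool, cuda_available: bool) -> Tuple[str, str]:
    """Return (device, dtype name): FP16 on CUDA, FP32 on CPU."""
    if use_gpu and cuda_available:
        return "cuda", "float16"
    # Either the user opted out of the GPU or no CUDA device was detected
    return "cpu", "float32"


print(select_device_and_precision(use_gpu=True, cuda_available=False))  # ('cpu', 'float32')
```

In the app, `cuda_available` would come from `torch.cuda.is_available()`, and the dtype name could be mapped to a real dtype with `getattr(torch, name)`.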

Troubleshooting

Issue: `ValueError: ❌ Error: Hugging Face token not found!`

Solution: Set the HUGGINGFACE_TOKEN environment variable:

export HUGGINGFACE_TOKEN=your_huggingface_api_token

Issue: GPU not detected but expected

Solution:

Check CUDA installation: nvidia-smi
Ensure PyTorch is installed with CUDA support: pip list | grep torch
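
The same checks can be run from Python, which shows what the interpreter used by app.py actually sees. The function name `cuda_report` is an illustrative assumption; the PyTorch attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`) are standard.

```python
def cuda_report() -> dict:
    """Collect basic facts about the local PyTorch/CUDA setup."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "built_with_cuda": None,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this environment; return the defaults
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


print(cuda_report())
```

If `built_with_cuda` is `None`, a CPU-only PyTorch build is installed and reinstalling from the CUDA index URL is the likely fix. If it is set but `cuda_available` is `False`, the driver or GPU visibility is the more likely cause.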

Issue: `enable_xformers_memory_efficient_attention` fails

Solution: Install xformers:

pip install xformers
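
To confirm whether xformers is visible to the Python environment that runs app.py, a quick check along these lines may help. The helper name `module_available` is hypothetical. The check only confirms that the package can be found; it does not confirm that the build matches the installed PyTorch.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


print("xformers installed:", module_available("xformers"))
```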

Conclusion

This project delivers a flexible and efficient Stable Diffusion image generator, balancing GPU performance with CPU compatibility. Enjoy creating AI art with ease! 🚀