Unlocking Creativity with Text-to-Image Generation: Exploring LoRA Models and Styles [Generative Vision]

Community Article Published August 8, 2024

LoRA Models

LoRA (Low-Rank Adaptation) models enhance the capabilities of Stable Diffusion by providing specialized styles and characteristics. They apply low-rank updates to the base model’s weights to steer generation toward specific styles or themes. In our application, we integrate several LoRA models, each designed to capture different artistic elements. Check out the Space here: Generative Vision.

Building the Application: Gradio SDK

The application uses Gradio, a Python library that simplifies the creation of web interfaces for machine learning models. Gradio lets users interact with models through a simple web interface, making the app accessible even to those without programming knowledge.
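For readers new to Gradio, the basic pattern the app builds on is small. Here is a minimal, self-contained sketch (unrelated to the diffusion model itself):

import gradio as gr

# Minimal Gradio app: one text input mapped to one text output.
def greet(name: str) -> str:
    return f"Hello, {name}!"

gr.Interface(fn=greet, inputs="text", outputs="text").launch()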

Features and Functionality

Image Styles

The application offers multiple predefined styles, ranging from ultra-high-definition (UHD) 8K images to minimalistic designs. These styles adjust the model’s output, providing users with flexibility in their creative process.


LoRA Models Used

The application incorporates a variety of LoRA models, each suited to a different artistic style or subject:

Realism (Face/Character): Ideal for generating lifelike portraits and characters, capturing intricate details and expressions.
Pixar (Art/Toons): Emulates the iconic Pixar style, perfect for creating cartoon-like images with vibrant colors.
Photoshoot (Camera/Film): Mimics professional photography, adding a cinematic touch to images.
Clothing (Hoodies/Pants/Shirts): Focuses on fashion, generating detailed images of clothing items.
Interior Architecture (House/Hotel): Captures the essence of interior design, creating stunning architectural visuals.
Fashion Product (Wearing/Usable): Generates images of fashion accessories, showcasing products with elegance.
Minimalistic Image (Minimal/Detailed): Produces clean, simple images with detailed elements.
Modern Clothing (Trend/New): Focuses on contemporary fashion trends, providing modern and stylish visuals.
Animaliea (Farm/Wild): Generates images of animals, both domestic and wild, with artistic flair.
Liquid Wallpaper (Minimal/Illustration): Creates abstract, fluid designs suitable for wallpapers.
Canes Cars (Realistic/Future Cars): Specializes in realistic and futuristic car designs.
Pencil Art (Characteristic/Creative): Emulates hand-drawn pencil sketches, adding a personal touch to images.
Art Minimalistic (Paint/Semireal): Blends realism with artistic minimalism, creating semi-abstract visuals.

Customization Options

Users can customize their images further by adjusting parameters like seed, width, height, and guidance scale. These settings allow users to explore different creative possibilities, generating unique and diverse outputs.

Using the Application

To generate an image, users simply enter a prompt describing the desired scene or subject, and can optionally add a negative prompt to exclude specific elements from the output. The application processes the input, applies the selected LoRA model and style, and generates an image.

Example Prompts

Realism: “Man in the style of dark beige and brown, UHD image, youthful protagonists, nonrepresentational.”
Pixar: “A young man with light brown wavy hair and light brown eyes sitting in an armchair and looking directly at the camera, Pixar style, Disney Pixar, office background, ultra-detailed, 1 man.”
Hoodie: “Front view, capture an urban style, Superman Hoodie, technical materials, fabric small point label on text Blue theory, the design is minimal, with a raised collar, fabric is a Light yellow, low angle to capture the hoodie’s form and detailing, f/5.6 to focus on the hoodie’s craftsmanship, solid grey background, studio light setting, with Batman logo in the chest region of the t-shirt.”

Step-by-Step Explanation

1. Importing Packages

The script begins by importing several essential packages. Each of these plays a critical role in the application’s functionality:

import os
import random
import uuid
from typing import Tuple
import gradio as gr
import numpy as np
from PIL import Image
import spaces
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

os: Provides a way to interact with the operating system. Although the script does not call it directly, it is typically used for file operations and reading environment variables.

random: Used to generate random numbers, which can be useful for randomizing seeds in the image generation process.

uuid: Generates unique identifiers, ensuring that each saved image has a unique filename.

typing: Specifically, the Tuple type is used for function annotations, improving code readability and maintainability.

gradio: A library to create web interfaces easily, allowing users to interact with the image generation model through a simple interface.

numpy (np): Provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions.

PIL (Pillow): A library that adds image processing capabilities to your Python interpreter.

spaces: A module typically used for managing computational resources like GPUs in Hugging Face Spaces.

torch: PyTorch, a deep learning library used for tensor computations, enabling the model to run on GPUs.

diffusers: Contains utilities for diffusion models, specifically the Stable Diffusion model and its scheduler.

2. Hugging Face Authentication

To use models from the Hugging Face Hub, the script requires authentication:

from huggingface_hub import login
# Log in to Hugging Face using the provided token
hf_token = '------------HF_TOKEN----------------'
login(hf_token)

huggingface_hub: This package facilitates interaction with the Hugging Face model repository; its login function authenticates with the Hugging Face Hub.

hf_token: A placeholder for your actual Hugging Face token. This token is used to authenticate your account and gain access to models stored on the Hub.

login(hf_token): Logs into the Hugging Face Hub using the provided token. This step is required for accessing private models or other resources that need authentication.
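Hardcoding tokens in a script is risky. A safer pattern (a sketch, assuming the token has been exported as HF_TOKEN beforehand) reads it from the environment:

import os
from huggingface_hub import login

# Assumption: the token was exported beforehand, e.g. `export HF_TOKEN=hf_...`
hf_token = os.environ.get("HF_TOKEN")
if hf_token:
    login(hf_token)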

3. Description and Utility Functions

The script sets up some descriptions and utility functions to handle the images and seeds:


DESCRIPTIONz = """## STABLE IMAGINE 🍺"""

MAX_SEED = np.iinfo(np.int32).max  # upper bound for seeds (2**31 - 1)

def save_image(img):
    # Give every saved image a unique UUID-based filename.
    unique_name = str(uuid.uuid4()) + ".png"
    img.save(unique_name)
    return unique_name

def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    # Replace the seed with a random one when requested.
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed

MAX_SEED: The maximum seed value, taken from NumPy’s 32-bit integer info, ensuring the seed stays within valid bounds.

save_image(img): Saves an image with a unique filename generated using uuid, ensuring every image has a distinct name.

randomize_seed_fn: Replaces the seed with a random one when randomize_seed is True, adding variety to the generated images.
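A quick usage sketch of these helpers (the blank image is just a stand-in for a generated one):

from PIL import Image

img = Image.new("RGB", (64, 64), "white")              # stand-in for a generated image
path = save_image(img)                                 # e.g. "550e8400-...-446655440000.png"

seed = randomize_seed_fn(seed=0, randomize_seed=True)  # fresh value in [0, MAX_SEED]
seed = randomize_seed_fn(seed=42, randomize_seed=False)  # passes 42 through unchanged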

4. Model Setup

This section checks for GPU availability and sets up the image generation pipeline:

if not torch.cuda.is_available():
    DESCRIPTIONz += "\n<p>⚠️ Running on CPU; generation may be very slow or fail. For GPU acceleration, duplicate the Space and enable a GPU via @spaces.GPU (import spaces). 📍</p>"

USE_TORCH_COMPILE = 0
ENABLE_CPU_OFFLOAD = 0

if torch.cuda.is_available():
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "SG161222/RealVisXL_V4.0_Lightning",
        torch_dtype=torch.float16,
        use_safetensors=True,
    )
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

torch.cuda.is_available(): Checks if a CUDA-compatible GPU is available. If not, a warning message is added to the description.

USE_TORCH_COMPILE and ENABLE_CPU_OFFLOAD: Configuration options for the PyTorch pipeline, though they are set to 0 (disabled) in this script.

StableDiffusionXLPipeline: Loads the Stable Diffusion model. The model is configured to use half-precision (float16) to reduce memory usage and improve performance.

EulerAncestralDiscreteScheduler: Sets the scheduler for the diffusion process, controlling how the noise is iteratively reduced to form an image.
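For reference, if you wanted to turn on the two disabled options, diffusers and PyTorch expose them roughly like this (a hedged sketch; the script itself leaves both off):

if ENABLE_CPU_OFFLOAD:
    # Move submodules to the GPU only while in use, trading speed for memory.
    pipe.enable_model_cpu_offload()

if USE_TORCH_COMPILE:
    # Compile the UNet's forward pass; the first call pays a warm-up cost.
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)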

5. Loading LoRA Models

LoRA models are loaded to modify the base model’s style or characteristics:


LORA_OPTIONS = {
    "Realism (face/character)👦🏻": ("prithivMLmods/Canopus-Realism-LoRA", "Canopus-Realism-LoRA.safetensors", "rlms"),
    "Pixar (art/toons)🙀": ("prithivMLmods/Canopus-Pixar-Art", "Canopus-Pixar-Art.safetensors", "pixar"),
    "Photoshoot (camera/film)📸": ("prithivMLmods/Canopus-Photo-Shoot-Mini-LoRA", "Canopus-Photo-Shoot-Mini-LoRA.safetensors", "photo"),
    "Clothing (hoodies/pant/shirts)👔": ("prithivMLmods/Canopus-Clothing-Adp-LoRA", "Canopus-Dress-Clothing-LoRA.safetensors", "clth"),
    "Interior Architecture (house/hotel)🏠": ("prithivMLmods/Canopus-Interior-Architecture-0.1", "Canopus-Interior-Architecture-0.1δ.safetensors", "arch"),
    "Fashion Product (wearing/usable)👜": ("prithivMLmods/Canopus-Fashion-Product-Dilation", "Canopus-Fashion-Product-Dilation.safetensors", "fashion"),
    "Minimalistic Image (minimal/detailed)🏞️": ("prithivMLmods/Pegasi-Minimalist-Image-Style", "Pegasi-Minimalist-Image-Style.safetensors", "minimalist"),
    "Modern Clothing (trend/new)👕": ("prithivMLmods/Canopus-Modern-Clothing-Design", "Canopus-Modern-Clothing-Design.safetensors", "mdrnclth"),
    "Animaliea (farm/wild)🫎": ("prithivMLmods/Canopus-Animaliea-Artism", "Canopus-Animaliea-Artism.safetensors", "Animaliea"),
    "Liquid Wallpaper (minimal/illustration)🖼️": ("prithivMLmods/Canopus-Liquid-Wallpaper-Art", "Canopus-Liquid-Wallpaper-Minimalize-LoRA.safetensors", "liquid"),
    "Canes Cars (realistic/futurecars)🚘": ("prithivMLmods/Canes-Cars-Model-LoRA", "Canes-Cars-Model-LoRA.safetensors", "car"),
    "Pencil Art (characteristic/creative)✏️": ("prithivMLmods/Canopus-Pencil-Art-LoRA", "Canopus-Pencil-Art-LoRA.safetensors", "Pencil Art"),
    "Art Minimalistic (paint/semireal)🎨": ("prithivMLmods/Canopus-Art-Medium-LoRA", "Canopus-Art-Medium-LoRA.safetensors", "mdm"),
}

for model_name, weight_name, adapter_name in LORA_OPTIONS.values():
    pipe.load_lora_weights(model_name, weight_name=weight_name, adapter_name=adapter_name)
pipe.to("cuda")

LORA_OPTIONS: A dictionary mapping human-readable model names to their corresponding model paths, weight files, and adapter names. Each entry represents a specific style or theme.

pipe.load_lora_weights: Loads the LoRA weights for each model, customizing the image generation style.

pipe.to("cuda"): Transfers the pipeline to the GPU for faster processing if available.
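Since every adapter is registered on the same pipeline, diffusers (with PEFT installed) can also activate several LoRAs at once. A hedged sketch using two adapter names from the table above, with illustrative weights:

# Blend the realism and Pixar adapters; the weights are illustrative, not tuned.
pipe.set_adapters(["rlms", "pixar"], adapter_weights=[0.8, 0.4])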

6. Defining Styles

Styles define the characteristics of the generated images, such as resolution and detail level:

style_list = [
    {
        "name": "3840 x 2160",
        "prompt": "hyper-realistic 8K image of {prompt}. ultra-detailed, lifelike, high-resolution, sharp, vibrant colors, photorealistic",
        "negative_prompt": "cartoonish, low resolution, blurry, simplistic, abstract, deformed, ugly",
    },
    ...
]
styles = {k["name"]: (k["prompt"], k["negative_prompt"]) for k in style_list}

# Fallback style name (assumed here to be the UHD style shown above).
DEFAULT_STYLE_NAME = "3840 x 2160"

style_list: A list of dictionaries, each specifying a style with its name, prompt, and negative prompt. The prompt is formatted to insert the user’s input.

styles: Converts style_list into a dictionary keyed by style name for easier lookup. DEFAULT_STYLE_NAME names the fallback style used when a requested style is missing.

7. Applying Styles

The function apply_style modifies the prompts based on the selected style:

def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
    # Fall back to the default style if the requested name is unknown.
    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
    return p.replace("{prompt}", positive), n + negative

apply_style: Takes a style name and prompts as input and returns the modified prompts based on the style. It inserts the positive prompt into the style-specific template and appends any additional negative prompts.
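For example, with the UHD style defined in section 6:

p, n = apply_style("3840 x 2160", "a lighthouse at dusk", "text, watermark")
# p: "hyper-realistic 8K image of a lighthouse at dusk. ultra-detailed, ..."
# n: the style's negative prompt with "text, watermark" appended directly;
#    note there is no separator, so prepending ", " to the user text may be safer.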

8. Generating Images

The core function generate is decorated with @spaces.GPU to enable GPU usage:

@spaces.GPU(duration=60, enable_queue=True)
def generate(
    prompt: str,
    negative_prompt: str = "",
    use_negative_prompt: bool = False,
    seed: int = 0,
    width: int = 1024,
    height: int = 1024,
    guidance_scale: float = 3,
    randomize_seed: bool = False,
    style_name: str = DEFAULT_STYLE_NAME,
    lora_model: str = "Realism (face/character)👦🏻",
    progress=gr.Progress(track_tqdm=True),
):
    seed = int(randomize_seed_fn(seed, randomize_seed))

    positive_prompt, effective_negative_prompt = apply_style(style_name, prompt, negative_prompt)
    
    if not use_negative_prompt:
        effective_negative_prompt = ""  # type: ignore

    model_name, weight_name, adapter_name = LORA_OPTIONS[lora_model]
    pipe.set_adapters(adapter_name)

    images = pipe(
        prompt=positive_prompt,
        negative_prompt=effective_negative_prompt,
        width=width,
        height=height,
        guidance_scale=guidance_scale,
        num_inference_steps=20,
        num_images_per_prompt=1,
        cross_attention_kwargs={"scale": 0.65},
        output_type="pil",
    ).images
    image_paths = [save_image(img) for img in images]
    return image_paths, seed

@spaces.GPU: Decorator to allocate GPU resources for the function, setting a maximum duration and enabling a queue for processing.

generate: The main function that generates images. It processes user inputs, sets model parameters, and runs the pipeline to produce images.

apply_style: Applies the chosen style to the prompts.

pipe.set_adapters(adapter_name): Activates the specified LoRA model.

pipe: Calls the pipeline with the configured prompts and parameters, generating the images.

save_image: Saves each generated image with a unique filename and returns the paths.
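Because generate is a plain Python function, it can also be called outside the UI. A minimal sketch, with one caveat about the seed:

image_paths, used_seed = generate(
    prompt="portrait of an old sailor, weathered face",
    randomize_seed=True,
    lora_model="Realism (face/character)👦🏻",
)
print(image_paths, used_seed)

# Caveat: as written, the seed is returned for display but never passed to the
# pipeline; for reproducible outputs you would add something like
#     generator=torch.Generator("cuda").manual_seed(seed)
# to the pipe(...) call inside generate.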

9. Building the Gradio Interface

The final part sets up the Gradio interface:

with gr.Blocks() as demo:
    gr.Markdown(DESCRIPTIONz)
    
    with gr.Row():
        input_prompt = gr.Textbox(label="Prompt", placeholder="Enter prompt", lines=2)
        use_negative_prompt = gr.Checkbox(label="Use negative prompt?", value=False)
        negative_prompt = gr.Textbox(label="Negative Prompt", placeholder="Enter negative prompt", lines=2)
    
    with gr.Row():
        randomize_seed = gr.Checkbox(label="Randomize Seed", value=False)
        seed = gr.Number(value=0, label="Seed")
    
    with gr.Row():
        style_dropdown = gr.Dropdown(label="Image Style", choices=list(styles.keys()), value=DEFAULT_STYLE_NAME)
        lora_dropdown = gr.Dropdown(label="LoRA Model", choices=list(LORA_OPTIONS.keys()), value="Realism (face/character)👦🏻")
    
    with gr.Row():
        width = gr.Slider(512, 2048, value=1024, step=64, label="Width")
        height = gr.Slider(512, 2048, value=1024, step=64, label="Height")
    
    with gr.Row():
        guidance_scale = gr.Slider(1.0, 15.0, value=3, step=0.5, label="Guidance Scale")
    
    output_gallery = gr.Gallery(label="Generated Images", columns=2, height="auto")  # .style() was removed in Gradio 4
    output_seed = gr.Number(label="Final Seed", interactive=False)
    
    generate_button = gr.Button("Generate Images")

    generate_button.click(
        fn=generate,
        inputs=[
            input_prompt,
            negative_prompt,
            use_negative_prompt,
            seed,
            width,
            height,
            guidance_scale,
            randomize_seed,
            style_dropdown,
            lora_dropdown,
        ],
        outputs=[output_gallery, output_seed],
    )

demo.launch()

gr.Blocks(): Sets up a Gradio interface using a block structure.

gr.Markdown: Displays the description at the top of the interface.

gr.Textbox: Used for entering text prompts and negative prompts.

gr.Checkbox: Toggles options like randomizing seeds and using negative prompts.

gr.Dropdown: Allows users to select styles and LoRA models.

gr.Slider: Provides a slider interface for numerical inputs like image width, height, and guidance scale.

gr.Gallery: Displays the generated images in a gallery format.

gr.Button: A button to trigger the image generation process.

generate_button.click: Connects the button to the generate function, wiring inputs and outputs to handle user interaction.
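When hosting outside Spaces, you may also want request queuing and a temporary public URL; Gradio supports both:

# Queue concurrent requests and expose a temporary public link.
demo.queue().launch(share=True)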

Conclusion

The Gradio-based application for text-to-image generation demonstrates the power and versatility of combining Stable Diffusion with LoRA models. It empowers users to create images tailored to their artistic vision through a simple, intuitive interface. Whether you’re an artist, designer, or enthusiast, the tool offers broad creative possibilities. The demo Space generates images using Stable Diffusion with quality styles and a range of LoRA model types; try the sample prompts for higher-quality results, and make sure your prompts include the trigger words and are well detailed. This Space is for educational purposes only. Users are accountable for the content they generate and are responsible for ensuring it meets appropriate ethical standards.


| Resource | Link |
| --- | --- |
| Live Demo | STABLE-IMAGINE |
| GitHub | Gen-Vision |
| Hugging Face | prithivMLmods |
| Google Colab / Notebook | Gen_Vision.ipynb |

• Thanks for reading! 🤗