title: WeavePrompt
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 5.44.1
pinned: false
license: mit
app_file: app.py
app_port: 7860

WeavePrompt

Iterative prompt refinement for image generation models. Given a target image, WeavePrompt automatically generates and refines text prompts so that a model's output resembles the target, using vision-language models and perceptual metrics.

Introduction

WeavePrompt is a research and development project designed to evaluate and refine text-to-image generation prompts across multiple state-of-the-art image generation models. The primary goal is to optimize prompts such that the generated images align closely with a given reference image, improving both fidelity and semantic consistency.

Procedure/Implementation: The process involves generating images from identical prompts using various image generation models, comparing the results to a reference image through a recognition and similarity evaluation pipeline, and iteratively adjusting the prompt to minimize perceptual differences. This feedback loop continues for a set number of iterations, progressively enhancing prompt effectiveness.
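The feedback loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `optimize_prompt`, `best_step`, `Step`, and the three callables (`generate`, `distance`, `refine`) are hypothetical names standing in for the image generation model, the similarity metric, and the vision-language model that rewrites the prompt.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    """One iteration of the loop: the prompt tried and how close its image was."""
    iteration: int
    prompt: str
    image: Any
    distance: float


def optimize_prompt(
    target: Any,
    initial_prompt: str,
    generate: Callable[[str], Any],           # hypothetical: prompt -> image
    distance: Callable[[Any, Any], float],    # hypothetical: (image, target) -> score, lower is closer
    refine: Callable[[str, Any, Any], str],   # hypothetical: (prompt, image, target) -> new prompt
    num_iterations: int = 5,
) -> List[Step]:
    """Run a fixed number of generate / compare / refine iterations."""
    history: List[Step] = []
    prompt = initial_prompt
    for i in range(num_iterations):
        image = generate(prompt)
        score = distance(image, target)
        history.append(Step(i, prompt, image, score))
        # Ask the refiner for a prompt that should reduce the remaining difference.
        prompt = refine(prompt, image, target)
    return history


def best_step(history: List[Step]) -> Step:
    """Return the iteration whose image was perceptually closest to the target."""
    return min(history, key=lambda step: step.distance)
```

Keeping the full history rather than only the last prompt matters because a refinement step is not guaranteed to improve the score, so the best prompt may come from an earlier iteration.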

To achieve this, WeavePrompt integrates advanced tools:

- Image recognition is powered by meta-llama/Llama-4-Scout-17B-16E-Instruct.
- Similarity evaluation uses the LPIPS (alex) metric for perceptual comparison.
- Image generation models under evaluation include:
  - FLUX family: FLUX.1 [pro], [dev], and [schnell]
  - Google models: Imagen 4, Imagen 4 Ultra, and Gemini 2.5 Flash Image
  - Other models: Stable Diffusion 3.5 Large and Qwen Image
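As an illustration of the similarity step, the LPIPS (alex) score could be computed with the `lpips` Python package along these lines. This is a sketch under assumptions: the project's own evaluation code is not shown here, the helper names `to_lpips_range` and `lpips_alex_distance` are hypothetical, and the caller is assumed to supply two same-sized images as torch tensors.

```python
def to_lpips_range(pixels):
    """Map 8-bit pixel values (0..255) to the [-1, 1] range LPIPS expects."""
    return [p / 127.5 - 1.0 for p in pixels]


def lpips_alex_distance(img_a, img_b) -> float:
    """Perceptual distance between two images; lower means more similar.

    img_a and img_b are assumed to be torch tensors of shape (1, 3, H, W)
    with values already scaled to [-1, 1] and identical height and width.
    """
    # Imported lazily so the scaling helper works without these packages installed.
    import lpips  # pip install lpips
    import torch

    loss_fn = lpips.LPIPS(net="alex")  # AlexNet backbone, matching "LPIPS (alex)"
    with torch.no_grad():
        return float(loss_fn(img_a, img_b).item())
```

In a loop with many iterations it would likely be worth constructing `lpips.LPIPS` once and reusing it, since each construction reloads the network weights.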

By systematically combining prompt optimization with multi-model evaluation, WeavePrompt aims to advance the understanding of cross-model prompt effectiveness and improve controllability in image generation tasks.

Features

- Upload a target image
- Step-by-step prompt optimization
- View prompt and generated image at each iteration
- Full optimization history

Installation

Clone the repository:

git clone https://github.com/kevin1kevin1k/WeavePrompt.git
cd WeavePrompt

Install dependencies:

uv venv
uv sync
source .venv/bin/activate

Set up .env by putting the following inside it:

- API keys: WANDB_API_KEY and FAL_KEY
- Weave project name: WEAVE_PROJECT
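A quick way to confirm the configuration before launching the app is to check that all three variables are set. The snippet below is a minimal sketch: `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and it assumes the contents of .env have already been loaded into the process environment (how the project loads .env is not stated here).

```python
import os

# Variable names taken from the setup instructions above.
REQUIRED_VARS = ("WANDB_API_KEY", "FAL_KEY", "WEAVE_PROJECT")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```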

Usage

Run the demo app:

streamlit run src/app.py

Follow the instructions in the browser to upload an image and step through the optimization process.

Architecture Diagram

Outcome

Given the same prompt as the standard model, the target model produces similar, high-quality output.

References

https://arxiv.org/abs/1801.03924 - The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
https://arxiv.org/abs/2510.06335 - Image Reconstruction from Highly Undersampled Data
https://arxiv.org/abs/2510.03191 - Product-Quantised Image Representation for High-Quality Image Synthesis