---
title: WeavePrompt
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 5.44.1
pinned: false
license: mit
app_file: app.py
app_port: 7860
---

# WeavePrompt

Iterative prompt refinement for image generation models. Given a target image, **WeavePrompt** automatically generates and refines text prompts so that a model's output resembles the target, using vision-language models and perceptual metrics.

## Introduction

**WeavePrompt** is a research and development project designed to evaluate and refine text-to-image generation prompts across multiple state-of-the-art image generation models.
The primary goal is to optimize prompts such that the generated images align closely with a given reference image, improving both fidelity and semantic consistency.

**Procedure/Implementation**:

The process generates images from identical prompts with several image generation models, compares each result to a reference image through a recognition and similarity evaluation pipeline, and iteratively adjusts the prompt to reduce the perceptual difference.
This feedback loop runs for a set number of iterations, progressively improving prompt effectiveness.

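
The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `refine_prompt`, `Step`, `best_step`, and the three callables (`generate`, `distance_to_target`, `revise`) are hypothetical names standing in for the image generation model, the similarity metric, and the vision-language model step.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    """One entry in the optimization history."""
    iteration: int
    prompt: str
    distance: float  # lower means closer to the target image


def refine_prompt(
    initial_prompt: str,
    generate: Callable[[str], Any],              # hypothetical: prompt -> image
    distance_to_target: Callable[[Any], float],  # hypothetical: image -> score
    revise: Callable[[str, Any], str],           # hypothetical: (prompt, image) -> new prompt
    iterations: int = 5,
) -> List[Step]:
    """Run a fixed number of generate / score / revise rounds."""
    history: List[Step] = []
    prompt = initial_prompt
    for i in range(iterations):
        image = generate(prompt)
        history.append(Step(i, prompt, distance_to_target(image)))
        # The revised prompt is evaluated in the next round.
        prompt = revise(prompt, image)
    return history


def best_step(history: List[Step]) -> Step:
    """Return the recorded step with the smallest distance."""
    return min(history, key=lambda step: step.distance)
```

Keeping every `Step` rather than only the latest prompt makes it possible to show the full history and to fall back to an earlier prompt if a later revision scores worse.
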
To achieve this, **WeavePrompt** integrates the following tools:

- **Image recognition** is powered by `meta-llama/Llama-4-Scout-17B-16E-Instruct`.
- **Similarity evaluation** uses the **LPIPS (alex)** metric for perceptual comparison.
- **Image generation models** under evaluation include:
  - FLUX family: FLUX.1 [pro], [dev], and [schnell]
  - Google models: Imagen 4, Imagen 4 Ultra, and Gemini 2.5 Flash Image
  - Other models: Stable Diffusion 3.5 Large and Qwen Image

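
As a rough illustration of the similarity step, the sketch below computes an LPIPS distance with the AlexNet backbone using the third-party `lpips` package and then orders models by that distance. It assumes both images have the same dimensions. The function names, file paths, and model labels are hypothetical, and the project's own preprocessing may differ.

```python
from typing import Dict, List, Tuple


def lpips_alex_distance(path_a: str, path_b: str) -> float:
    """LPIPS (AlexNet backbone) distance between two equally sized images."""
    # Imported lazily so the ranking helper below runs without these packages.
    import lpips  # third-party: pip install lpips
    import torch

    loss_fn = lpips.LPIPS(net="alex")
    # im2tensor scales 8-bit RGB to [-1, 1] with shape 1x3xHxW.
    img_a = lpips.im2tensor(lpips.load_image(path_a))
    img_b = lpips.im2tensor(lpips.load_image(path_b))
    with torch.no_grad():
        return loss_fn(img_a, img_b).item()


def rank_models(distances: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort models from most to least similar (lower LPIPS is more similar)."""
    return sorted(distances.items(), key=lambda item: item[1])
```

A call such as `rank_models({"model-a": lpips_alex_distance("target.png", "a.png"), ...})` would then list the models closest to the target first.
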

By systematically combining prompt optimization with multi-model evaluation, **WeavePrompt** aims to advance the understanding of cross-model prompt effectiveness and improve controllability in image generation tasks.

## Features

- Upload a target image
- Step-by-step prompt optimization
- View prompt and generated image at each iteration
- Full optimization history

## Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/kevin1kevin1k/WeavePrompt.git
   cd WeavePrompt
   ```
2. Install dependencies:
   ```bash
   uv venv
   uv sync
   source .venv/bin/activate
   ```
3. Set up `.env`

   Put the following inside `.env`:
   - API keys `WANDB_API_KEY` and `FAL_KEY`
   - Weave project name `WEAVE_PROJECT`

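
A quick way to confirm the three settings are present before launching the app is sketched below. It only inspects the process environment, so it assumes the values from `.env` have already been loaded (how the app loads them is not stated here). `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names used for illustration.

```python
import os
from typing import List, Mapping

# The variable names come from the setup step above.
REQUIRED_SETTINGS = ("WANDB_API_KEY", "FAL_KEY", "WEAVE_PROJECT")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required names that are unset or blank in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```
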
## Usage

Run the demo app:
```bash
streamlit run src/app.py
```
Follow the instructions in the browser to upload an image and step through the optimization process.

## Architecture Diagram



## Outcome



Using the same prompt as the standard model, the target model yields a similar, high-quality output.

## References

- https://arxiv.org/abs/1801.03924 - The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
- https://arxiv.org/abs/2510.06335 - Image Reconstruction from Highly Undersampled Data
- https://arxiv.org/abs/2510.03191 - Product-Quantised Image Representation for High-Quality Image Synthesis