	---
	title: WeavePrompt
	emoji: 🎨
	colorFrom: blue
	colorTo: purple
	sdk: docker
	sdk_version: 5.44.1
	pinned: false
	license: mit
	app_file: app.py
	app_port: 7860
	---

	# WeavePrompt

	Iterative prompt refinement for image generation models. Given a target image, WeavePrompt automatically generates and refines text prompts so that a model's output resembles the target, using vision-language models and perceptual metrics.

	## Introduction

	WeavePrompt is a research and development project designed to evaluate and refine text-to-image generation prompts across multiple state-of-the-art image generation models.
	The primary goal is to optimize prompts such that the generated images align closely with a given reference image, improving both fidelity and semantic consistency.

	The procedure works as follows: the same prompt is sent to several image generation models, each result is compared with the reference image through a recognition and similarity evaluation pipeline, and the prompt is adjusted to reduce the perceptual difference.
	This feedback loop runs for a set number of iterations, refining the prompt a little further at each step.
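
The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration, not WeavePrompt's actual code: `refine_prompt`, `Step`, and the three callables (`generate`, `distance`, `revise`) are hypothetical names, and the toy stand-ins treat an "image" as a set of words so the example runs without any model or API key.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    """One entry of the optimization history (hypothetical structure)."""
    iteration: int
    prompt: str
    distance: float


def refine_prompt(
    target: Any,
    initial_prompt: str,
    generate: Callable[[str], Any],          # prompt -> image
    distance: Callable[[Any, Any], float],   # (target, image) -> lower is closer
    revise: Callable[[str, Any, Any], str],  # (prompt, target, image) -> new prompt
    iterations: int = 5,
) -> List[Step]:
    """Run a fixed number of generate / compare / revise rounds."""
    history: List[Step] = []
    prompt = initial_prompt
    for i in range(iterations):
        image = generate(prompt)
        history.append(Step(i, prompt, distance(target, image)))
        prompt = revise(prompt, target, image)
    return history


# Toy stand-ins (assumptions for illustration only): an "image" is a set of words.
def toy_generate(prompt: str) -> set:
    return set(prompt.split())


def toy_distance(target: set, image: set) -> float:
    # Fraction of target words missing from the generated "image".
    return len(target - image) / len(target)


def toy_revise(prompt: str, target: set, image: set) -> str:
    # Add one missing word per round, in alphabetical order.
    missing = sorted(target - image)
    return f"{prompt} {missing[0]}" if missing else prompt


target = {"red", "fox", "snow"}
history = refine_prompt(target, "fox", toy_generate, toy_distance, toy_revise, iterations=3)
best = min(history, key=lambda step: step.distance)
print(best.prompt, best.distance)
```

In a real run, `generate` would call an image generation model, `distance` would wrap a perceptual metric, and `revise` would ask a vision-language model for an improved prompt. Keeping them as plain callables lets the loop itself stay model-agnostic.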

	To achieve this, WeavePrompt integrates advanced tools:

	- Image recognition is powered by meta-llama/Llama-4-Scout-17B-16E-Instruct.

	- Similarity evaluation uses the LPIPS (alex) metric for perceptual comparison.

	- Image generation models under evaluation include:
	  - FLUX family: FLUX.1 [pro], [dev], and [schnell]
	  - Google models: Imagen 4, Imagen 4 Ultra, and Gemini 2.5 Flash Image
	  - Other models: Stable Diffusion 3.5 Large and Qwen Image
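
To make the multi-model comparison concrete, the sketch below scores several generators on the same prompt and ranks them by distance to the target. Everything here is an assumption for illustration: `rank_models` is a hypothetical helper, `model-a` and `model-b` are placeholder names rather than real model identifiers, and `mean_abs_diff` is a simple pixel-wise stand-in used so the example runs without dependencies. It is not LPIPS.

```python
from typing import Any, Callable, Dict, List, Tuple


def rank_models(
    target: Any,
    prompt: str,
    generators: Dict[str, Callable[[str], Any]],
    distance: Callable[[Any, Any], float],
) -> List[Tuple[str, float]]:
    """Generate one image per model from the same prompt; sort by distance (closest first)."""
    scores = {name: distance(target, gen(prompt)) for name, gen in generators.items()}
    return sorted(scores.items(), key=lambda item: item[1])


def mean_abs_diff(a: List[float], b: List[float]) -> float:
    # Stand-in metric (NOT LPIPS): mean absolute difference between pixel values.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy "images" are short lists of pixel intensities in [0, 1].
target = [0.2, 0.5, 0.8]
generators = {
    "model-a": lambda prompt: [0.2, 0.5, 0.5],  # placeholder generator
    "model-b": lambda prompt: [0.2, 0.4, 0.8],  # placeholder generator
}

ranking = rank_models(target, "a red fox in the snow", generators, mean_abs_diff)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Swapping `mean_abs_diff` for a function that wraps an LPIPS (alex) model would keep the ranking logic unchanged, since lower values mean closer to the target in both cases.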

	By systematically combining prompt optimization with multi-model evaluation, WeavePrompt aims to advance the understanding of cross-model prompt effectiveness and improve controllability in image generation tasks.

	## Features
	- Upload a target image
	- Step-by-step prompt optimization
	- View prompt and generated image at each iteration
	- Full optimization history

	## Installation

	1. Clone the repository:
	```bash
	git clone https://github.com/kevin1kevin1k/WeavePrompt.git
	cd WeavePrompt
	```
	2. Install dependencies:
	```bash
	uv venv
	uv sync
	source .venv/bin/activate
	```
	3. Create a `.env` file containing:
	- API keys `WANDB_API_KEY` and `FAL_KEY`
	- Weave project name `WEAVE_PROJECT`
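
Before launching the app, it can help to confirm that all three settings are present. The snippet below is a small sketch using only the standard library. `missing_settings` is a hypothetical helper, not part of WeavePrompt, and the values in `example_env` are placeholders. How the app itself loads `.env` is not covered here, so the check works on any mapping of already-loaded variables.

```python
import os
from typing import List, Mapping

# Variable names taken from the setup step above.
REQUIRED_VARS = ("WANDB_API_KEY", "FAL_KEY", "WEAVE_PROJECT")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Placeholder values for illustration; FAL_KEY is deliberately left empty.
example_env = {
    "WANDB_API_KEY": "placeholder",
    "FAL_KEY": "",
    "WEAVE_PROJECT": "weaveprompt-demo",
}
print(missing_settings(example_env))
```

Calling `missing_settings()` with no argument checks the current process environment instead.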

	## Usage

	Run the demo app:
	```bash
	streamlit run src/app.py
	```

	Follow the instructions in the browser to upload an image and step through the optimization process.

	## Architecture Diagram

	![diagram](./diagram.png)


	## Outcome

	![outcome](./outcome.png)

	Using the same prompt as the standard model, the target model yields a similar, high-quality output.

	## References
	- https://arxiv.org/abs/1801.03924 - The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
	- https://arxiv.org/abs/2510.06335 - Image Reconstruction from Highly Undersampled Data
	- https://arxiv.org/abs/2510.03191 - Product-Quantised Image Representation for High-Quality Image Synthesis