arxiv:2406.07008

Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models

Published on Oct 19, 2025

Authors:

Abstract

Pre-trained text-to-image diffusion models enable training-free appearance transfer by explicitly rearranging features based on dense semantic correspondences to preserve target structure while reflecting reference appearance.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

As pre-trained text-to-image diffusion models have become a useful tool for image synthesis, people want to specify the results in various ways. This paper tackles training-free appearance transfer, which produces an image with the structure of a target image from the appearance of a reference image. Existing methods usually do not reflect semantic correspondence, as they rely on query-key similarity within the self-attention layer to establish correspondences between images. To this end, we propose explicitly rearranging the features according to the dense semantic correspondences. Extensive experiments show the superiority of our method in various aspects: preserving the structure of the target and reflecting the correct color from the reference, even when the two images are not aligned.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2406.07008 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2406.07008 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2406.07008 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.