PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
Abstract
Recent research explores the potential of Diffusion Models (DMs) for consistent object editing, which aims to modify an object's position, size, and composition while preserving the consistency of both the object and the background, without changing their texture and attributes. Current inference-time methods often rely on DDIM inversion, which inherently compromises efficiency and the achievable consistency of edited images. Recent methods also utilize energy guidance, which iteratively updates the predicted noise and can drive the latents away from the original image, resulting in distortions. In this paper, we propose PixelMan, an inversion-free and training-free method for achieving consistent object editing via Pixel Manipulation and generation. We directly create a duplicate copy of the source object at the target location in pixel space, and introduce an efficient sampling approach that iteratively harmonizes the manipulated object into the target location and inpaints its original location. Image consistency is ensured by anchoring the edited image to the pixel-manipulated image and by introducing various consistency-preserving optimization techniques during inference. Experimental evaluations on benchmark datasets, together with extensive visual comparisons, show that in as few as 16 inference steps PixelMan outperforms a range of state-of-the-art training-based and training-free methods (which usually require 50 steps) on multiple consistent object editing tasks.
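To make the pixel-manipulation step concrete, below is a minimal sketch of how a source object could be duplicated at a target location directly in pixel space, before the sampling loop harmonizes the pasted object and inpaints its original location. This is an illustrative assumption based on the abstract, not the authors' released code; the function name, the binary object mask, and the (dy, dx) offset are all hypothetical.

```python
# Illustrative sketch of pixel-space object duplication (not the authors' implementation).
import numpy as np

def duplicate_object_in_pixel_space(image: np.ndarray,
                                    object_mask: np.ndarray,
                                    offset: tuple) -> tuple:
    """Copy the masked object to a new location directly in pixel space.

    image:       H x W x C array
    object_mask: H x W boolean array marking the source object
    offset:      (dy, dx) shift from the source to the target location
    Returns the manipulated image and the mask of the pasted (target) region.
    """
    h, w = object_mask.shape
    dy, dx = offset
    manipulated = image.copy()
    target_mask = np.zeros_like(object_mask)

    ys, xs = np.nonzero(object_mask)
    new_ys, new_xs = ys + dy, xs + dx
    # Keep only pixels that land inside the canvas.
    valid = (new_ys >= 0) & (new_ys < h) & (new_xs >= 0) & (new_xs < w)
    manipulated[new_ys[valid], new_xs[valid]] = image[ys[valid], xs[valid]]
    target_mask[new_ys[valid], new_xs[valid]] = True

    # The original object region is left as-is here; per the abstract, PixelMan
    # later inpaints it and harmonizes the pasted object during its
    # inversion-free, training-free sampling loop.
    return manipulated, target_mask
```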
Community
Paper: https://arxiv.org/abs/2412.14283
Project Website: https://liyaojiang1998.github.io/projects/PixelMan/
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Pathways on the Image Manifold: Image Editing via Video Generation (2024)
- 3D-Consistent Image Inpainting with Diffusion Models (2024)
- Re-Attentional Controllable Video Diffusion Editing (2024)
- Stable Flow: Vital Layers for Training-Free Image Editing (2024)
- SeedEdit: Align Image Re-Generation to Image Editing (2024)
- BrushEdit: All-In-One Image Inpainting and Editing (2024)
- Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing (2024)
Thank you for your interest in our work. The GitHub codebase is under construction and coming soon.