StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
Abstract
Recent advancements in generative AI have significantly promoted content creation and editing, and prevailing studies further extend this progress to video editing. In doing so, these studies mainly transfer the inherent motion patterns from the source videos to the edited ones, yet results that are poorly consistent with the user prompts are often observed, due to the lack of particular alignment between the delivered motions and the edited contents. To address this limitation, we present a shape-consistent video editing method, namely StableV2V, in this paper. Our method decomposes the entire editing pipeline into several sequential procedures: it edits the first video frame, then establishes an alignment between the delivered motions and the user prompts, and eventually propagates the edited contents to all other frames based on such alignment. Furthermore, we curate a testing benchmark, namely DAVIS-Edit, for a comprehensive evaluation of video editing that considers various types of prompts and difficulties. Experimental results and analyses demonstrate the superior performance, visual consistency, and inference efficiency of our method compared with existing state-of-the-art studies.
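To make the sequential pipeline concrete, the sketch below outlines the three stages described in the abstract in plain Python. It is a minimal, hedged illustration only: the function parameters (`edit_first_frame`, `extract_motion`, `align_motion`, `propagate`) are hypothetical placeholders standing in for the respective components, not the actual StableV2V implementation or API.

```python
from typing import Callable, List, Sequence

def stablev2v_pipeline(
    frames: Sequence,
    prompt: str,
    edit_first_frame: Callable,  # image editor: (frame, prompt) -> edited frame
    extract_motion: Callable,    # motion estimator: (frames) -> motion representation
    align_motion: Callable,      # aligns source motion with the edited content's shape
    propagate: Callable,         # propagator: (edited frame, aligned motion, n) -> frames
) -> List:
    """Hypothetical three-stage editing pipeline sketched from the abstract."""
    # Stage 1: edit only the first frame according to the user prompt.
    edited_first = edit_first_frame(frames[0], prompt)

    # Stage 2: extract the source video's motion and align it with the
    # edited content, so the delivered motion matches the new shape.
    motion = extract_motion(frames)
    aligned = align_motion(motion, edited_first)

    # Stage 3: propagate the edited first frame to all remaining frames
    # by following the aligned motion.
    return propagate(edited_first, aligned, len(frames))
```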
Community
We are excited to release our video editing paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing", which mainly focuses on editing scenarios where user prompts tend to cause shape differences between the source and edited contents. Besides, we curate a testing benchmark, namely DAVIS-Edit, to offer an evaluation standard for both text- and image-based video editing. Code, model weights, and DAVIS-Edit are currently open-sourced at our GitHub and Hugging Face repos.
Project Page: https://alonzoleeeooo.github.io/StableV2V/
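For reference, fetching the DAVIS-Edit benchmark from the Hugging Face Hub might look like the snippet below. The `repo_id` is an assumption based on the project's namespace and should be checked against the repository linked from the project page and GitHub README.

```python
# Minimal sketch for downloading the DAVIS-Edit benchmark from the
# Hugging Face Hub. The repo_id below is an assumption; replace it with
# the repository listed on the project page / GitHub README.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="AlonzoLeeeooo/DAVIS-Edit",  # assumed repository id
    repo_type="dataset",                 # DAVIS-Edit is released as a dataset
)
print(f"DAVIS-Edit downloaded to: {local_dir}")
```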
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing (2024)
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models (2024)
- Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing (2024)
- TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control (2024)
- Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model (2024)
Thanks for the information!