arxiv:2407.01494

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

Published on Jul 1

Submitted by zengyh1900 on Jul 3

Authors: Yiming Zhang, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu

Abstract

We study Neural Foley, the automatic generation of high-quality sound effects synchronized with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches struggle to synthesize sounds that are both high-quality and video-aligned (i.e., semantically relevant and temporally synchronized). To overcome these limitations, we propose FoleyCrafter, a novel framework that leverages a pre-trained text-to-audio model to ensure high-quality audio generation. FoleyCrafter comprises two key components: the semantic adapter for semantic alignment and the temporal controller for precise audio-video synchronization. The semantic adapter utilizes parallel cross-attention layers to condition audio generation on video features, producing realistic sound effects that are semantically relevant to the visual content. Meanwhile, the temporal controller incorporates an onset detector and a timestamp-based adapter to achieve precise audio-video alignment. One notable advantage of FoleyCrafter is its compatibility with text prompts, which enables the use of text descriptions to achieve controllable and diverse video-to-audio generation according to user intent. We conduct extensive quantitative and qualitative experiments on standard benchmarks to verify the effectiveness of FoleyCrafter. Models and code are available at https://github.com/open-mmlab/FoleyCrafter.
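The idea of parallel cross-attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the identity projections, the single attention head, and the choice to sum the two branches with a `video_scale` weight are all assumptions made for illustration. The sketch only shows how audio latents might attend to text tokens and video tokens in two separate branches whose outputs are then combined.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head scaled dot-product attention.

    Assumption: keys and values are the context tokens themselves
    (no learned projections), to keep the sketch short.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in context
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        )
    return outputs


def parallel_cross_attention(audio_latents, text_tokens, video_tokens,
                             video_scale=1.0):
    """Hypothetical parallel cross-attention.

    The text branch and the video branch attend independently, and the
    video branch is added to the text branch with a scalar weight.
    The additive combination is an assumption, not taken from the paper.
    """
    text_out = cross_attention(audio_latents, text_tokens)
    video_out = cross_attention(audio_latents, video_tokens)
    return [
        [t + video_scale * v for t, v in zip(t_row, v_row)]
        for t_row, v_row in zip(text_out, video_out)
    ]
```

Under these assumptions, setting `video_scale=0.0` reduces the layer to text-only conditioning, which is one way a pre-trained text-to-audio backbone could be left unchanged while a video branch is added alongside it.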

Community

zengyh1900

Paper author Paper submitter 23 days ago

Project page: foleyCrafter

AdinaY

23 days ago

Congrats on your work @ymzhang319 🔥 And thanks for linking the Space to the paper.
It would be awesome to link the model to the paper page as well. : )

Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 2

Collections including this paper 5