Papers
arxiv:2405.11473

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Published on May 19
· Submitted by akhaliq on May 21
#1 Paper of the day

Abstract

We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without training. This is achieved by iteratively performing diagonal denoising, which concurrently processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random-noise frame at the tail. However, diagonal denoising is a double-edged sword: frames near the tail can take advantage of cleaner ones by forward reference, but this strategy induces a discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. We demonstrate the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines.
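The queue mechanics of diagonal denoising can be sketched in a few lines. The sketch below is a toy illustration, not the authors' implementation: `denoise_step` stands in for one step of a pretrained video diffusion model, and the frame/level sizes are arbitrary toy values. It shows the core loop from the abstract: every frame in the queue advances one noise level per iteration, the fully denoised head is dequeued, and fresh noise is enqueued at the tail.

```python
from collections import deque
import random

NUM_LEVELS = 4   # number of diffusion noise levels (toy value)
FRAME_DIM = 8    # latent size per frame (toy value)

def denoise_step(latent, level):
    """Placeholder for one denoising step of a pretrained video
    diffusion model; here we simply nudge values toward zero."""
    return [x * 0.5 for x in latent]

def new_noise_frame():
    """A fresh pure-noise latent frame."""
    return [random.gauss(0.0, 1.0) for _ in range(FRAME_DIM)]

def fifo_diffusion(num_output_frames):
    # The queue holds (latent, noise_level) pairs; noise level
    # increases from head (almost clean) to tail (pure noise).
    queue = deque((new_noise_frame(), lvl) for lvl in range(1, NUM_LEVELS + 1))
    outputs = []
    while len(outputs) < num_output_frames:
        # Diagonal denoising: every queued frame advances one level at once.
        queue = deque((denoise_step(latent, lvl), lvl - 1)
                      for latent, lvl in queue)
        # Dequeue the fully denoised frame at the head ...
        head, lvl = queue.popleft()
        assert lvl == 0
        outputs.append(head)
        # ... and enqueue a new random-noise frame at the tail,
        # so generation can continue indefinitely.
        queue.append((new_noise_frame(), NUM_LEVELS))
    return outputs

frames = fifo_diffusion(10)
print(len(frames))  # 10
```

Because the queue length stays fixed at `NUM_LEVELS`, memory use is constant no matter how many frames are produced, which is what makes arbitrarily long generation feasible.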

Community

It would be awesome to have a demo available on the hub! 🔥

Paper author

Thank you for the summary!
However, most of the technical and critical analysis is not consistent with our work. It seems a bit strange... 😅


Sometimes the AI makes mistakes like this - I will ask it to rerun it. Do you have some examples of feedback you would like me to incorporate?

Is it easy to adapt it to img2vid model like Dynamicrafter?

Paper author

The diagonal denoising strategy can easily be adapted to all kinds of video diffusion models, whether they are T2V, I2V, etc. However, we focus only on T2V models, since for them it can be realized in a training-free manner ;)


Models citing this paper 0

No model linking this paper


Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 0

No Space linking this paper


Collections including this paper 12