The generation process takes too long; I suggest optimizing the speed in the next version.

#29
by zh277 - opened

When using the official model, generating a 15-second, 720p video takes only 3 minutes—a process that includes both initial sampling and upscaling. The quality of the resulting video depends on the specific combination of LoRA weights and the precision of the prompts used.
However, when I integrate your model into my own workflow, using 8-step sampling with the exact same prompts and source images (tested both with and without LoRA), the results are consistently poor and almost always show limbs "clipping" through the body. The workflow you provided does yield videos of decent quality, but the time cost is prohibitively high: a single 15-second video takes 10 minutes to complete. Furthermore, even within your provided workflow, setting the sampler to 8 steps still produces the same limb-clipping artifacts.
I sincerely hope that you can optimize the generation speed in the next version. I would be grateful if you would take this feedback into consideration, and I thank you for all your hard work.

This is a problem on your end that you're not finding and are falsely dismissing. Or you're comparing a pre-distilled base model vs. having to use this one with an extra distilled LoRA on a system that's running right at its limit. Any time you see multiple-minute jumps in generation time, you have run out of both system memory and VRAM and are offloading to and running from a disk buffer. This is a personal system and usage problem; I can't "make this model run faster".
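
For anyone hitting the same slowdown, here is a rough sketch of one way to check whether VRAM is sitting at its limit while a generation runs. It is not part of this model or its workflow. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH, and the 95% threshold and all function names are illustrative choices, not measured or official values.

```python
import shutil
import subprocess

NEAR_LIMIT = 0.95  # assumed threshold for "running right at the limit"


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB lines as printed by nvidia-smi in CSV mode."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            used, total = (int(field.strip()) for field in line.split(","))
        except ValueError:
            continue  # skip lines that are not two integers, e.g. "[N/A]"
        rows.append((used, total))
    return rows


def near_limit(used, total, threshold=NEAR_LIMIT):
    """Return True when used/total is at or above the threshold."""
    return total > 0 and used / total >= threshold


def query_gpu_memory():
    """Return [(used_mib, total_mib)] per GPU, or [] without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` in a second terminal during sampling and passing each pair to `near_limit` gives a quick hint: if usage stays near the total for the whole run, offloading is a likely cause of the extra minutes. This only covers VRAM; system RAM and disk swap would need to be checked separately.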
