Matched the action but not the timing

#8, opened by wwgsteve

This video was generated with AI, and I was hoping MMAudio would create the audio for it. It did create the sound of someone running, but the timing is wrong and there's only one person. Am I doing something wrong?

Thanks for trying it out! Our models do have failure cases. In this instance, the slow-motion video seems to be confusing the model, which has not seen enough slow-motion footage during training. On a separate note, it also does not handle footsteps very well, again probably because of limitations in the training data.
I tried another "running" example below. It still isn't great, but without the slow motion the timing seems more accurate.

Thanks. Maybe I can generate the audio at full speed and then slow it down (time-stretching it without shifting the pitch, which is possible).
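Slowing audio down without changing its pitch is usually called time-stretching. As a rough sketch of the idea (not part of MMAudio; `ola_time_stretch` and its parameters are hypothetical names), plain overlap-add (OLA) reads short windowed frames from the input at a small hop and writes them to the output at a larger hop. The result is longer, but each frame keeps its original waveform and therefore its pitch. Plain OLA can smear or "phase" transient sounds such as footsteps, so dedicated tools like ffmpeg's `atempo` filter or `librosa.effects.time_stretch` (a phase vocoder) will generally sound better.

```python
import math


def hann_window(length: int) -> list[float]:
    """Periodic Hann window; overlapping copies at 1/4-length hops sum to a near-constant."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def ola_time_stretch(samples: list[float], factor: float,
                     frame_len: int = 1024, synthesis_hop: int = 256) -> list[float]:
    """Hypothetical helper: make `samples` `factor` times longer (factor > 1 slows
    it down) without resampling, so the pitch is approximately preserved.

    Plain overlap-add: frames are read every synthesis_hop / factor samples,
    written every synthesis_hop samples, then normalised by the summed window.
    """
    analysis_hop = synthesis_hop / factor
    window = hann_window(frame_len)
    out_len = int(len(samples) * factor)
    out = [0.0] * (out_len + frame_len)
    weight = [0.0] * (out_len + frame_len)

    k = 0
    while k * synthesis_hop < out_len:
        read_pos = round(k * analysis_hop)
        if read_pos >= len(samples):
            break
        write_pos = k * synthesis_hop
        for i in range(frame_len):
            j = read_pos + i
            x = samples[j] if j < len(samples) else 0.0  # zero-pad past the end
            out[write_pos + i] += x * window[i]
            weight[write_pos + i] += window[i]
        k += 1

    # Divide out the window overlap; leave samples with ~zero weight silent.
    return [o / w if w > 1e-8 else 0.0
            for o, w in zip(out[:out_len], weight[:out_len])]


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 250 * n / sr) for n in range(sr)]  # 1 s of a 250 Hz tone
    slowed = ola_time_stretch(tone, 2.0)
    print(len(slowed) / sr)  # 2.0 seconds, still a 250 Hz tone
```

For a clip shown at half speed, you would generate audio for the real-time version and stretch it with `factor=2.0`.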

I run into that all the time too; I have to shift the audio in a video editor.

I think the issue may be that MMAudio seems to expect a frame rate of 24 fps. If I save my video at 16 fps (or any other rate), the audio ends up clipped short.
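To illustrate the arithmetic behind this hypothesis (assuming, as the comment suspects, that the frame count gets interpreted at 24 fps; this is not confirmed MMAudio behavior): N frames saved at 16 fps last N/16 seconds, but read at 24 fps they cover only N/24 seconds, i.e. two thirds as long. The sketch below uses hypothetical helper names to compute both durations and to pick which source frames to repeat so a 16 fps clip can be re-timed to 24 fps without changing its real-time length (simple frame duplication, as a video editor would do).

```python
def clip_duration(num_frames: int, fps: int) -> float:
    """Seconds covered by num_frames frames played back at fps."""
    return num_frames / fps


def resample_frame_indices(num_frames: int, src_fps: int, dst_fps: int) -> list[int]:
    """For each output frame at dst_fps, return the index of the source frame
    (captured at src_fps) that is on screen at that moment.

    Keeps the real-time duration: frames are repeated when dst_fps > src_fps
    and dropped when dst_fps < src_fps.
    """
    num_out = round(num_frames * dst_fps / src_fps)
    # Output frame i is shown at time i / dst_fps, which falls inside
    # source frame floor(i * src_fps / dst_fps); integer math avoids rounding drift.
    return [min(i * src_fps // dst_fps, num_frames - 1) for i in range(num_out)]


if __name__ == "__main__":
    frames = 80  # e.g. a clip saved at 16 fps
    print(clip_duration(frames, 16))  # 5.0 seconds of real time
    print(clip_duration(frames, 24))  # only 2/3 as long if read as 24 fps
    idx = resample_frame_indices(frames, 16, 24)
    print(len(idx), idx[:6])  # 120 [0, 0, 1, 2, 2, 3]
```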
