some oddities with your new extend video workflow

#152

by BallisticAI - opened 4 days ago

I just tried it out and generally it works well, but I'm stuck on a few things. I don't know if you're still working on them or not.

I sometimes get color shifting in the original extension, and I can't seem to find how to enable color matching in those nodes. Is it hidden behind a node with the slew of get/set nodes you use?

[Example: input video]
The original extension keeps the exact same vocal type and sound as the original video (about 50% of the time), but the extended videos more often than not seem to change it entirely. Other times it's the exact opposite: the original extension changes the voice, but the rest stay close to consistent. Are you planning to integrate the ID LoRA bits to keep the voice consistent?
Occasionally, in the first extension, the background is entirely swapped out with some swipe-transition effect, or replaced with what looks like the end of a commercial segment. I've never gotten this before with any LTX generation. Is there a way to prompt to avoid this?

[Attached: same input video as above; Extend Attempt 1; Extend Attempt 2 (without color matching)]

I'm not worried about captions; I know that's an LTX thing for vertical video and not related to your workflow.

RuneXX

Owner 4 days ago

Yes, the extended video has its weak points.

It uses 73 frames as reference frames (video and audio). That is the last 3 seconds of the previous video.

Anything not happening in those last 3 seconds, the next video part has no idea about. So if the voice is not part of the last 3 seconds, it's likely to change in the next part.
You can set the reference frames higher, though. 73 is the minimum recommended by LTX (and I put that in the workflow, since many will probably try extending short 5-second videos).
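For reference, the frames-to-seconds arithmetic can be sketched as below. This assumes a 24 fps output and the 8n+1 frame-count constraint that LTX-Video models use (73 = 8 × 9 + 1); both are my assumptions, not values read from the workflow, and the helper names are hypothetical.

```python
# Sketch: convert between reference frames and seconds for an LTX-style extend.
# Assumptions (not taken from the workflow): 24 fps output, and frame counts
# of the form 8*n + 1, the length constraint LTX-Video models use.

FPS = 24  # assumed output frame rate


def frames_to_seconds(frames: int, fps: int = FPS) -> float:
    """Duration covered by a given number of reference frames."""
    return frames / fps


def seconds_to_ref_frames(seconds: float, fps: int = FPS, minimum: int = 73) -> int:
    """Round a desired context length up to the nearest 8n+1 frame count,
    never going below the 73-frame minimum mentioned in the thread."""
    raw = max(int(round(seconds * fps)), minimum)
    n = -(-(raw - 1) // 8)  # ceiling division: smallest n with 8n + 1 >= raw
    return 8 * n + 1


print(frames_to_seconds(73))     # about 3.04 s, i.e. "the last 3 seconds"
print(seconds_to_ref_frames(5))  # a 5 s context rounded up to 8n+1 frames
```

So raising the reference to cover, say, 5 seconds of context means roughly 121 frames under these assumptions, which also makes each generation slower.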

The color drift/changes are strongest in the 2-pass workflow, since it first generates a low-resolution version and then runs it again through a second upscale pass.
This introduces new details (often better than the original's), but because the result differs from the original, the change can be quite noticeable.

The workflow works best in single-pass mode (the workflow has a single-pass/2-pass mode toggle).
With 2-pass, it tries to blend the frames in an overlap so the changes are a little less noticeable.
With 2-pass there is also a color match toggle in each group that can help with color shifts a little (only use this with 2-pass; at least in theory, single-pass shouldn't need it).
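The general idea of "blending frames in an overlap" can be sketched as a linear crossfade. This is an illustration under assumptions (frames as flat lists of pixel values in [0, 1], a linear ramp); it is not the blending node the workflow actually uses, and the function name is hypothetical.

```python
# Sketch of a linear crossfade over an overlap region. Frames are modelled as
# flat lists of pixel values in [0, 1]; this illustrates the idea only and is
# not the workflow's actual blending node.

from typing import List

Frame = List[float]


def crossfade(tail: List[Frame], head: List[Frame]) -> List[Frame]:
    """Blend the last frames of the previous clip (tail) into the first
    frames of the new clip (head). Both must have the same length."""
    if len(tail) != len(head):
        raise ValueError("overlap regions must have equal length")
    n = len(tail)
    out = []
    for i, (a, b) in enumerate(zip(tail, head)):
        w = (i + 1) / (n + 1)  # weight of the new clip ramps from near 0 to near 1
        out.append([(1 - w) * pa + w * pb for pa, pb in zip(a, b)])
    return out
```

A longer overlap spreads the change over more frames, which hides detail and color differences better but also gives away more frames to the transition.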

That being said, the color match is set really low to be subtle (0.25, if I remember correctly). You can peek inside the subgraph and set the color match stronger (it's near the top right of the subgraph workflow).
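To show what a strength setting like 0.25 means in practice, here is a minimal sketch of a global color match blended by strength. The per-channel mean/std transfer is an assumption about what such a node does in spirit; the real node's algorithm may differ, and `color_match` is a hypothetical name.

```python
# Sketch of a global color match with a strength control.
# Assumption: per-channel mean/std transfer over the whole image, blended by
# strength; the workflow's actual color-match node may work differently.
# Pixels are (r, g, b) tuples with values in [0, 1].

from statistics import mean, pstdev
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def color_match(target: List[Pixel], reference: List[Pixel], strength: float = 0.25) -> List[Pixel]:
    """Shift target's per-channel statistics toward reference's.
    strength=0 leaves target unchanged; strength=1 applies the full transfer."""
    matched_channels = []
    for c in range(3):
        t = [p[c] for p in target]
        r = [p[c] for p in reference]
        t_mean, t_std = mean(t), pstdev(t)
        r_mean, r_std = mean(r), pstdev(r)
        scale = r_std / t_std if t_std > 0 else 1.0
        full = [(v - t_mean) * scale + r_mean for v in t]
        # Blend original and fully matched values, then clamp to [0, 1].
        matched_channels.append(
            [min(1.0, max(0.0, (1 - strength) * v + strength * f)) for v, f in zip(t, full)]
        )
    return [tuple(ch[i] for ch in matched_channels) for i in range(len(target))]
```

At 0.25 only a quarter of the correction is applied, which is why it stays subtle; raising it moves the new part closer to the reference colors.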

And for any strangeness (which will happen: with only 3 seconds, the model has to "guess" what to do next... sort of), the best fix is often just changing the seed and prompt if the output was not desirable.

RuneXX

Owner 4 days ago

That all being said, it's the newest workflow; I will take a look and see if it can be made to work even better. Maybe with some latent guiding nodes or something.

Also, the color matching node is "dumb", so it takes the whole image into account. In your examples the green (background) is very strong, so it will likely make everything a shade of green.
So it might not be the right node for that particular video at least, and is best turned off.
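Why a whole-image match goes green can be seen from the statistics alone: with a large green background, the per-channel averages are dominated by the background rather than the subject. The masked variant below is a hypothetical alternative for illustration, not an option the workflow is known to have, and the pixel values are made up.

```python
# Sketch: why a whole-image color match is skewed by a dominant background.
# The node described in the thread uses the whole image; the masked variant
# here is a hypothetical alternative, not a known workflow option.

from typing import List, Optional, Tuple

Pixel = Tuple[float, float, float]


def channel_means(pixels: List[Pixel], mask: Optional[List[bool]] = None) -> Pixel:
    """Per-channel mean over all pixels, or only where mask is True."""
    if mask is None:
        mask = [True] * len(pixels)
    selected = [p for p, keep in zip(pixels, mask) if keep]
    if not selected:
        raise ValueError("mask selects no pixels")
    n = len(selected)
    return tuple(sum(p[c] for p in selected) / n for c in range(3))


# Toy frame: 8 green background pixels and 2 skin-tone subject pixels.
background: Pixel = (0.1, 0.8, 0.1)
subject: Pixel = (0.8, 0.6, 0.5)
frame = [background] * 8 + [subject] * 2
subject_mask = [False] * 8 + [True] * 2

print(channel_means(frame))                # dominated by the green background
print(channel_means(frame, subject_mask))  # reflects only the subject
```

Any correction computed from the first set of averages pulls the whole frame, subject included, toward the background's green.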

RuneXX

Owner 4 days ago

A little test run with no color matching turned on, and single pass... seems to hold up pretty well. Ideally, this type of workflow should run in single-pass mode.

But it's the 2-pass workflow that's probably the challenge, as 2 samplers introduce new details (and colors).
I will test a bit with 2-pass to see if it can be made smoother.

Portland01

3 days ago

Yep, the second pass ruins it. Keep it as single. It's also best to use a video editor when using this workflow.

Cut the last 5 seconds of the most recently created video and continue from that for a new generated video. Beforehand, to keep the voice, join a 5-second talking part from the original video onto the front of that newly cut 5-second clip. Once that's done, edit the new, longer video to remove the 5-second talking part at the beginning, and merge the rest with the first edited video.
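The splice steps above can be sketched as list operations on frames. This is a model under assumptions: 24 fps, the extend output begins with the clip it was conditioned on, the "first edited video" is the previous video with its last 5 seconds cut, and `generate_extension` is a hypothetical stand-in for running the workflow.

```python
# Sketch of the manual "prepend a voice clip" splice, modelling clips as lists
# of frames. generate_extension is a hypothetical stand-in for the extend
# workflow; assumes 24 fps and that its output starts with the input clip.

from typing import Callable, List

FPS = 24  # assumed frame rate


def splice_with_voice_prefix(
    previous: List[str],
    original: List[str],
    generate_extension: Callable[[List[str]], List[str]],
    seconds: int = 5,
) -> List[str]:
    n = seconds * FPS
    tail = previous[-n:]            # last 5 s of the most recent video
    voice = original[:n]            # a 5 s talking part (here: the first 5 s, for illustration)
    conditioned = voice + tail      # join the voice clip in front of the tail
    extended = generate_extension(conditioned)
    new_part = extended[len(voice):]  # cut the voice prefix back off
    first_edited = previous[:-n]      # previous video with its last 5 s removed
    return first_edited + new_part    # merge
```

If the workflow re-renders the tail instead of copying it exactly, the join point inside `new_part` is where a visible jump can appear.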

It can all be done easily with a freeware program like Avidemux. It doesn't take long. As far as I'm aware, it seems to be the only way to make really long videos without losing the character or voice.

Stick with the single extend workflow. Don't use the multi-extend one. Unfortunately, that one works horribly and will mess up the characters and audio.

Also use the new OmniNFT LoRA and the Licon-VBVR-I2V-Video Reasoning390K-R32 LoRA, both set to strength 1. These are needed in order to actually follow the new prompt. I found that without them in this workflow, the character just stands there like an idiot and doesn't do anything.

Also, make certain the reference setting at the bottom of this workflow is not set higher than the length of the added video, or it will not transition properly. Setting it higher than the video length will cause it to skip/cut the scene. To keep a smooth transition between the old and new, set it 1 second below your added video's length.

Something else to keep in mind: the lower your reference number, the faster the generation. This is useful if the talking takes place near the end of the video; in that case, you don't need to make the reference as long.
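The rule of thumb above (reference about 1 second shorter than the added video, never below the 73-frame minimum) can be sketched as follows. The 24 fps rate, the function name, and raising an error for too-short clips are illustrative assumptions, not settings exposed by the workflow.

```python
# Sketch of the rule of thumb: keep the reference about 1 second shorter than
# the video being extended, but not below the 73-frame minimum.
# Assumes 24 fps; the function name is hypothetical.

FPS = 24  # assumed frame rate


def suggested_reference_frames(video_frames: int, fps: int = FPS, minimum: int = 73) -> int:
    target = video_frames - fps  # one second shorter than the added video
    if target < minimum:
        raise ValueError("video too short for the 73-frame minimum reference")
    return target
```

Going below this value trades context for speed, which is fine when the talking happens near the end of the clip.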

RuneXX

Owner 3 days ago

Works great in single-pass, and that's really the mode for such a workflow.
It's nice for something quick and easy...

For more serious video editing, using an external editor and doing the extensions one by one is better.
That way you can redo each extended part over and over until you get the one you like... and keep going from there.

BallisticAI

2 days ago

Thanks for all the comments. I've been busy with other things, so I haven't been able to refocus on this, but I should have some time this upcoming week. :)

eatmemark

1 day ago

Quoting Portland01:

"Yep, second pass ruins it. Keep it as single. Also best to use a video editor when using this workflow.

Edit and cut the last 5 seconds of the recently created video and continue from that for a new generated video. Merge/Join a 5 second talking part from the original video and add it at the front of that newly cut 5 second video beforehand to keep the voice. Once that's done, edit the new longer video and remove the 5 second talking part at the beginning and merge the rest with the first edited video."

Good advice for sure when it comes to keeping voice and motion consistency, but how are you dealing with frame jumps at the final merge? Extending the same video twice with this method and then splicing the results together won't be seamless, as the last frame never aligns perfectly with the next starting frame.
