Diffusion Single File
comfyui

Can anyone tell me the function and usage of LTX-2.3-OmniNFT-RL-Lora_bf16?

#61
by soxon - opened

Sorry, I don't really understand. I want to know what effect it has and how to use it.

Asked "what is OmniNFT-RL-Lora", Google AI says:

OmniNFT-RL-Lora refers to a cutting-edge AI research framework designed to improve how generative models simultaneously create audio and video. It uses reinforcement learning (RL) and Low-Rank Adaptation (LoRA) to fix alignment and synchronization problems in "Twin-DiT" (Diffusion Transformer) models like LTX-2.

https://www.google.com/search?client=opera&q=what+is+OmniNFT-RL-Lora&sourceid=opera&ie=UTF-8&oe=UTF-8

It's brand new, so I don't have much information.

The source is https://huggingface.co/zghhui/OmniNFT

tl;dr: it makes the model work better, mainly audio-video alignment and sync.

Some examples here: https://zghhui.github.io/OmniNFT/ (bottom of page)
Looks like it's giving some improvement.

Not obvious in my tests, but worse when strength is set to 1.

It's subtle, but this isn't an extensive test, just a few runs. It seems slightly more natural, perhaps.

Most of the examples seem to show the correct character speaking more reliably, especially when there are multiple characters 🤔, and with less background sound too.

But the example where the baseline shows a photorealistic girl while the OmniNFT output turned into an anime girl feels strange 😅 Maybe it was trained more on anime/cartoon content 🤔

An important note about something I initially missed: they set alpha to 64 in their config entry, while the LoRA is rank 32. Since the LoRA update is scaled by alpha / rank = 64 / 32 = 2, getting the intended default effect requires a LoRA strength of 2.0 in ComfyUI.
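A minimal sketch of the alpha/rank arithmetic, assuming the common LoRA convention (as in PEFT) where the update B @ A is multiplied by alpha / rank, and assuming the loader falls back to a scale of 1.0 when the file carries no alpha value. The function names are hypothetical helpers, not part of ComfyUI or any library.

```python
from typing import Optional


def lora_scale(alpha: float, rank: int) -> float:
    """Common LoRA convention: the update B @ A is multiplied by alpha / rank."""
    return alpha / rank


def required_strength(train_alpha: float, rank: int, file_alpha: Optional[float] = None) -> float:
    """Strength needed so that strength * applied_scale == train_alpha / rank.

    file_alpha is the alpha stored in the LoRA file. Assumption: if it is
    absent, the loader applies a scale of 1.0 (as if alpha == rank).
    """
    intended = lora_scale(train_alpha, rank)
    applied = lora_scale(file_alpha, rank) if file_alpha is not None else 1.0
    return intended / applied


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(B, A, scale, strength):
    """Weight update added to the frozen weight: strength * scale * (B @ A)."""
    return [[strength * scale * v for v in row] for row in matmul(B, A)]


if __name__ == "__main__":
    rank, alpha = 32, 64  # values quoted in the thread
    print(required_strength(alpha, rank))      # 2.0 (no alpha in the file)
    print(required_strength(alpha, rank, 64))  # 1.0 (alpha stored in the file)

    # Toy rank-1 check: alpha/rank scaling at strength 1 equals scale 1 at strength 2.
    B, A = [[1.0], [2.0]], [[3.0, 4.0]]
    print(lora_delta(B, A, lora_scale(alpha, rank), 1.0) == lora_delta(B, A, 1.0, 2.0))  # True
```

In other words, a strength of 2.0 only reproduces the trained effect if the loader is not already applying alpha / rank itself; if it is, 1.0 is the default.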

Their adapter LoRA is 1.2GB while yours is 617MB. What is the difference?

He downcast it from fp32 to a 16-bit format (bf16, per the filename), so it's half the size.
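A rough sketch of why the size halves: checkpoint size is about parameter count × bytes per element, and bf16 keeps the top 16 bits of an fp32 value (after rounding). The parameter count below is hypothetical, chosen to give ~1.2GB in fp32, and file header/metadata overhead is ignored.

```python
import struct

# Bytes per element for the dtypes discussed in this thread.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


def checkpoint_size_bytes(num_params: int, dtype: str) -> int:
    """Raw tensor payload only; file header/metadata is ignored."""
    return num_params * BYTES_PER_ELEMENT[dtype]


def fp32_to_bf16_bits(x: float) -> int:
    """Round an fp32 value to bf16 (round-to-nearest-even) and return the 16-bit pattern.

    NaN inputs are not special-cased in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a bf16 bit pattern back to fp32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


if __name__ == "__main__":
    n = 300_000_000  # hypothetical parameter count (~1.2GB in fp32)
    print(checkpoint_size_bytes(n, "fp32"))  # 1200000000
    print(checkpoint_size_bytes(n, "bf16"))  # 600000000
    x = 3.14159
    print(x, "->", bf16_bits_to_float(fp32_to_bf16_bits(x)))  # small precision loss
```

The weights lose some mantissa precision but keep fp32's exponent range, which is why bf16 is a common choice for distributing LoRAs.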
