Chat-UniVi/Chat-UniVi-13B · Different from Video-LLaVA

Hi,

It seems there is another work, also from PKU, called Video-LLaVA: https://huggingface.co/LanguageBind/Video-LLaVA-V1.5/tree/main

Although the weights have not been released yet, it seems to surpass Chat-UniVi on all benchmarks:

https://github.com/PKU-YuanGroup/Video-LLaVA#video-understanding

Both methods add video understanding to an LLM, and both can process video and images simultaneously. Is it just that Chat-UniVi takes a different approach from LLaVA, which Video-LLaVA is derived from? Is there anything that makes these two methods fundamentally different?

Thanks!