Different from Video-LLaVA

by Yhyu13 - opened


It seems there is another work, also from PKU, called Video-LLaVA: https://huggingface.co/LanguageBind/Video-LLaVA-V1.5/tree/main

Although its weights have not been released yet, it seems to surpass ChatUniVi in all benchmarks.


So both methods add video understanding to an LLM, and both can process videos and images simultaneously. Is it just that ChatUniVi introduces an approach different from LLaVA, which Video-LLaVA is derived from? Is there anything that makes these two methods fundamentally different?

