Chat-UniVi/Chat-UniVi


Chat-UniVi committed on Nov 21, 2023

Commit c89d507 (1 parent: b3b4609)

Update README.md

Files changed (1)

README.md +12 -0


@@ -1,6 +1,18 @@
 ---
 license: llama2
 ---
+## 😮 Highlights
+### 💡 Unified visual representation for image and video
+We employ **a set of dynamic visual tokens** to uniformly represent images and videos.
+This representation framework empowers the model to efficiently utilize **a limited number of visual tokens** to simultaneously capture **the spatial details necessary for images** and **the comprehensive temporal relationship required for videos**.
+### 🔥 Joint training strategy, making LLMs understand both image and video
+Chat-UniVi is trained on a mixed dataset containing both images and videos, so it can be applied directly to tasks involving either medium without any modifications.
+### 🤗 High performance, complementary learning with image and video
+Extensive experimental results demonstrate that Chat-UniVi, as a unified model, consistently outperforms even existing methods exclusively designed for either images or videos.
 ### Inference for Video Understanding
 ```python