Chat-UniVi committed on
Commit c89d507 • 1 Parent(s): b3b4609

Update README.md

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -1,6 +1,18 @@
  ---
  license: llama2
  ---
+ ## 😮 Highlights
+
+ ### 💡 Unified visual representation for image and video
+ We employ **a set of dynamic visual tokens** to uniformly represent images and videos.
+ This representation framework empowers the model to efficiently utilize **a limited number of visual tokens** to simultaneously capture **the spatial details necessary for images** and **the comprehensive temporal relationship required for videos**.
+
+ ### 🔥 Joint training strategy, making LLMs understand both image and video
+ Chat-UniVi is trained on a mixed dataset containing both images and videos, allowing direct application to tasks involving both mediums without requiring any modifications.
+
+ ### 🤗 High performance, complementary learning with image and video
+ Extensive experimental results demonstrate that Chat-UniVi, as a unified model, consistently outperforms even existing methods exclusively designed for either images or videos.
+
 
  ### Inference for Video Understanding
  ```python