# LongVA
<p align="center">
<img src="vision_niah/niah_output/LongVA-7B/heatmap.png" width="800">
</p>
<p align="center">
<a href="https://lmms-lab.github.io/posts/longva/" target="_blank">Blog</a> | <a href="https://arxiv.org/abs/2406.16852" target="_blank">Paper</a> | 🤗 <a href="https://huggingface.co/collections/lmms-lab/longva-667538e09329dbc7ea498057" target="_blank">Hugging Face</a> | <a href="https://longva-demo.lmms-lab.com/" target="_blank">Demo</a>
</p>
Long context capability can **zero-shot transfer** from language to vision.
LongVA can process **2000** frames, or over **200K** visual tokens, and achieves **state-of-the-art** performance on Video-MME among 7B models.