V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published 10 days ago • 12
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published 24 days ago • 4
VideoLLM-online: Online Video Large Language Model for Streaming Video Paper • 2406.11816 • Published Jun 17, 2024 • 25
UniVTG: Towards Unified Video-Language Temporal Grounding Paper • 2307.16715 • Published Jul 31, 2023 • 11