On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published 7 days ago • 72
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published 27 days ago • 21
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 76
Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Paper • 2502.04976 • Published Feb 7
NUS-Emo at SemEval-2024 Task 3: Instruction-Tuning LLM for Multimodal Emotion-Cause Analysis in Conversations Paper • 2501.17261 • Published Aug 22, 2024
A Survey on Benchmarks of Multimodal Large Language Models Paper • 2408.08632 • Published Aug 16, 2024 • 2