VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval Paper • 2406.04292 • Published Jun 6 • 1
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding Paper • 2406.04264 • Published Jun 6 • 1