SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 6 days ago • 155
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting Paper • 2504.05541 • Published 6 days ago • 13
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 3 days ago • 16
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 1 day ago • 58
MegaWika: Millions of reports and their sources across 50 diverse languages Paper • 2307.07049 • Published Jul 13, 2023
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval Paper • 2410.11619 • Published Oct 15, 2024 • 1
MultiVENT and MAGMAR Resources Collection Resources associated with the MultiVENT datasets, MAGMAR workshop, and other video retrieval and multimodal retrieval augmented generation • 5 items • Updated 9 days ago • 1
WikiVideo: Article Generation from Multiple Videos Paper • 2504.00939 • Published 12 days ago • 36 • 3