Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024
Learning Video Representations from Large Language Models Paper • 2212.04501 • Published Dec 8, 2022
Distilling Vision-Language Models on Millions of Videos Paper • 2401.06129 • Published Jan 11, 2024
LEAP: Liberate Sparse-view 3D Modeling from Camera Poses Paper • 2310.01410 • Published Oct 2, 2023
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024