Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines Paper • 2410.21220 • Published Oct 28, 2024 • 10
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations Paper • 2410.08049 • Published Oct 10, 2024 • 8
Explore the Limits of Omni-modal Pretraining at Scale Paper • 2406.09412 • Published Jun 13, 2024 • 10
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Paper • 2402.03040 • Published Feb 5, 2024 • 17
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities Paper • 2401.14405 • Published Jan 25, 2024 • 12