Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published 28 days ago • 58
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22 • 15
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 11
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 21
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 233
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023 • 20