MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published Jul 10, 2024 • 32
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models Paper • 2503.11224 • Published 11 days ago • 26
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 127
view article Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15, 2024 • 175
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21, 2024 • 68
Repeat After Me: Transformers are Better than State Space Models at Copying Paper • 2402.01032 • Published Feb 1, 2024 • 24
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 41