UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published 6 days ago • 7
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Paper • 2412.00493 • Published Nov 30, 2024 • 17
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 69
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14, 2024 • 26