TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models • Paper • 2410.10818 • Published 22 days ago • 14
Attention Prompting on Image for Large Vision-Language Models • Paper • 2409.17143 • Published Sep 25 • 7
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally • Paper • 2409.08270 • Published Sep 12 • 9
Gated Slot Attention for Efficient Linear-Time Sequence Modeling • Paper • 2409.07146 • Published Sep 11 • 19
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities • Paper • 2408.00765 • Published Aug 1 • 12