Are Vision-Language Models Truly Understanding Multi-vision Sensor? (arXiv:2412.20750, Dec 2024)
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models (arXiv:2412.01822, Dec 2, 2024)
Phantom of Latent for Large Language and Vision Models (arXiv:2409.14713, Sep 23, 2024)
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models (arXiv:2408.12114, Aug 22, 2024)
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models (arXiv:2406.01920, Jun 4, 2024)
What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models (arXiv:2403.13513, Mar 20, 2024)
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models (arXiv:2405.15574, May 24, 2024)
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing (arXiv:2306.14435, Jun 26, 2023)